Class WekaScoringData

java.lang.Object
org.pentaho.di.trans.step.BaseStepData
org.pentaho.di.scoring.WekaScoringData
All Implemented Interfaces:
org.pentaho.di.trans.step.StepDataInterface

public class WekaScoringData extends org.pentaho.di.trans.step.BaseStepData implements org.pentaho.di.trans.step.StepDataInterface
Holds temporary data and has routines for loading serialized models.
Author:
Mark Hall (mhall{[at]}pentaho{[dot]}org)
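The class description mentions routines for loading serialized models. As a rough illustration of the binary case only, the sketch below round-trips a model through standard Java serialization with ObjectOutputStream and ObjectInputStream. It is a minimal sketch that assumes a plain Serializable object. DummyModel, saveBinaryModel and loadBinaryModel are hypothetical names, and the sketch does not use WekaScoringModel. It also does not handle the XML or PMML formats that loadSerializedModel accepts.

```java
import java.io.*;

// Minimal sketch of binary model (de)serialization; not the plugin's implementation.
public class SerializedModelSketch {

    // Hypothetical stand-in for a trained model (the plugin uses WekaScoringModel).
    static class DummyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        DummyModel(String name) { this.name = name; }
    }

    // Writes any Serializable model to a file as a binary Java object.
    static void saveBinaryModel(Serializable model, File saveTo) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(saveTo)))) {
            out.writeObject(model);
        }
    }

    // Reads a binary-serialized Java object back; XML and PMML are not handled here.
    static Object loadBinaryModel(File modelFile) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(modelFile)))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("model", ".ser");
        f.deleteOnExit();
        saveBinaryModel(new DummyModel("j48"), f);
        DummyModel loaded = (DummyModel) loadBinaryModel(f);
        System.out.println(loaded.name); // prints j48
    }
}
```

The caller still has to cast the result of loadBinaryModel. That is one reason a wrapper type such as WekaScoringModel is useful when several model kinds can come out of the same file-loading routine.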
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.pentaho.di.trans.step.BaseStepData

    org.pentaho.di.trans.step.BaseStepData.StepExecutionStatus
  • Field Summary

    Fields
    protected WekaScoringModel m_defaultModel
      Holds a default model; only used when model files are sourced from a field in the incoming data rows.
    protected WekaScoringModel m_model
      Holds the actual Weka model (classifier, clusterer or PMML) used by this copy of the step.
    protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
      The output data format.
    protected boolean m_updateIncrementalModel
      Whether to update the model (if it is incremental).
    static final int NO_MATCH
      Constants for various input field/attribute match and type problems.
    static final int TYPE_MISMATCH
  • Constructor Summary

    Constructors
    WekaScoringData()
  • Method Summary

    static int[] findMappings(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta)
      Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format.
    Object[] generatePrediction(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, Object[] inputRow, WekaScoringMeta meta)
      Generates a prediction (more specifically, an output row containing all input Kettle fields plus new fields that hold the prediction(s)) for an incoming Kettle row given a Weka model.
    Object[][] generatePredictions(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, List<Object[]> inputRows, WekaScoringMeta meta)
      Generates a batch of predictions (more specifically, an array of output rows containing all input Kettle fields plus new fields that hold the prediction(s)) for each incoming Kettle row given a Weka model.
    WekaScoringModel getDefaultModel()
      Get the default model for this copy of the step to use.
    WekaScoringModel getModel()
      Get the model that this copy of the step is using.
    org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
      Get the meta data for the output format.
    static WekaScoringModel loadSerializedModel(String modelFile, org.pentaho.di.core.logging.LogChannelInterface log, org.pentaho.di.core.variables.VariableSpace space)
      Loads a serialized model.
    void mapIncomingRowMetaData(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta, boolean updateIncrementalModel, org.pentaho.di.core.logging.LogChannelInterface log)
      Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format.
    static boolean modelFileExists(String modelFile, org.pentaho.di.core.variables.VariableSpace space)
    static void saveSerializedModel(WekaScoringModel wsm, File saveTo)
    void setDefaultModel(WekaScoringModel model)
      Set the default model for this copy of the step to use.
    void setModel(WekaScoringModel model)
      Set the model for this copy of the step to use.
    void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
      Set the meta data for the output format.
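generatePrediction and generatePredictions both produce output rows that contain all input Kettle fields plus new fields holding the prediction(s). The sketch below illustrates only that row shape, using plain Object[] arrays.

It makes several assumptions:
- The prediction values are appended after the input values.
- A hypothetical scorer function stands in for the Weka model.
- The RowMetaInterface and WekaScoringMeta arguments of the real methods are left out entirely.

The sketch also does not reflect whether the real batch method scores rows together or one at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Sketch of the "input fields + prediction fields" output row shape.
public class PredictionRowSketch {

    // Copies the input values and appends the prediction values after them (assumed order).
    static Object[] appendPredictions(Object[] inputRow, Object[] predictions) {
        Object[] out = Arrays.copyOf(inputRow, inputRow.length + predictions.length);
        System.arraycopy(predictions, 0, out, inputRow.length, predictions.length);
        return out;
    }

    // Batch variant: one output row per input row, in input order.
    // The scorer is a hypothetical stand-in for the Weka model.
    static Object[][] appendPredictionsBatch(List<Object[]> inputRows,
                                             Function<Object[], Object[]> scorer) {
        Object[][] result = new Object[inputRows.size()][];
        for (int i = 0; i < inputRows.size(); i++) {
            Object[] row = inputRows.get(i);
            result[i] = appendPredictions(row, scorer.apply(row));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical scorer: predicts "yes" when the numeric field exceeds 5.0.
        Function<Object[], Object[]> scorer =
            row -> new Object[] { ((Double) row[1]) > 5.0 ? "yes" : "no" };
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] { "a", 7.0 });
        rows.add(new Object[] { "b", 2.0 });
        for (Object[] r : appendPredictionsBatch(rows, scorer)) {
            System.out.println(Arrays.toString(r)); // prints [a, 7.0, yes] then [b, 2.0, no]
        }
    }
}
```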

    Methods inherited from class org.pentaho.di.trans.step.BaseStepData

    getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, isStopped, setStatus

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.pentaho.di.trans.step.StepDataInterface

    getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, setStatus
  • Field Details

    • NO_MATCH

      public static final int NO_MATCH
      Constants for various input field/attribute match and type problems.
      See Also: Constant Field Values
    • TYPE_MISMATCH

      public static final int TYPE_MISMATCH
      See Also: Constant Field Values
    • m_outputRowMeta

      protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
      the output data format
    • m_model

      protected WekaScoringModel m_model
      Holds the actual Weka model (classifier, clusterer or PMML) used by this copy of the step
    • m_defaultModel

      protected WekaScoringModel m_defaultModel
      Holds a default model; it is only used when model files are sourced from a field in the incoming data rows. In that case, it is the fallback model for any incoming row that does not specify a model file.
    • m_updateIncrementalModel

      protected boolean m_updateIncrementalModel
      whether to update the model (if incremental)
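m_defaultModel is the fallback used when model files are read from a field of the incoming rows and a given row specifies no model file. The sketch below shows that selection rule. It is a minimal sketch that assumes a hypothetical loader function turning a path into a model. ModelFallbackSketch and modelFor are illustrative names. The per-path cache is a choice made for this sketch; this page does not say whether the step caches loaded models.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of per-row model selection with a default (fallback) model.
public class ModelFallbackSketch<M> {

    private final M defaultModel;              // plays the role of m_defaultModel
    private final Function<String, M> loader;  // hypothetical: loads a model from a path
    private final Map<String, M> cache = new HashMap<>();

    ModelFallbackSketch(M defaultModel, Function<String, M> loader) {
        this.defaultModel = defaultModel;
        this.loader = loader;
    }

    // Returns the model for one incoming row: the default when the row's
    // model-path value is null, otherwise the model loaded from that path.
    M modelFor(String modelPathFromRow) {
        if (modelPathFromRow == null) {
            return defaultModel;
        }
        // Caching per path is a choice made for this sketch only.
        return cache.computeIfAbsent(modelPathFromRow, loader);
    }

    public static void main(String[] args) {
        ModelFallbackSketch<String> sel =
            new ModelFallbackSketch<>("default-model", path -> "model@" + path);
        System.out.println(sel.modelFor(null));           // prints default-model
        System.out.println(sel.modelFor("/tmp/a.model")); // prints model@/tmp/a.model
    }
}
```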
  • Constructor Details

    • WekaScoringData

      public WekaScoringData()
  • Method Details

    • setModel

      public void setModel(WekaScoringModel model)
      Set the model for this copy of the step to use
      Parameters:
      model - the model to use
    • getModel

      public WekaScoringModel getModel()
      Get the model that this copy of the step is using
      Returns:
      the model that this copy of the step is using
    • setDefaultModel

      public void setDefaultModel(WekaScoringModel model)
      Set the default model for this copy of the step to use. It is used when model file paths are read from a field in the incoming rows and a given row has a null model path.
      Parameters:
      model - the model to use as fallback
    • getDefaultModel

      public WekaScoringModel getDefaultModel()
      Get the default model for this copy of the step to use. It is used when model file paths are read from a field in the incoming rows and a given row has a null model path.
      Returns:
      the model to use as fallback
    • getOutputRowMeta

      public org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
      Get the meta data for the output format
      Returns:
      the RowMetaInterface describing the output rows
    • setOutputRowMeta

      public void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
      Set the meta data for the output format
      Parameters:
      rmi - the RowMetaInterface describing the output rows
    • mapIncomingRowMetaData

      public void mapIncomingRowMetaData(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta, boolean updateIncrementalModel, org.pentaho.di.core.logging.LogChannelInterface log)
      Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format. The mapping has the same layout as the one returned by findMappings: element i holds the index of the Kettle field that corresponds to the i-th attribute in the Instances structure. Unlike findMappings, this method is void and does not return the mapping.
      Parameters:
      header - the Instances header
      inputRowMeta - the meta data for the incoming rows
      updateIncrementalModel - true if the model is incremental and should be updated on the incoming instances
      log - the log to use
    • modelFileExists

      public static boolean modelFileExists(String modelFile, org.pentaho.di.core.variables.VariableSpace space) throws Exception
      Throws:
      Exception
    • loadSerializedModel

      public static WekaScoringModel loadSerializedModel(String modelFile, org.pentaho.di.core.logging.LogChannelInterface log, org.pentaho.di.core.variables.VariableSpace space) throws Exception
      Loads a serialized model. Models can be binary-serialized Java objects, objects deep-serialized to XML, or PMML.
      Parameters:
      modelFile - the path to the model file (a String)
      log - the log to use
      space - the variable space to use
      Returns:
      the model
      Throws:
      Exception - if there is a problem loading the model.
    • saveSerializedModel

      public static void saveSerializedModel(WekaScoringModel wsm, File saveTo) throws Exception
      Throws:
      Exception
    • findMappings

      public static int[] findMappings(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta)
      Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format. Returns an array of indices in which the element at index 0 is the index of the Kettle field that corresponds to the first attribute in the Instances structure, the element at index 1 is the index of the Kettle field that corresponds to the second attribute, and so on.
      Parameters:
      header - the Instances header
      inputRowMeta - the meta data for the incoming rows
      Returns:
      the mapping as an array of integer indices
    • generatePredictions

      public Object[][] generatePredictions(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, List<Object[]> inputRows, WekaScoringMeta meta) throws Exception
      Generates a batch of predictions (more specifically, an array of output rows containing all input Kettle fields plus new fields that hold the prediction(s)) for each incoming Kettle row given a Weka model.
      Parameters:
      inputMeta - the meta data for the incoming rows
      outputMeta - the meta data for the output rows
      inputRows - the incoming rows to score
      meta - meta data for this step
      Returns:
      an array of Kettle rows, one per incoming row, each containing all incoming fields along with new ones that hold the prediction(s)
      Throws:
      Exception - if an error occurs
    • generatePrediction

      public Object[] generatePrediction(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, Object[] inputRow, WekaScoringMeta meta) throws Exception
      Generates a prediction (more specifically, an output row containing all input Kettle fields plus new fields that hold the prediction(s)) for an incoming Kettle row given a Weka model.
      Parameters:
      inputMeta - the meta data for the incoming rows
      outputMeta - the meta data for the output rows
      inputRow - the values of the incoming row
      meta - meta data for this step
      Returns:
      a Kettle row containing all incoming fields along with new ones that hold the prediction(s)
      Throws:
      Exception - if an error occurs
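findMappings returns an array in which element i is the index of the Kettle field that corresponds to the i-th attribute of the Instances header. The sketch below reproduces that layout with plain lists instead of weka.core.Instances and RowMetaInterface.

It makes three assumptions:
- Fields are matched to attributes by exact, case-sensitive name.
- A name match only counts if a simplified type (NUMERIC or TEXT) also agrees.
- NO_MATCH_SKETCH and TYPE_MISMATCH_SKETCH are hypothetical sentinel values. They are not the documented values of NO_MATCH and TYPE_MISMATCH.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of an attribute-to-field index mapping with match/type sentinels.
public class AttributeMappingSketch {

    // Hypothetical sentinels for this sketch; the real NO_MATCH and
    // TYPE_MISMATCH constants may have different values.
    static final int NO_MATCH_SKETCH = -1;
    static final int TYPE_MISMATCH_SKETCH = -2;

    // Simplified type system (an assumption; Weka and Kettle have richer types).
    enum Kind { NUMERIC, TEXT }

    static final class Column {
        final String name;
        final Kind kind;
        Column(String name, Kind kind) { this.name = name; this.kind = kind; }
    }

    // result[i] is the index of the incoming field whose name matches attribute i,
    // or a sentinel when no field has that name or the types disagree.
    static int[] findMappingsSketch(List<Column> attributes, List<Column> fields) {
        int[] result = new int[attributes.size()];
        for (int i = 0; i < attributes.size(); i++) {
            Column attr = attributes.get(i);
            result[i] = NO_MATCH_SKETCH;
            for (int j = 0; j < fields.size(); j++) {
                Column field = fields.get(j);
                if (field.name.equals(attr.name)) {
                    result[i] = (field.kind == attr.kind) ? j : TYPE_MISMATCH_SKETCH;
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Column> attributes = Arrays.asList(
            new Column("age", Kind.NUMERIC),
            new Column("colour", Kind.TEXT),
            new Column("income", Kind.NUMERIC));
        List<Column> fields = Arrays.asList(
            new Column("income", Kind.NUMERIC),
            new Column("id", Kind.TEXT),
            new Column("age", Kind.TEXT));
        // prints [-2, -1, 0]
        System.out.println(Arrays.toString(findMappingsSketch(attributes, fields)));
    }
}
```

In the example, "age" exists in the incoming fields but with the wrong type, "colour" is absent, and "income" is found at field index 0.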