Class WekaScoringData

  • All Implemented Interfaces:
    org.pentaho.di.trans.step.StepDataInterface

    public class WekaScoringData
    extends org.pentaho.di.trans.step.BaseStepData
    implements org.pentaho.di.trans.step.StepDataInterface
    Holds temporary data and has routines for loading serialized models.
    Author:
    Mark Hall (mhall{[at]}pentaho{[dot]}org)
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.pentaho.di.trans.step.BaseStepData

        org.pentaho.di.trans.step.BaseStepData.StepExecutionStatus
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected WekaScoringModel m_defaultModel
      Holds a default model - only used when model files are sourced from a field in the incoming data rows.
      protected WekaScoringModel m_model
      Holds the actual Weka model (classifier, clusterer or PMML) used by this copy of the step
      protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
      the output data format
      protected boolean m_updateIncrementalModel
      whether to update the model (if incremental)
      static int NO_MATCH
      Constants that flag problems when matching input fields to model attributes: no matching input field, or an input field whose type does not match the attribute.
      static int TYPE_MISMATCH
    • Constructor Summary

      Constructors 
      Constructor Description
      WekaScoringData()  
    • Method Summary

      Modifier and Type Method Description
      static int[] findMappings(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta)
      Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format.
      Object[] generatePrediction(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, Object[] inputRow, WekaScoringMeta meta)
      Generates a prediction (more specifically, an output row containing all input Kettle fields plus new fields that hold the prediction(s)) for an incoming Kettle row given a Weka model.
      Object[][] generatePredictions(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, List<Object[]> inputRows, WekaScoringMeta meta)
      Generates a batch of predictions (more specifically, an array of output rows containing all input Kettle fields plus new fields that hold the prediction(s)) for each incoming Kettle row given a Weka model.
      WekaScoringModel getDefaultModel()
      Get the default model for this copy of the step to use.
      WekaScoringModel getModel()
      Get the model that this copy of the step is using.
      org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
      Get the metadata for the output format.
      static WekaScoringModel loadSerializedModel(String modelFile, org.pentaho.di.core.logging.LogChannelInterface log, org.pentaho.di.core.variables.VariableSpace space)
      Loads a serialized model.
      void mapIncomingRowMetaData(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta, boolean updateIncrementalModel, org.pentaho.di.core.logging.LogChannelInterface log)
      Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format, and records it for use by this copy of the step.
      static boolean modelFileExists(String modelFile, org.pentaho.di.core.variables.VariableSpace space)
      static void saveSerializedModel(WekaScoringModel wsm, File saveTo)
      void setDefaultModel(WekaScoringModel model)
      Set the default model for this copy of the step to use.
      void setModel(WekaScoringModel model)
      Set the model for this copy of the step to use.
      void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
      Set the metadata for the output format.
      • Methods inherited from class org.pentaho.di.trans.step.BaseStepData

        getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, isStopped, setStatus
      • Methods inherited from interface org.pentaho.di.trans.step.StepDataInterface

        getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, setStatus
    • Field Detail

      • NO_MATCH

        public static final int NO_MATCH
        One of several constants that flag problems when matching input fields to model attributes. This one indicates that no matching input field was found.
        See Also:
        Constant Field Values
      • m_outputRowMeta

        protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
        the output data format
      • m_model

        protected WekaScoringModel m_model
        Holds the actual Weka model (classifier, clusterer or PMML) used by this copy of the step
      • m_defaultModel

        protected WekaScoringModel m_defaultModel
        Holds a default model - only used when model files are sourced from a field in the incoming data rows. In this case, it is the fallback model if there is no model file specified in the incoming row.
      • m_updateIncrementalModel

        protected boolean m_updateIncrementalModel
        whether to update the model (if incremental)
    • Constructor Detail

      • WekaScoringData

        public WekaScoringData()
    • Method Detail

      • setModel

        public void setModel(WekaScoringModel model)
        Set the model for this copy of the step to use.
        Parameters:
        model - the model to use
      • getModel

        public WekaScoringModel getModel()
        Get the model that this copy of the step is using
        Returns:
        the model that this copy of the step is using
      • setDefaultModel

        public void setDefaultModel(WekaScoringModel model)
        Set the default model for this copy of the step to use. The default model is used when model file paths are read from a field in the incoming rows and a given row has a null model path.
        Parameters:
        model - the model to use as fallback
      • getDefaultModel

        public WekaScoringModel getDefaultModel()
        Get the default model for this copy of the step to use. The default model is used when model file paths are read from a field in the incoming rows and a given row has a null model path.
        Returns:
        the model to use as fallback
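The fallback rule described for setDefaultModel and getDefaultModel can be sketched in plain Java. This is a hypothetical illustration, not the step's code. The generic type M stands in for WekaScoringModel, and the loader function is an assumed stand-in for whatever turns a model path into a model (for example, loadSerializedModel).

```java
import java.util.function.Function;

/**
 * Hypothetical sketch of the default-model fallback. M stands in for
 * WekaScoringModel; loader is an assumed path-to-model function.
 */
final class ModelSelector<M> {
    private final M defaultModel;              // plays the role of the fallback model
    private final Function<String, M> loader;  // assumed: turns a model path into a model

    ModelSelector(M defaultModel, Function<String, M> loader) {
        this.defaultModel = defaultModel;
        this.loader = loader;
    }

    /** Returns the model for a row whose model-path field holds {@code path} (possibly null). */
    M select(String path) {
        if (path == null) {
            return defaultModel;       // no path in this row: use the fallback model
        }
        return loader.apply(path);     // otherwise load the model named in the row
    }

    public static void main(String[] args) {
        ModelSelector<String> selector = new ModelSelector<>("default-model", p -> "model from " + p);
        System.out.println(selector.select(null));         // default-model
        System.out.println(selector.select("iris.model")); // model from iris.model
    }
}
```

A real step would probably avoid reloading the same file for every row. Caching is left out here to keep the sketch minimal.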
      • getOutputRowMeta

        public org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
        Get the metadata for the output format.
        Returns:
        the RowMetaInterface describing the output rows
      • setOutputRowMeta

        public void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
        Set the metadata for the output format.
        Parameters:
        rmi - the RowMetaInterface describing the output rows
      • mapIncomingRowMetaData

        public void mapIncomingRowMetaData(weka.core.Instances header,
                                           org.pentaho.di.core.row.RowMetaInterface inputRowMeta,
                                           boolean updateIncrementalModel,
                                           org.pentaho.di.core.logging.LogChannelInterface log)
        Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format, and records it for use by this copy of the step. The mapping is an array of indices. The element at index 0 is the index of the Kettle field that corresponds to the first attribute in the Instances structure, the element at index 1 is the index of the Kettle field that corresponds to the second attribute, and so on. Unlike findMappings, this method does not return the mapping.
        Parameters:
        header - the Instances header
        inputRowMeta - the metadata for the incoming rows
        updateIncrementalModel - true if the model is incremental and should be updated on the incoming instances
        log - the log to use
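The following hypothetical sketch shows how a recorded mapping could be applied to one incoming row, reordering the row's values into the model's attribute order. It makes two assumptions that the documentation above does not state: a negative entry marks an attribute with no usable input field, and such an attribute is treated as missing (null).

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: applies an attribute-to-field mapping to one row.
 * mapping[i] is the index of the input field feeding attribute i; a negative
 * value (assumed) means no usable field, so the attribute is left missing (null).
 */
final class MappingApplier {
    static Object[] toAttributeOrder(int[] mapping, Object[] row) {
        Object[] values = new Object[mapping.length];
        for (int att = 0; att < mapping.length; att++) {
            int field = mapping[att];
            values[att] = field >= 0 ? row[field] : null; // null marks a missing value
        }
        return values;
    }

    public static void main(String[] args) {
        int[] mapping = {2, -1, 0};            // attribute 1 has no matching field
        Object[] row = {"sunny", 17L, 3.5};    // incoming Kettle-style row
        System.out.println(Arrays.toString(toAttributeOrder(mapping, row)));
        // prints [3.5, null, sunny]
    }
}
```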
      • modelFileExists

        public static boolean modelFileExists(String modelFile,
                                              org.pentaho.di.core.variables.VariableSpace space)
                                       throws Exception
        Throws:
        Exception
      • loadSerializedModel

        public static WekaScoringModel loadSerializedModel(String modelFile,
                                                           org.pentaho.di.core.logging.LogChannelInterface log,
                                                           org.pentaho.di.core.variables.VariableSpace space)
                                                    throws Exception
        Loads a serialized model. Models can be binary serialized Java objects, objects deep-serialized to XML, or PMML.
        Parameters:
        modelFile - the path to the model file
        log - the log to use
        space - the variable space to use
        Returns:
        the model
        Throws:
        Exception - if there is a problem loading the model.
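For the binary case only, saving and loading a model can be sketched with standard Java serialization. This illustration rests on assumptions. DummyModel is a hypothetical stand-in for WekaScoringModel. The actual loadSerializedModel and saveSerializedModel may use different helpers, and they also handle the XML and PMML formats, which this sketch does not.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationSketch {
    /** Hypothetical stand-in for WekaScoringModel. */
    static final class DummyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        DummyModel(String name) {
            this.name = name;
        }
    }

    /** Writes the model to saveTo as a binary serialized Java object. */
    static void save(Serializable model, File saveTo) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(saveTo)))) {
            out.writeObject(model);
        }
    }

    /** Reads back a binary serialized object; the caller casts it to the expected type. */
    static Object load(File modelFile) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(modelFile)))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("model", ".ser");
        f.deleteOnExit();
        save(new DummyModel("m1"), f);
        DummyModel loaded = (DummyModel) load(f);
        System.out.println(loaded.name); // m1
    }
}
```

Java deserialization can run code embedded in crafted input. A loader like this should therefore only be pointed at trusted model files.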
      • findMappings

        public static int[] findMappings(weka.core.Instances header,
                                         org.pentaho.di.core.row.RowMetaInterface inputRowMeta)
        Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format. Returns an array of indices. The element at index 0 is the index of the Kettle field that corresponds to the first attribute in the Instances structure, the element at index 1 is the index of the Kettle field that corresponds to the second attribute, and so on.
        Parameters:
        header - the Instances header
        inputRowMeta - the metadata for the incoming rows
        Returns:
        the mapping as an array of integer indices
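The index-array contract described for findMappings can be sketched as follows, assuming that attributes are matched to fields by exact name only. Plain String arrays stand in for the Instances header and the RowMetaInterface. The NO_MATCH value of -1 used here is a hypothetical choice, because this page does not show the real constant's value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the findMappings contract using plain name arrays.
 * mapping[i] is the index of the input field whose name equals attribute i,
 * or NO_MATCH when there is none.
 */
final class MappingSketch {
    static final int NO_MATCH = -1; // hypothetical value for this sketch only

    static int[] findMappings(String[] attributeNames, String[] fieldNames) {
        Map<String, Integer> fieldIndex = new HashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            fieldIndex.putIfAbsent(fieldNames[i], i); // first field wins if a name repeats
        }
        int[] mapping = new int[attributeNames.length];
        for (int att = 0; att < attributeNames.length; att++) {
            mapping[att] = fieldIndex.getOrDefault(attributeNames[att], NO_MATCH);
        }
        return mapping;
    }

    public static void main(String[] args) {
        String[] attributes = {"temperature", "outlook", "windy"};
        String[] fields = {"outlook", "humidity", "temperature"};
        System.out.println(Arrays.toString(findMappings(attributes, fields)));
        // prints [2, 0, -1]
    }
}
```

A fuller version would presumably also compare attribute and field types, which is the kind of problem the TYPE_MISMATCH constant suggests. That check is omitted here.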
      • generatePredictions

        public Object[][] generatePredictions(org.pentaho.di.core.row.RowMetaInterface inputMeta,
                                              org.pentaho.di.core.row.RowMetaInterface outputMeta,
                                              List<Object[]> inputRows,
                                              WekaScoringMeta meta)
                                       throws Exception
        Generates a batch of predictions (more specifically, an array of output rows containing all input Kettle fields plus new fields that hold the prediction(s)) for each incoming Kettle row given a Weka model.
        Parameters:
        inputMeta - the metadata for the incoming rows
        outputMeta - the metadata for the output rows
        inputRows - the values of the incoming rows
        meta - metadata for this step
        Returns:
        an array of Kettle rows, one per incoming row, each containing all incoming fields along with new ones that hold the prediction(s)
        Throws:
        Exception - if an error occurs
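The following hypothetical sketch illustrates the batch contract. It returns one output row per input row, and each output row holds the input values followed by that row's prediction values. The batchPredictor function is an assumed stand-in for a model that scores a whole batch in one call. The real method works from inputMeta, outputMeta and meta, which this sketch omits.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class BatchScoringSketch {
    /**
     * Scores all rows in one call and returns one output row per input row:
     * the input values followed by that row's prediction values.
     */
    static Object[][] generatePredictions(List<Object[]> inputRows,
                                          Function<List<Object[]>, Object[][]> batchPredictor) {
        Object[][] predictions = batchPredictor.apply(inputRows); // hypothetical batch model call
        if (predictions.length != inputRows.size()) {
            throw new IllegalStateException("expected " + inputRows.size()
                    + " predictions but got " + predictions.length);
        }
        Object[][] output = new Object[inputRows.size()][];
        for (int r = 0; r < output.length; r++) {
            Object[] in = inputRows.get(r);
            Object[] pred = predictions[r];
            // copy the input fields, then append the prediction fields after them
            Object[] out = Arrays.copyOf(in, in.length + pred.length, Object[].class);
            System.arraycopy(pred, 0, out, in.length, pred.length);
            output[r] = out;
        }
        return output;
    }
}
```

Scoring the batch in one call, rather than once per row, lets a model that supports batch prediction amortize its per-call overhead. Whether the real step gains anything from this depends on the underlying model.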
      • generatePrediction

        public Object[] generatePrediction(org.pentaho.di.core.row.RowMetaInterface inputMeta,
                                           org.pentaho.di.core.row.RowMetaInterface outputMeta,
                                           Object[] inputRow,
                                           WekaScoringMeta meta)
                                    throws Exception
        Generates a prediction (more specifically, an output row containing all input Kettle fields plus new fields that hold the prediction(s)) for an incoming Kettle row given a Weka model.
        Parameters:
        inputMeta - the metadata for the incoming rows
        outputMeta - the metadata for the output rows
        inputRow - the values of the incoming row
        meta - metadata for this step
        Returns:
        a Kettle row containing all incoming fields along with new ones that hold the prediction(s)
        Throws:
        Exception - if an error occurs
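The contents of the appended "prediction(s)" fields depend on the model and the step settings. As one hypothetical example, suppose a classifier produces a class probability distribution. The sketch below appends the most probable class label and, optionally, one probability per class. This field layout is an illustrative assumption, not the step's documented output format.

```java
import java.util.Arrays;

final class PredictionFieldsSketch {
    /**
     * Returns a copy of inputRow with prediction fields appended: the most probable
     * class label, then (if outputProbabilities) one probability per class.
     */
    static Object[] appendPrediction(Object[] inputRow, String[] classLabels,
                                     double[] distribution, boolean outputProbabilities) {
        if (distribution.length == 0 || distribution.length != classLabels.length) {
            throw new IllegalArgumentException("need one probability per class label");
        }
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i; // ties keep the earlier class
            }
        }
        int extra = 1 + (outputProbabilities ? distribution.length : 0);
        Object[] out = Arrays.copyOf(inputRow, inputRow.length + extra, Object[].class);
        out[inputRow.length] = classLabels[best];
        if (outputProbabilities) {
            for (int i = 0; i < distribution.length; i++) {
                out[inputRow.length + 1 + i] = distribution[i]; // boxed to Double
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Object[] row = {"sunny", 3.5};
        String[] labels = {"yes", "no"};
        double[] dist = {0.3, 0.7};
        System.out.println(Arrays.toString(appendPrediction(row, labels, dist, true)));
        // prints [sunny, 3.5, no, 0.3, 0.7]
    }
}
```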