Package org.pentaho.di.scoring
Class WekaScoringData
java.lang.Object
org.pentaho.di.trans.step.BaseStepData
org.pentaho.di.scoring.WekaScoringData
All Implemented Interfaces:
org.pentaho.di.trans.step.StepDataInterface
public class WekaScoringData
extends org.pentaho.di.trans.step.BaseStepData
implements org.pentaho.di.trans.step.StepDataInterface
Holds temporary data and has routines for loading serialized models.
Author:
Mark Hall (mhall{[at]}pentaho{[dot]}org)
-
Nested Class Summary
Nested classes/interfaces inherited from class org.pentaho.di.trans.step.BaseStepData
org.pentaho.di.trans.step.BaseStepData.StepExecutionStatus
-
Field Summary
Modifier and Type, Field, Description
protected WekaScoringModel m_defaultModel
Holds a default model - only used when model files are sourced from a field in the incoming data rows.
protected WekaScoringModel m_model
Holds the actual Weka model (classifier, clusterer or PMML) used by this copy of the step.
protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
The output data format.
protected boolean m_updateIncrementalModel
Whether to update the model (if incremental).
static final int NO_MATCH
Constants flagging problems when matching input fields to model attributes (no match, type mismatch).
static final int TYPE_MISMATCH
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
static int[] findMappings(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta)
Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format.
Object[] generatePrediction(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, Object[] inputRow, WekaScoringMeta meta)
Generates a prediction (more specifically, an output row containing all input Kettle fields plus new fields that hold the prediction(s)) for an incoming Kettle row given a Weka model.
Object[][] generatePredictions(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, List<Object[]> inputRows, WekaScoringMeta meta)
Generates a batch of predictions (more specifically, an array of output rows containing all input Kettle fields plus new fields that hold the prediction(s)) for each incoming Kettle row given a Weka model.
WekaScoringModel getDefaultModel()
Get the default model for this copy of the step to use.
WekaScoringModel getModel()
Get the model that this copy of the step is using.
org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
Get the meta data for the output format.
static WekaScoringModel loadSerializedModel(String modelFile, org.pentaho.di.core.logging.LogChannelInterface log, org.pentaho.di.core.variables.VariableSpace space)
Loads a serialized model.
void mapIncomingRowMetaData(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta, boolean updateIncrementalModel, org.pentaho.di.core.logging.LogChannelInterface log)
Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format.
static boolean modelFileExists(String modelFile, org.pentaho.di.core.variables.VariableSpace space)
static void saveSerializedModel(WekaScoringModel wsm, File saveTo)
void setDefaultModel(WekaScoringModel model)
Set the default model for this copy of the step to use.
void setModel(WekaScoringModel model)
Set the model for this copy of the step to use.
void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
Set the meta data for the output format.
Methods inherited from class org.pentaho.di.trans.step.BaseStepData
getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, isStopped, setStatus
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.pentaho.di.trans.step.StepDataInterface
getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, setStatus
-
Field Details
-
NO_MATCH
public static final int NO_MATCH
Constants flagging problems when matching input fields to model attributes (no match, type mismatch).
-
TYPE_MISMATCH
public static final int TYPE_MISMATCH
-
m_outputRowMeta
protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
The output data format.
-
m_model
protected WekaScoringModel m_model
Holds the actual Weka model (classifier, clusterer or PMML) used by this copy of the step.
-
m_defaultModel
protected WekaScoringModel m_defaultModel
Holds a default model - only used when model files are sourced from a field in the incoming data rows. In this case, it is the fallback model if there is no model file specified in the incoming row.
-
m_updateIncrementalModel
protected boolean m_updateIncrementalModel
Whether to update the model (if incremental).
-
-
Constructor Details
-
WekaScoringData
public WekaScoringData()
-
-
Method Details
-
setModel
public void setModel(WekaScoringModel model)
Set the model for this copy of the step to use.
Parameters:
model - the model to use
-
getModel
public WekaScoringModel getModel()
Get the model that this copy of the step is using.
Returns:
the model that this copy of the step is using
-
setDefaultModel
public void setDefaultModel(WekaScoringModel model)
Set the default model for this copy of the step to use. This is used when model file paths come from a field in the incoming row structure and a given row has null for the model path.
Parameters:
model - the model to use as fallback
-
getDefaultModel
public WekaScoringModel getDefaultModel()
Get the default model for this copy of the step to use. This is used when model file paths come from a field in the incoming row structure and a given row has null for the model path.
Returns:
the model to use as fallback
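The fallback rule described for the default model can be illustrated with a minimal sketch. It is not the step's implementation: the class FallbackModelSketch, the method chooseModel, and the map of already loaded models are hypothetical names, and plain strings stand in for WekaScoringModel instances. Only the null-path case is taken from this page; what happens for a non-null path that has not been loaded is not stated here, so the sketch simply returns whatever the map holds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the default-model fallback; not part of the Pentaho API.
class FallbackModelSketch {

    // Returns the default model when the row carries no model path (null),
    // otherwise the model registered under that path (may be null if absent).
    static <M> M chooseModel(String modelPath, Map<String, M> loadedModels, M defaultModel) {
        if (modelPath == null) {
            return defaultModel;
        }
        return loadedModels.get(modelPath);
    }

    public static void main(String[] args) {
        Map<String, String> loaded = new HashMap<>();
        loaded.put("/models/a.model", "modelA"); // strings stand in for real models

        System.out.println(chooseModel("/models/a.model", loaded, "fallback"));
        System.out.println(chooseModel(null, loaded, "fallback"));
    }
}
```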
-
getOutputRowMeta
public org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
Get the meta data for the output format.
Returns:
a RowMetaInterface value
-
setOutputRowMeta
public void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
Set the meta data for the output format.
Parameters:
rmi - a RowMetaInterface value
-
mapIncomingRowMetaData
public void mapIncomingRowMetaData(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta, boolean updateIncrementalModel, org.pentaho.di.core.logging.LogChannelInterface log)
Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format. The mapping is an array of indices: the element at index 0 is the index of the Kettle field that corresponds to the first attribute in the Instances structure, the element at index 1 is the index of the Kettle field that corresponds to the second attribute, and so on. This method is declared void, so the mapping is not returned to the caller; findMappings is the static variant that returns the array.
Parameters:
header - the Instances header
inputRowMeta - the meta data for the incoming rows
updateIncrementalModel - true if the model is incremental and should be updated on the incoming instances
log - the log to use
-
modelFileExists
public static boolean modelFileExists(String modelFile, org.pentaho.di.core.variables.VariableSpace space) throws Exception
Throws:
Exception
-
loadSerializedModel
public static WekaScoringModel loadSerializedModel(String modelFile, org.pentaho.di.core.logging.LogChannelInterface log, org.pentaho.di.core.variables.VariableSpace space) throws Exception
Loads a serialized model. Models can be binary serialized Java objects, objects deep-serialized to XML, or PMML.
Parameters:
modelFile - the model file to load (passed as a String)
Returns:
the model
Throws:
Exception - if there is a problem loading the model.
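For the first of the three formats, binary serialized Java objects, the following is a minimal sketch of a save and load round trip using only java.io. It is not the code behind loadSerializedModel or saveSerializedModel: the class SerializedModelSketch and its save and load methods are hypothetical, a list of numbers stands in for a real model, and the XML and PMML cases are not covered.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical illustration of binary Java serialization; not part of the Pentaho API.
class SerializedModelSketch {

    // Writes any Serializable object to the given file.
    static void save(Serializable model, File saveTo) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(saveTo)))) {
            out.writeObject(model);
        }
    }

    // Reads back the single object stored in the given file.
    static Object load(File modelFile) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(modelFile)))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("model", ".ser");
        tmp.deleteOnExit();

        // A list of weights stands in for a trained model.
        ArrayList<Double> weights = new ArrayList<>(Arrays.asList(0.5, 1.5));
        save(weights, tmp);
        System.out.println(weights.equals(load(tmp))); // prints "true"
    }
}
```

Java deserialization executes code from the stream's classes, so a loader of this kind should only be pointed at model files from a trusted source.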
-
saveSerializedModel
public static void saveSerializedModel(WekaScoringModel wsm, File saveTo) throws Exception
Throws:
Exception
-
findMappings
public static int[] findMappings(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta)
Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format. Returns an array of indices, where the element at index 0 of the array is the index of the Kettle field that corresponds to the first attribute in the Instances structure, the element at index 1 is the index of the Kettle field that corresponds to the second attribute, and so on.
Parameters:
header - the Instances header
inputRowMeta - the meta data for the incoming rows
Returns:
the mapping as an array of integer indices
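The shape of the returned array can be illustrated with a small self-contained sketch. It is not the library's implementation: the Column type stands in for both Weka attributes and Kettle fields, matching is assumed to be by exact name and then by type, and the sentinel values -1 and -2 are hypothetical, since this page does not state the actual values of NO_MATCH and TYPE_MISMATCH or how they are stored in the mapping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of an attribute-to-field index mapping; not part of the Pentaho API.
class MappingSketch {

    // Assumed sentinel values; the real constants' values are not given on this page.
    static final int NO_MATCH = -1;
    static final int TYPE_MISMATCH = -2;

    // Stand-in for a named, typed column (a model attribute or an incoming field).
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // mapping[i] is the index of the incoming field that matches attribute i,
    // or a sentinel when no field has that name or the types differ.
    static int[] findMappings(List<Column> attributes, List<Column> fields) {
        int[] mapping = new int[attributes.size()];
        for (int i = 0; i < attributes.size(); i++) {
            Column attribute = attributes.get(i);
            mapping[i] = NO_MATCH;
            for (int j = 0; j < fields.size(); j++) {
                Column field = fields.get(j);
                if (field.name.equals(attribute.name)) {
                    mapping[i] = field.type.equals(attribute.type) ? j : TYPE_MISMATCH;
                    break;
                }
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<Column> attributes = Arrays.asList(
                new Column("age", "numeric"),
                new Column("colour", "nominal"),
                new Column("income", "numeric"));
        List<Column> fields = Arrays.asList(
                new Column("colour", "numeric"), // same name, different type
                new Column("age", "numeric"));

        System.out.println(Arrays.toString(findMappings(attributes, fields))); // prints [1, -2, -1]
    }
}
```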
-
generatePredictions
public Object[][] generatePredictions(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, List<Object[]> inputRows, WekaScoringMeta meta) throws Exception
Generates a batch of predictions (more specifically, an array of output rows containing all input Kettle fields plus new fields that hold the prediction(s)) for each incoming Kettle row given a Weka model.
Parameters:
inputMeta - the meta data for the incoming rows
outputMeta - the meta data for the output rows
inputRows - the values of the incoming rows
meta - meta data for this step
Returns:
an array of Kettle rows, each containing all incoming fields along with new ones that hold the prediction(s)
Throws:
Exception - if an error occurs
-
generatePrediction
public Object[] generatePrediction(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, Object[] inputRow, WekaScoringMeta meta) throws Exception
Generates a prediction (more specifically, an output row containing all input Kettle fields plus new fields that hold the prediction(s)) for an incoming Kettle row given a Weka model.
Parameters:
inputMeta - the meta data for the incoming rows
outputMeta - the meta data for the output rows
inputRow - the values of the incoming row
meta - meta data for this step
Returns:
a Kettle row containing all incoming fields along with new ones that hold the prediction(s)
Throws:
Exception - if an error occurs
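The output row layout described above (all input fields followed by the new prediction fields) can be sketched as follows. This is an illustration under stated assumptions, not the method's implementation: the class OutputRowSketch and the method appendPredictions are hypothetical, the prediction values are supplied by the caller rather than computed by a model, and the input array is assumed to hold exactly one slot per input field.

```java
import java.util.Arrays;

// Hypothetical illustration of the output row layout; not part of the Pentaho API.
class OutputRowSketch {

    // Copies the input fields into a longer array and places the prediction
    // values in the slots that follow them.
    static Object[] appendPredictions(Object[] inputRow, Object... predictions) {
        Object[] outputRow = Arrays.copyOf(inputRow, inputRow.length + predictions.length);
        System.arraycopy(predictions, 0, outputRow, inputRow.length, predictions.length);
        return outputRow;
    }

    public static void main(String[] args) {
        Object[] inputRow = {"id-1", 42L};
        // A predicted label and its probability, made up for the example.
        Object[] outputRow = appendPredictions(inputRow, "yes", 0.87);
        System.out.println(Arrays.toString(outputRow)); // prints [id-1, 42, yes, 0.87]
    }
}
```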
-