Package org.pentaho.di.scoring
Class WekaScoringData
java.lang.Object
  org.pentaho.di.trans.step.BaseStepData
    org.pentaho.di.scoring.WekaScoringData
All Implemented Interfaces:
org.pentaho.di.trans.step.StepDataInterface
public class WekaScoringData extends org.pentaho.di.trans.step.BaseStepData implements org.pentaho.di.trans.step.StepDataInterface
Holds temporary data and has routines for loading serialized models.
Author:
Mark Hall (mhall{[at]}pentaho{[dot]}org)
-
Field Summary
protected WekaScoringModel m_defaultModel
Holds a default model - only used when model files are sourced from a field in the incoming data rows.
protected WekaScoringModel m_model
Holds the actual Weka model (classifier, clusterer or PMML) used by this copy of the step
protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
the output data format
protected boolean m_updateIncrementalModel
whether to update the model (if incremental)
static int NO_MATCH
some constants for various input field - attribute match/type problems
static int TYPE_MISMATCH
-
Constructor Summary
WekaScoringData()
-
Method Summary
static int[] findMappings(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta)
Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format.
Object[] generatePrediction(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, Object[] inputRow, WekaScoringMeta meta)
Generates a prediction (more specifically, an output row containing all input Kettle fields plus new fields that hold the prediction(s)) for an incoming Kettle row given a Weka model.
Object[][] generatePredictions(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, List<Object[]> inputRows, WekaScoringMeta meta)
Generates a batch of predictions (more specifically, an array of output rows containing all input Kettle fields plus new fields that hold the prediction(s)) for each incoming Kettle row given a Weka model.
WekaScoringModel getDefaultModel()
Get the default model for this copy of the step to use.
WekaScoringModel getModel()
Get the model that this copy of the step is using
org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
Get the meta data for the output format
static WekaScoringModel loadSerializedModel(String modelFile, org.pentaho.di.core.logging.LogChannelInterface log, org.pentaho.di.core.variables.VariableSpace space)
Loads a serialized model.
void mapIncomingRowMetaData(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta, boolean updateIncrementalModel, org.pentaho.di.core.logging.LogChannelInterface log)
Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format.
static boolean modelFileExists(String modelFile, org.pentaho.di.core.variables.VariableSpace space)
static void saveSerializedModel(WekaScoringModel wsm, File saveTo)
void setDefaultModel(WekaScoringModel model)
Set the default model for this copy of the step to use.
void setModel(WekaScoringModel model)
Set the model for this copy of the step to use
void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
Set the meta data for the output format
-
Methods inherited from class org.pentaho.di.trans.step.BaseStepData
getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, isStopped, setStatus
-
Field Detail
-
NO_MATCH
public static final int NO_MATCH
Constants for the various match and type problems that can arise between input fields and model attributes.
See Also:
Constant Field Values
-
TYPE_MISMATCH
public static final int TYPE_MISMATCH
See Also:
Constant Field Values
-
m_outputRowMeta
protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
the output data format
-
m_model
protected WekaScoringModel m_model
Holds the actual Weka model (classifier, clusterer or PMML) used by this copy of the step
-
m_defaultModel
protected WekaScoringModel m_defaultModel
Holds a default model - only used when model files are sourced from a field in the incoming data rows. In this case, it is the fallback model if there is no model file specified in the incoming row.
-
m_updateIncrementalModel
protected boolean m_updateIncrementalModel
whether to update the model (if incremental)
-
Method Detail
-
setModel
public void setModel(WekaScoringModel model)
Set the model for this copy of the step to use.
Parameters:
model - the model to use
-
getModel
public WekaScoringModel getModel()
Get the model that this copy of the step is using.
Returns:
the model that this copy of the step is using
-
setDefaultModel
public void setDefaultModel(WekaScoringModel model)
Set the default model for this copy of the step to use. The default model is used when model file paths are read from a field in the incoming row structure and a given row has null for the model path.
Parameters:
model - the model to use as fallback
-
getDefaultModel
public WekaScoringModel getDefaultModel()
Get the default model for this copy of the step to use. The default model is used when model file paths are read from a field in the incoming row structure and a given row has null for the model path.
Returns:
the model to use as fallback
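The fallback behaviour described for the default model can be pictured with a minimal sketch. It is not the step's actual code: the `Model` class, the `modelFor` helper, and the cache are hypothetical stand-ins, and the only behaviour taken from the description above is that a null model path in a row selects the default model.

```java
import java.util.HashMap;
import java.util.Map;

final class DefaultModelFallbackSketch {
    // Hypothetical stand-in for a loaded model; not the real WekaScoringModel.
    static final class Model {
        final String name;
        Model(String name) { this.name = name; }
    }

    // Hypothetical cache of models keyed by the path found in the row.
    private final Map<String, Model> loaded = new HashMap<>();
    private Model defaultModel;

    void setDefaultModel(Model model) { defaultModel = model; }

    Model getDefaultModel() { return defaultModel; }

    // Picks the model for one row: the default when the path field is null,
    // otherwise a (pretend-loaded) model for that path.
    Model modelFor(String modelPathFromRow) {
        if (modelPathFromRow == null) {
            return defaultModel;
        }
        return loaded.computeIfAbsent(modelPathFromRow, Model::new);
    }

    public static void main(String[] args) {
        DefaultModelFallbackSketch data = new DefaultModelFallbackSketch();
        data.setDefaultModel(new Model("fallback"));
        System.out.println(data.modelFor(null).name);
        System.out.println(data.modelFor("a.model").name);
    }
}
```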
-
getOutputRowMeta
public org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
Get the meta data for the output format.
Returns:
a RowMetaInterface value
-
setOutputRowMeta
public void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
Set the meta data for the output format.
Parameters:
rmi - a RowMetaInterface value
-
mapIncomingRowMetaData
public void mapIncomingRowMetaData(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta, boolean updateIncrementalModel, org.pentaho.di.core.logging.LogChannelInterface log)
Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format. The mapping is an array of indices: the element at index 0 is the index of the Kettle field that corresponds to the first attribute in the Instances structure, the element at index 1 is the index of the Kettle field that corresponds to the second attribute, and so on. This method is declared void, so the mapping is not returned to the caller; the static findMappings method returns the array.
Parameters:
header - the Instances header
inputRowMeta - the meta data for the incoming rows
updateIncrementalModel - true if the model is incremental and should be updated on the incoming instances
log - the log to use
-
modelFileExists
public static boolean modelFileExists(String modelFile, org.pentaho.di.core.variables.VariableSpace space) throws Exception
Throws:
Exception
-
loadSerializedModel
public static WekaScoringModel loadSerializedModel(String modelFile, org.pentaho.di.core.logging.LogChannelInterface log, org.pentaho.di.core.variables.VariableSpace space) throws Exception
Loads a serialized model. Models can be binary serialized Java objects, objects deep-serialized to XML, or PMML.
Parameters:
modelFile - the model file to load (passed as a String)
Returns:
the model
Throws:
Exception - if there is a problem loading the model.
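For the "binary serialized Java objects" case, the underlying mechanism is standard Java object serialization. The sketch below shows a round trip through `ObjectOutputStream` and `ObjectInputStream` using an in-memory buffer. `ToyModel`, `save`, and `load` are hypothetical names for illustration only; how `loadSerializedModel` itself reads the file, and how it handles XML or PMML, is not shown on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class BinaryModelRoundTripSketch {
    // Hypothetical serializable "model"; a real Weka classifier would take its place.
    static final class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double threshold;
        ToyModel(double threshold) { this.threshold = threshold; }
    }

    // Serializes an object to bytes, as a binary model file would store it.
    static byte[] save(Serializable model) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        }
        return bytes.toByteArray();
    }

    // Deserializes the object again; the caller casts to the expected type.
    static Object load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = save(new ToyModel(0.5));
        ToyModel restored = (ToyModel) load(stored);
        System.out.println(restored.threshold);
    }
}
```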
-
saveSerializedModel
public static void saveSerializedModel(WekaScoringModel wsm, File saveTo) throws Exception
Throws:
Exception
-
findMappings
public static int[] findMappings(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta)
Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format. Returns an array of indices, where the element at index 0 of the array is the index of the Kettle field that corresponds to the first attribute in the Instances structure, the element at index 1 is the index of the Kettle field that corresponds to the second attribute, and so on.
Parameters:
header - the Instances header
inputRowMeta - the meta data for the incoming rows
Returns:
the mapping as an array of integer indices
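The shape of the returned array is easier to see with a small, self-contained sketch. It uses plain arrays instead of `weka.core.Instances` and `RowMetaInterface`, and it makes several assumptions that this page does not confirm: that attributes are matched to fields by name, that a numeric/non-numeric flag is enough to model a type check, and that the sentinel values of `NO_MATCH` and `TYPE_MISMATCH` are -1 and -2 (the real values are listed under Constant Field Values).

```java
import java.util.Arrays;

final class FindMappingsSketch {
    // Hypothetical sentinel values; the real constants are documented elsewhere.
    static final int NO_MATCH = -1;
    static final int TYPE_MISMATCH = -2;

    // mapping[i] is the index of the incoming field that corresponds to
    // model attribute i, or a sentinel if no usable field was found.
    static int[] findMappings(String[] attrNames, boolean[] attrNumeric,
                              String[] fieldNames, boolean[] fieldNumeric) {
        int[] mapping = new int[attrNames.length];
        for (int i = 0; i < attrNames.length; i++) {
            // Assumption: attributes and fields are matched by exact name.
            int fieldIndex = Arrays.asList(fieldNames).indexOf(attrNames[i]);
            if (fieldIndex < 0) {
                mapping[i] = NO_MATCH;
            } else if (attrNumeric[i] != fieldNumeric[fieldIndex]) {
                mapping[i] = TYPE_MISMATCH;
            } else {
                mapping[i] = fieldIndex;
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        String[] attrs = {"age", "income", "region", "zip"};
        boolean[] attrNumeric = {true, true, false, true};
        String[] fields = {"id", "income", "age", "region"};
        boolean[] fieldNumeric = {true, true, true, true};
        System.out.println(Arrays.toString(
            findMappings(attrs, attrNumeric, fields, fieldNumeric)));
    }
}
```

In this toy input, "age" and "income" resolve to field indices, "region" is present but has the wrong kind of type, and "zip" is absent from the incoming row.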
-
generatePredictions
public Object[][] generatePredictions(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, List<Object[]> inputRows, WekaScoringMeta meta) throws Exception
Generates a batch of predictions (more specifically, an array of output rows containing all input Kettle fields plus new fields that hold the prediction(s)) for each incoming Kettle row given a Weka model.
Parameters:
inputMeta - the meta data for the incoming rows
outputMeta - the meta data for the output rows
inputRows - the values of the incoming rows
meta - meta data for this step
Returns:
an array of Kettle rows, each containing all incoming fields along with new ones that hold the prediction(s)
Throws:
Exception - if an error occurs
-
generatePrediction
public Object[] generatePrediction(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, Object[] inputRow, WekaScoringMeta meta) throws Exception
Generates a prediction (more specifically, an output row containing all input Kettle fields plus new fields that hold the prediction(s)) for an incoming Kettle row given a Weka model.
Parameters:
inputMeta - the meta data for the incoming rows
outputMeta - the meta data for the output rows
inputRow - the values of the incoming row
meta - meta data for this step
Returns:
a Kettle row containing all incoming fields along with new ones that hold the prediction(s)
Throws:
Exception - if an error occurs
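The output row layout described above (all input fields followed by the new prediction fields) can be sketched with plain arrays. `appendPredictions` is a hypothetical helper, not part of this class; how the real method computes the prediction values and sizes the row from `outputMeta` is not shown here.

```java
import java.util.Arrays;

final class AppendPredictionSketch {
    // Copies the input row and places the prediction values after the
    // last input field, mirroring the documented output row layout.
    static Object[] appendPredictions(Object[] inputRow, Object... predictions) {
        Object[] outputRow = Arrays.copyOf(inputRow, inputRow.length + predictions.length);
        System.arraycopy(predictions, 0, outputRow, inputRow.length, predictions.length);
        return outputRow;
    }

    public static void main(String[] args) {
        Object[] inputRow = {42L, "north"};
        // Pretend the model produced a class label and its probability.
        Object[] outputRow = appendPredictions(inputRow, "yes", 0.9);
        System.out.println(Arrays.toString(outputRow));
    }
}
```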
-