Package org.pentaho.di.scoring
Class WekaScoringData
java.lang.Object
org.pentaho.di.trans.step.BaseStepData
org.pentaho.di.scoring.WekaScoringData
- All Implemented Interfaces:
org.pentaho.di.trans.step.StepDataInterface
public class WekaScoringData
extends org.pentaho.di.trans.step.BaseStepData
implements org.pentaho.di.trans.step.StepDataInterface
Holds temporary data and has routines for loading serialized models.
- Author:
- Mark Hall (mhall{[at]}pentaho{[dot]}org)
-
Nested Class Summary
Nested classes/interfaces inherited from class org.pentaho.di.trans.step.BaseStepData
org.pentaho.di.trans.step.BaseStepData.StepExecutionStatus -
Field Summary
Fields (modifier and type, field, description):
protected WekaScoringModel m_defaultModel - Holds a default model; only used when model files are sourced from a field in the incoming data rows.
protected WekaScoringModel m_model - Holds the actual Weka model (classifier, clusterer or PMML) used by this copy of the step.
protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta - The output data format.
protected boolean m_updateIncrementalModel - Whether to update the model (if incremental).
static final int NO_MATCH - Constants for various input field/attribute match and type problems.
static final int TYPE_MISMATCH
-
Constructor Summary
Constructors:
WekaScoringData()
-
Method Summary
Methods (modifier and type, method, description):
static int[] findMappings(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta) - Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format.
Object[] generatePrediction(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, Object[] inputRow, WekaScoringMeta meta) - Generates a prediction (more specifically, an output row containing all input Kettle fields plus new fields that hold the prediction(s)) for an incoming Kettle row given a Weka model.
Object[][] generatePredictions(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, List<Object[]> inputRows, WekaScoringMeta meta) - Generates a batch of predictions (more specifically, an array of output rows containing all input Kettle fields plus new fields that hold the prediction(s)) for each incoming Kettle row given a Weka model.
WekaScoringModel getDefaultModel() - Get the default model for this copy of the step to use.
WekaScoringModel getModel() - Get the model that this copy of the step is using.
org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta() - Get the meta data for the output format.
static WekaScoringModel loadSerializedModel(String modelFile, org.pentaho.di.core.logging.LogChannelInterface log, org.pentaho.di.core.variables.VariableSpace space) - Loads a serialized model.
void mapIncomingRowMetaData(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta, boolean updateIncrementalModel, org.pentaho.di.core.logging.LogChannelInterface log) - Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format.
static boolean modelFileExists(String modelFile, org.pentaho.di.core.variables.VariableSpace space)
static void saveSerializedModel(WekaScoringModel wsm, File saveTo)
void setDefaultModel(WekaScoringModel model) - Set the default model for this copy of the step to use.
void setModel(WekaScoringModel model) - Set the model for this copy of the step to use.
void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi) - Set the meta data for the output format.
Methods inherited from class org.pentaho.di.trans.step.BaseStepData
getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, isStopped, setStatus
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.pentaho.di.trans.step.StepDataInterface
getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, setStatus
-
Field Details
-
NO_MATCH
public static final int NO_MATCH
Constants for various input field/attribute match and type problems.
- See Also: Constant Field Values
-
TYPE_MISMATCH
public static final int TYPE_MISMATCH
- See Also: Constant Field Values
-
m_outputRowMeta
protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
The output data format.
-
m_model
protected WekaScoringModel m_model
Holds the actual Weka model (classifier, clusterer or PMML) used by this copy of the step.
-
m_defaultModel
protected WekaScoringModel m_defaultModel
Holds a default model. It is only used when model files are sourced from a field in the incoming data rows; in that case it is the fallback model if no model file is specified in the incoming row.
-
m_updateIncrementalModel
protected boolean m_updateIncrementalModel
Whether to update the model (if incremental).
-
-
Constructor Details
-
WekaScoringData
public WekaScoringData()
-
-
Method Details
-
setModel
public void setModel(WekaScoringModel model)
Set the model for this copy of the step to use.
- Parameters:
model - the model to use
-
getModel
public WekaScoringModel getModel()
Get the model that this copy of the step is using.
- Returns:
- the model that this copy of the step is using
-
setDefaultModel
public void setDefaultModel(WekaScoringModel model)
Set the default model for this copy of the step to use. The default model is used when model file paths come from a field in the incoming row structure and a given row has null for the model path.
- Parameters:
model - the model to use as fallback
-
getDefaultModel
public WekaScoringModel getDefaultModel()
Get the default model for this copy of the step to use. The default model is used when model file paths come from a field in the incoming row structure and a given row has null for the model path.
- Returns:
- the model to use as fallback
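
The fallback rule described for the default model can be illustrated with a small, self-contained sketch. It does not use the Pentaho or Weka classes: `ModelFallbackSketch`, its nested `Model` type, `register` and `modelForRow` are hypothetical names invented for this illustration, and the real step may resolve and cache models differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the "default model as fallback" rule; not the real step code. */
final class ModelFallbackSketch {

    /** Stand-in for a loaded model; NOT the real WekaScoringModel. */
    static final class Model {
        final String name;

        Model(String name) {
            this.name = name;
        }
    }

    // Models keyed by the path found in the incoming row (assumed lookup scheme).
    private final Map<String, Model> modelsByPath = new HashMap<>();
    private Model defaultModel;

    void setDefaultModel(Model model) {
        this.defaultModel = model;
    }

    Model getDefaultModel() {
        return defaultModel;
    }

    void register(String path, Model model) {
        modelsByPath.put(path, model);
    }

    /** Returns the model for the row's path, or the default model when the path is null. */
    Model modelForRow(String modelPathFromRow) {
        if (modelPathFromRow == null) {
            return defaultModel;
        }
        return modelsByPath.get(modelPathFromRow);
    }
}
```

A row whose model-path field is null is scored with whatever was passed to `setDefaultModel`; any other row uses the model registered for its path.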
-
getOutputRowMeta
public org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
Get the meta data for the output format.
- Returns:
- a RowMetaInterface value
-
setOutputRowMeta
public void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
Set the meta data for the output format.
- Parameters:
rmi - a RowMetaInterface value
-
mapIncomingRowMetaData
public void mapIncomingRowMetaData(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta, boolean updateIncrementalModel, org.pentaho.di.core.logging.LogChannelInterface log)
Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format. The mapping is an array of indices, where the element at index 0 is the index of the Kettle field that corresponds to the first attribute in the Instances structure, the element at index 1 is the index of the Kettle field that corresponds to the second attribute, and so on. This method is declared void, so the mapping is not returned to the caller; findMappings returns it directly.
- Parameters:
header - the Instances header
inputRowMeta - the meta data for the incoming rows
updateIncrementalModel - true if the model is incremental and should be updated on the incoming instances
log - the log to use
-
modelFileExists
public static boolean modelFileExists(String modelFile, org.pentaho.di.core.variables.VariableSpace space) throws Exception
- Throws:
Exception
-
loadSerializedModel
public static WekaScoringModel loadSerializedModel(String modelFile, org.pentaho.di.core.logging.LogChannelInterface log, org.pentaho.di.core.variables.VariableSpace space) throws Exception
Loads a serialized model. Models can be binary serialized Java objects, objects deep-serialized to XML, or PMML.
- Parameters:
modelFile - the model file to load, given as a String
log - the log to use
space - the variable space to use
- Returns:
- the model
- Throws:
Exception - if there is a problem loading the model.
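
The three model formats named above can be told apart in several ways. The sketch below shows one of them, content sniffing on the first bytes of a file. It is an illustration only, not how loadSerializedModel decides: the class `ModelFormatSniffer`, its `Format` enum and the `sniff` method are hypothetical. The one fixed fact it relies on is that Java object serialization streams begin with the magic bytes 0xAC 0xED.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical content-sniffing sketch; the real loader may use a different rule. */
final class ModelFormatSniffer {

    enum Format { BINARY_SERIALIZED, PMML, XML, UNKNOWN }

    /** Classifies a model file from its leading bytes. */
    static Format sniff(byte[] head) {
        // Java object serialization streams start with the magic bytes 0xAC 0xED.
        if (head.length >= 2 && (head[0] & 0xFF) == 0xAC && (head[1] & 0xFF) == 0xED) {
            return Format.BINARY_SERIALIZED;
        }
        String text = new String(head, StandardCharsets.UTF_8).trim();
        // PMML documents have a <PMML ...> root element.
        if (text.contains("<PMML")) {
            return Format.PMML;
        }
        // Any other markup is treated as XML-serialized objects (an assumption).
        if (text.startsWith("<")) {
            return Format.XML;
        }
        return Format.UNKNOWN;
    }
}
```

In practice `head` would hold the first few hundred bytes of the file, enough to get past an XML declaration and reach the root element.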
-
saveSerializedModel
public static void saveSerializedModel(WekaScoringModel wsm, File saveTo) throws Exception
- Throws:
Exception
-
findMappings
public static int[] findMappings(weka.core.Instances header, org.pentaho.di.core.row.RowMetaInterface inputRowMeta)
Finds a mapping between the attributes that a Weka model has been trained with and the incoming Kettle row format. Returns an array of indices, where the element at index 0 is the index of the Kettle field that corresponds to the first attribute in the Instances structure, the element at index 1 is the index of the Kettle field that corresponds to the second attribute, and so on.
- Parameters:
header - the Instances header
inputRowMeta - the meta data for the incoming rows
- Returns:
- the mapping as an array of integer indices
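
The shape of the returned array is easiest to see on a small example. The sketch below is self-contained and uses plain lists of names instead of weka.core.Instances and RowMetaInterface. Matching by exact name and the `NOT_FOUND` sentinel of -1 are assumptions made for the illustration; this page does not state how attributes are matched or what values NO_MATCH and TYPE_MISMATCH take.

```java
import java.util.List;

/** Hypothetical sketch of an attribute-to-field index mapping; not the real findMappings. */
final class FieldMappingSketch {

    /** Assumed sentinel for "no field matches this attribute". */
    static final int NOT_FOUND = -1;

    /**
     * Element i of the result is the index, within fieldNames, of the field that
     * corresponds to attribute i, or NOT_FOUND when no field has that name.
     */
    static int[] findMappings(List<String> attributeNames, List<String> fieldNames) {
        int[] mapping = new int[attributeNames.size()];
        for (int i = 0; i < mapping.length; i++) {
            int fieldIndex = fieldNames.indexOf(attributeNames.get(i));
            mapping[i] = fieldIndex >= 0 ? fieldIndex : NOT_FOUND;
        }
        return mapping;
    }
}
```

With attributes `age, income, class` and incoming fields `id, income, age`, the first attribute maps to field 2, the second to field 1, and the third has no match.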
-
generatePredictions
public Object[][] generatePredictions(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, List<Object[]> inputRows, WekaScoringMeta meta) throws Exception
Generates a batch of predictions (more specifically, an array of output rows containing all input Kettle fields plus new fields that hold the prediction(s)) for each incoming Kettle row given a Weka model.
- Parameters:
inputMeta - the meta data for the incoming rows
outputMeta - the meta data for the output rows
inputRows - the values of the incoming rows
meta - meta data for this step
- Returns:
- an array of Kettle rows, each containing all incoming fields along with new ones that hold the prediction(s)
- Throws:
Exception- if an error occurs
-
generatePrediction
public Object[] generatePrediction(org.pentaho.di.core.row.RowMetaInterface inputMeta, org.pentaho.di.core.row.RowMetaInterface outputMeta, Object[] inputRow, WekaScoringMeta meta) throws Exception
Generates a prediction (more specifically, an output row containing all input Kettle fields plus new fields that hold the prediction(s)) for an incoming Kettle row given a Weka model.
- Parameters:
inputMeta - the meta data for the incoming rows
outputMeta - the meta data for the output rows
inputRow - the values of the incoming row
meta - meta data for this step
- Returns:
- a Kettle row containing all incoming fields along with new ones that hold the prediction(s)
- Throws:
Exception- if an error occurs
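
The output-row layout described for generatePrediction and generatePredictions (all input fields first, prediction fields appended) can be sketched without any Kettle or Weka types. `PredictionRowSketch` and its methods are hypothetical names, and the prediction values are passed in rather than computed by a model, so this shows only how the rows are assembled.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the output-row layout; no model is involved here. */
final class PredictionRowSketch {

    /** Returns a new row holding every input value followed by the prediction values. */
    static Object[] appendPrediction(Object[] inputRow, Object[] predictions) {
        Object[] outputRow = Arrays.copyOf(inputRow, inputRow.length + predictions.length);
        System.arraycopy(predictions, 0, outputRow, inputRow.length, predictions.length);
        return outputRow;
    }

    /** Batch form: one output row per input row, in the same order. */
    static Object[][] appendPredictions(List<Object[]> inputRows, List<Object[]> predictionsPerRow) {
        if (inputRows.size() != predictionsPerRow.size()) {
            throw new IllegalArgumentException("one prediction array is required per input row");
        }
        Object[][] outputRows = new Object[inputRows.size()][];
        for (int i = 0; i < outputRows.length; i++) {
            outputRows[i] = appendPrediction(inputRows.get(i), predictionsPerRow.get(i));
        }
        return outputRows;
    }
}
```

The input row is copied rather than modified in place, so the caller's array is left untouched.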
-