Package org.pentaho.di.scoring
Class WekaScoring
java.lang.Object
org.pentaho.di.trans.step.BaseStep
org.pentaho.di.scoring.WekaScoring
- All Implemented Interfaces:
org.pentaho.di.core.ExtensionDataInterface
,org.pentaho.di.core.logging.HasLogChannelInterface
,org.pentaho.di.core.logging.LoggingObjectInterface
,org.pentaho.di.core.logging.LoggingObjectLifecycleInterface
,org.pentaho.di.core.variables.VariableSpace
,org.pentaho.di.trans.step.StepInterface
public class WekaScoring
extends org.pentaho.di.trans.step.BaseStep
implements org.pentaho.di.trans.step.StepInterface
Applies a pre-built weka model (classifier or clusterer) to incoming rows and
appends predictions. Predictions can be a label (classification/clustering),
a number (regression), or a probability distribution over classes/clusters.
Attributes that the Weka model was constructed from are automatically mapped to incoming Kettle fields on the basis of name and type. Any attributes that cannot be mapped due to type mismatch or not being present in the incoming fields receive missing values when incoming Kettle rows are converted to Weka's Instance format. Similarly, any values for string fields that have not been seen during the training of the Weka model are converted to missing values.
- Author:
- Mark Hall (mhall{[at]}pentaho{[dot]}org)
-
Field Summary
Fields inherited from class org.pentaho.di.trans.step.BaseStep
deadLockCounter, extensionDataMap, first, linesInput, linesOutput, linesRead, linesRejected, linesSkipped, linesUpdated, linesWritten, log, metaStore, repository, rowListeners, safeStopped, terminator, terminator_rows, variables
-
Constructor Summary
ConstructorDescriptionWekaScoring
(org.pentaho.di.trans.step.StepMeta stepMeta, org.pentaho.di.trans.step.StepDataInterface stepDataInterface, int copyNr, org.pentaho.di.trans.TransMeta transMeta, org.pentaho.di.trans.Trans trans) Creates a newWekaScoring
instance. -
Method Summary
Modifier and TypeMethodDescriptionboolean
init
(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi) Initialize the step.protected void
boolean
processRow
(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi) Process an incoming row of data.Methods inherited from class org.pentaho.di.trans.step.BaseStep
addResultFile, addRowListener, addRowSetToInputRowSets, addRowSetToOutputRowSets, addStepListener, batchComplete, beforeStartProcessing, buildLog, canProcessOneRow, checkFeedback, cleanup, clearInputRowSets, clearOutputRowSets, closeQuietly, copyVariablesFrom, decrementLinesRead, decrementLinesWritten, dispatch, dispose, environmentSubstitute, environmentSubstitute, environmentSubstitute, fieldSubstitute, findInputRowSet, findInputRowSet, findOutputRowSet, findOutputRowSet, getBooleanValueOfVariable, getClusterSize, getContainerObjectId, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getDispatcher, getErrorRowMeta, getErrors, getExtensionDataMap, getFilename, getFirstInputRowSet, getInputRowMeta, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesSkipped, getLinesUpdated, getLinesWritten, getLogChannel, getLogChannelId, getLogFields, getLogLevel, getMetaStore, getNextClassNr, getObjectCopy, getObjectId, getObjectName, getObjectRevision, getObjectType, getOutputRowSets, getParent, getParentVariableSpace, getPartitionID, getPartitionTargets, getPreviewRowMeta, getProcessed, getRegistrationDate, getRemoteInputSteps, getRemoteOutputSteps, getRepartitioning, getRepository, getRepositoryDirectory, getResultFiles, getRow, getRowFrom, getRowHandler, getRowListeners, getRuntime, getServerSockets, getSlaveNr, getSocketRepository, getStatus, getStatusDescription, getStepDataInterface, getStepID, getStepListeners, getStepMeta, getStepMetaInterface, getStepname, getTrans, getTransMeta, getTypeId, getUniqueStepCountAcrossSlaves, getUniqueStepNrAcrossSlaves, getVariable, getVariable, handleGetRowFrom, handlePutRowTo, identifyErrorOutput, incrementLinesInput, incrementLinesOutput, incrementLinesRead, incrementLinesRejected, incrementLinesSkipped, incrementLinesUpdated, incrementLinesWritten, initBeforeStart, initializeVariablesFrom, injectVariables, isBasic, isDebug, isDetailed, isDistributed, isForcingSeparateLogging, isGatheringMetrics, isInitialising, isMapping, isPartitioned, isPaused, isRowLevel, isRunning, isSafeStopped, isStopped, isUsingThreadPriorityManagment, listVariables, logBasic, logBasic, logDebug, logDebug, logDetailed, logDetailed, logError, logError, logError, logMinimal, logMinimal, logRowlevel, logRowlevel, logSummary, markStart, markStop, openRemoteInputStepSocketsOnce, openRemoteOutputStepSocketsOnce, outputIsDone, pauseRunning, putError, putRow, putRowTo, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, safeModeChecking, safeModeChecking, setCarteObjectId, setCopy, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setDistributed, setErrorRowMeta, setErrors, setForcingSeparateLogging, setGatheringMetrics, setInputRowMeta, setInputRowSets, setInternalVariables, setLinesInput, setLinesOutput, setLinesRead, setLinesRejected, setLinesSkipped, setLinesUpdated, setLinesWritten, setLogLevel, setMetaStore, setOutputDone, setOutputRowSets, setParentVariableSpace, setPartitioned, setPartitionID, setPartitionTargets, setPaused, setPaused, setPreviewRowMeta, setRepartitioning, setRepository, setRowHandler, setRunning, setSafeStopped, setServerSockets, setSocketRepository, setStepDataInterface, setStepListeners, setStepMeta, setStepMetaInterface, setStepname, setStopped, setTransMeta, setUsingThreadPriorityManagment, setVariable, shareVariablesWith, stopAll, stopRunning, stopRunning, swapFirstInputRowSetIfExists, toString, verifyInputDeadLock, waitUntilTransformationIsStarted
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.pentaho.di.core.logging.LoggingObjectLifecycleInterface
callAfterLog, callBeforeLog
Methods inherited from interface org.pentaho.di.trans.step.StepInterface
addRowListener, addRowSetToInputRowSets, addRowSetToOutputRowSets, addStepListener, afterFinishProcessing, batchComplete, beforeStartProcessing, canProcessOneRow, cleanup, dispose, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getErrors, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesUpdated, getLinesWritten, getLogChannel, getMetaStore, getOutputRowSets, getPartitionID, getProcessed, getRepository, getResultFiles, getRow, getRowListeners, getRuntime, getStatus, getStepID, getStepMeta, getStepname, getTrans, identifyErrorOutput, initBeforeStart, isMapping, isPartitioned, isPaused, isRunning, isSafeStopped, isStopped, isUsingThreadPriorityManagment, markStart, markStop, pauseRunning, putRow, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setErrors, setLinesRejected, setMetaStore, setOutputDone, setPartitioned, setPartitionID, setRepartitioning, setRepository, setRunning, setSafeStopped, setStopped, setUsingThreadPriorityManagment, stopAll, stopRunning, subStatuses
Methods inherited from interface org.pentaho.di.core.variables.VariableSpace
copyVariablesFrom, environmentSubstitute, environmentSubstitute, environmentSubstitute, fieldSubstitute, getBooleanValueOfVariable, getParentVariableSpace, getVariable, getVariable, initializeVariablesFrom, injectVariables, listVariables, setParentVariableSpace, setVariable, shareVariablesWith
-
Constructor Details
-
WekaScoring
public WekaScoring(org.pentaho.di.trans.step.StepMeta stepMeta, org.pentaho.di.trans.step.StepDataInterface stepDataInterface, int copyNr, org.pentaho.di.trans.TransMeta transMeta, org.pentaho.di.trans.Trans trans) Creates a newWekaScoring
instance.- Parameters:
stepMeta
- holds the step's meta datastepDataInterface
- holds the step's temporary datacopyNr
- the number assigned to the steptransMeta
- meta data for the transformationtrans
- aTrans
value
-
-
Method Details
-
processRow
public boolean processRow(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi) throws org.pentaho.di.core.exception.KettleException Process an incoming row of data.- Specified by:
processRow
in interfaceorg.pentaho.di.trans.step.StepInterface
- Overrides:
processRow
in classorg.pentaho.di.trans.step.BaseStep
- Parameters:
smi
- aStepMetaInterface
valuesdi
- aStepDataInterface
value- Returns:
- a
boolean
value - Throws:
org.pentaho.di.core.exception.KettleException
- if an error occurs
-
outputBatchRows
- Throws:
Exception
-
init
public boolean init(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi) Initialize the step.- Specified by:
init
in interfaceorg.pentaho.di.trans.step.StepInterface
- Overrides:
init
in classorg.pentaho.di.trans.step.BaseStep
- Parameters:
smi
- aStepMetaInterface
valuesdi
- aStepDataInterface
value- Returns:
- a
boolean
value
-