Class WekaScoring

  • All Implemented Interfaces:
    org.pentaho.di.core.ExtensionDataInterface, org.pentaho.di.core.logging.HasLogChannelInterface, org.pentaho.di.core.logging.LoggingObjectInterface, org.pentaho.di.core.logging.LoggingObjectLifecycleInterface, org.pentaho.di.core.variables.VariableSpace, org.pentaho.di.trans.step.StepInterface

    public class WekaScoring
    extends org.pentaho.di.trans.step.BaseStep
    implements org.pentaho.di.trans.step.StepInterface
    Applies a pre-built Weka model (classifier or clusterer) to incoming rows and appends its predictions. A prediction can be a label (classification or clustering), a number (regression), or a probability distribution over classes or clusters.
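    To illustrate the three output forms, the sketch below takes a probability distribution, such as the one Weka's distributionForInstance returns, and appends either the most probable label or one probability per class to a Kettle-style Object[] row. The class and method names are hypothetical and the distribution is hard-coded. A regression model would instead append its single numeric prediction directly.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not the WekaScoring source): turns a class/cluster
 * probability distribution into either a predicted label or one probability
 * per class, appended to a copy of a Kettle-style Object[] row.
 */
final class PredictionSketch {

    /** Index of the largest probability; ties resolve to the first maximum. */
    static int argMax(double[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Appends the most probable label to a copy of the row. */
    static Object[] appendLabel(Object[] row, double[] dist, List<String> labels) {
        Object[] out = Arrays.copyOf(row, row.length + 1);
        out[row.length] = labels.get(argMax(dist));
        return out;
    }

    /** Appends one probability per class or cluster to a copy of the row. */
    static Object[] appendDistribution(Object[] row, double[] dist) {
        Object[] out = Arrays.copyOf(row, row.length + dist.length);
        for (int i = 0; i < dist.length; i++) {
            out[row.length + i] = dist[i]; // autoboxed to Double
        }
        return out;
    }

    public static void main(String[] args) {
        Object[] row = {5.1, 3.5, 1.4, 0.2};
        double[] dist = {0.7, 0.2, 0.1}; // stand-in for a model's output
        List<String> labels = List.of("setosa", "versicolor", "virginica");
        System.out.println(Arrays.toString(appendLabel(row, dist, labels)));
        System.out.println(Arrays.toString(appendDistribution(row, dist)));
    }
}
```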

    The attributes the Weka model was built from are automatically mapped to incoming Kettle fields by name and type. Any attribute that cannot be mapped, either because its type does not match or because no incoming field has its name, receives a missing value when incoming Kettle rows are converted to Weka's Instance format. Similarly, any string field value that was not seen while training the Weka model is converted to a missing value.
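    The mapping rule above can be sketched as follows. This is a hypothetical, dependency-free illustration rather than the plugin's code: Attr, Field, and Kind are invented stand-ins for Weka attribute and Kettle field metadata, and missing values are encoded as Double.NaN, which is how Weka represents them internally.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of name-and-type mapping from model attributes to
 * incoming fields, plus row conversion that yields missing values (NaN)
 * for unmapped attributes and for nominal values unseen in training.
 */
final class MappingSketch {

    enum Kind { NUMERIC, NOMINAL }

    /** A model attribute: name, kind, and (for nominal) the training values. */
    record Attr(String name, Kind kind, List<String> values) {}

    /** An incoming field: name and kind. */
    record Field(String name, Kind kind) {}

    /** For each attribute, the index of the matching field, or -1 if unmapped. */
    static int[] map(List<Attr> attrs, List<Field> fields) {
        Map<String, Integer> byName = new HashMap<>();
        for (int i = 0; i < fields.size(); i++) {
            byName.put(fields.get(i).name(), i);
        }
        int[] mapping = new int[attrs.size()];
        for (int a = 0; a < attrs.size(); a++) {
            Integer idx = byName.get(attrs.get(a).name());
            // Unmapped if the name is absent or the kinds disagree.
            boolean ok = idx != null && fields.get(idx).kind() == attrs.get(a).kind();
            mapping[a] = ok ? idx : -1;
        }
        return mapping;
    }

    /** Converts a row to Weka-style doubles; NaN marks a missing value. */
    static double[] toValues(Object[] row, List<Attr> attrs, int[] mapping) {
        double[] vals = new double[attrs.size()];
        for (int a = 0; a < attrs.size(); a++) {
            Object v = mapping[a] < 0 ? null : row[mapping[a]];
            if (v == null) {
                vals[a] = Double.NaN;
            } else if (attrs.get(a).kind() == Kind.NUMERIC) {
                vals[a] = ((Number) v).doubleValue();
            } else {
                int label = attrs.get(a).values().indexOf(v.toString());
                // A string value never seen during training becomes missing.
                vals[a] = label < 0 ? Double.NaN : label;
            }
        }
        return vals;
    }

    public static void main(String[] args) {
        List<Attr> attrs = List.of(
            new Attr("petallength", Kind.NUMERIC, List.of()),
            new Attr("colour", Kind.NOMINAL, List.of("red", "blue")),
            new Attr("weight", Kind.NUMERIC, List.of()));
        List<Field> fields = List.of(
            new Field("colour", Kind.NOMINAL),
            new Field("petallength", Kind.NUMERIC));
        int[] mapping = map(attrs, fields); // weight is absent -> -1
        System.out.println(Arrays.toString(mapping));
        Object[] row = {"green", 4.2};      // "green" unseen in training
        System.out.println(Arrays.toString(toValues(row, attrs, mapping)));
    }
}
```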

    Author:
    Mark Hall (mhall{[at]}pentaho{[dot]}org)
    • Field Summary

      • Fields inherited from class org.pentaho.di.trans.step.BaseStep

        deadLockCounter, extensionDataMap, first, linesInput, linesOutput, linesRead, linesRejected, linesSkipped, linesUpdated, linesWritten, log, metaStore, repository, rowListeners, safeStopped, terminator, terminator_rows, variables
    • Constructor Summary

      Constructors 
      Constructor Description
      WekaScoring(org.pentaho.di.trans.step.StepMeta stepMeta, org.pentaho.di.trans.step.StepDataInterface stepDataInterface, int copyNr, org.pentaho.di.trans.TransMeta transMeta, org.pentaho.di.trans.Trans trans)
      Creates a new WekaScoring instance.
    • Method Summary

      Modifier and Type Method Description
      boolean init(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi)
      Initialize the step.
      protected void outputBatchRows()  
      boolean processRow(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi)
      Process an incoming row of data.
      • Methods inherited from class org.pentaho.di.trans.step.BaseStep

        addResultFile, addRowListener, addRowSetToInputRowSets, addRowSetToOutputRowSets, addStepListener, batchComplete, buildLog, canProcessOneRow, checkFeedback, cleanup, clearInputRowSets, clearOutputRowSets, closeQuietly, copyVariablesFrom, decrementLinesRead, decrementLinesWritten, dispatch, dispose, environmentSubstitute, environmentSubstitute, environmentSubstitute, fieldSubstitute, findInputRowSet, findInputRowSet, findOutputRowSet, findOutputRowSet, getBooleanValueOfVariable, getClusterSize, getContainerObjectId, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getDispatcher, getErrorRowMeta, getErrors, getExtensionDataMap, getFilename, getFirstInputRowSet, getInputRowMeta, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesSkipped, getLinesUpdated, getLinesWritten, getLogChannel, getLogChannelId, getLogFields, getLogLevel, getMetaStore, getNextClassNr, getObjectCopy, getObjectId, getObjectName, getObjectRevision, getObjectType, getOutputRowSets, getParent, getParentVariableSpace, getPartitionID, getPartitionTargets, getPreviewRowMeta, getProcessed, getRegistrationDate, getRemoteInputSteps, getRemoteOutputSteps, getRepartitioning, getRepository, getRepositoryDirectory, getResultFiles, getRow, getRowFrom, getRowHandler, getRowListeners, getRuntime, getServerSockets, getSlaveNr, getSocketRepository, getStatus, getStatusDescription, getStepDataInterface, getStepID, getStepListeners, getStepMeta, getStepMetaInterface, getStepname, getTrans, getTransMeta, getTypeId, getUniqueStepCountAcrossSlaves, getUniqueStepNrAcrossSlaves, getVariable, getVariable, handleGetRowFrom, handlePutRowTo, identifyErrorOutput, incrementLinesInput, incrementLinesOutput, incrementLinesRead, incrementLinesRejected, incrementLinesSkipped, incrementLinesUpdated, incrementLinesWritten, initBeforeStart, initializeVariablesFrom, injectVariables, isBasic, isDebug, isDetailed, isDistributed, isForcingSeparateLogging, isGatheringMetrics, isInitialising, isMapping, isPartitioned, isPaused, isRowLevel, isRunning, isSafeStopped, isStopped, isUsingThreadPriorityManagment, listVariables, logBasic, logBasic, logDebug, logDebug, logDetailed, logDetailed, logError, logError, logError, logMinimal, logMinimal, logRowlevel, logRowlevel, logSummary, markStart, markStop, openRemoteInputStepSocketsOnce, openRemoteOutputStepSocketsOnce, outputIsDone, pauseRunning, putError, putRow, putRowTo, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, safeModeChecking, safeModeChecking, setCarteObjectId, setCopy, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setDistributed, setErrorRowMeta, setErrors, setForcingSeparateLogging, setGatheringMetrics, setInputRowMeta, setInputRowSets, setInternalVariables, setLinesInput, setLinesOutput, setLinesRead, setLinesRejected, setLinesSkipped, setLinesUpdated, setLinesWritten, setLogLevel, setMetaStore, setOutputDone, setOutputRowSets, setParentVariableSpace, setPartitioned, setPartitionID, setPartitionTargets, setPaused, setPaused, setPreviewRowMeta, setRepartitioning, setRepository, setRowHandler, setRunning, setSafeStopped, setServerSockets, setSocketRepository, setStepDataInterface, setStepListeners, setStepMeta, setStepMetaInterface, setStepname, setStopped, setTransMeta, setUsingThreadPriorityManagment, setVariable, shareVariablesWith, stopAll, stopRunning, stopRunning, swapFirstInputRowSetIfExists, toString, verifyInputDeadLock, waitUntilTransformationIsStarted
      • Methods inherited from interface org.pentaho.di.core.logging.LoggingObjectLifecycleInterface

        callAfterLog, callBeforeLog
      • Methods inherited from interface org.pentaho.di.trans.step.StepInterface

        addRowListener, addRowSetToInputRowSets, addRowSetToOutputRowSets, addStepListener, batchComplete, canProcessOneRow, cleanup, dispose, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getErrors, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesUpdated, getLinesWritten, getLogChannel, getMetaStore, getOutputRowSets, getPartitionID, getProcessed, getRepository, getResultFiles, getRow, getRowListeners, getRuntime, getStatus, getStepID, getStepMeta, getStepname, getTrans, identifyErrorOutput, initBeforeStart, isMapping, isPartitioned, isPaused, isRunning, isSafeStopped, isStopped, isUsingThreadPriorityManagment, markStart, markStop, pauseRunning, putRow, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setErrors, setLinesRejected, setMetaStore, setOutputDone, setPartitioned, setPartitionID, setRepartitioning, setRepository, setRunning, setSafeStopped, setStopped, setUsingThreadPriorityManagment, stopAll, stopRunning, subStatuses
      • Methods inherited from interface org.pentaho.di.core.variables.VariableSpace

        copyVariablesFrom, environmentSubstitute, environmentSubstitute, environmentSubstitute, fieldSubstitute, getBooleanValueOfVariable, getParentVariableSpace, getVariable, getVariable, initializeVariablesFrom, injectVariables, listVariables, setParentVariableSpace, setVariable, shareVariablesWith
    • Constructor Detail

      • WekaScoring

        public WekaScoring(org.pentaho.di.trans.step.StepMeta stepMeta,
                           org.pentaho.di.trans.step.StepDataInterface stepDataInterface,
                           int copyNr,
                           org.pentaho.di.trans.TransMeta transMeta,
                           org.pentaho.di.trans.Trans trans)
        Creates a new WekaScoring instance.
        Parameters:
        stepMeta - holds the step's meta data
        stepDataInterface - holds the step's temporary data
        copyNr - the copy number of this step (a step can run as several parallel copies)
        transMeta - meta data for the transformation
        trans - the transformation (Trans) that this step runs in
    • Method Detail

      • processRow

        public boolean processRow(org.pentaho.di.trans.step.StepMetaInterface smi,
                                  org.pentaho.di.trans.step.StepDataInterface sdi)
                           throws org.pentaho.di.core.exception.KettleException
        Process an incoming row of data.
        Specified by:
        processRow in interface org.pentaho.di.trans.step.StepInterface
        Overrides:
        processRow in class org.pentaho.di.trans.step.BaseStep
        Parameters:
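        smi - the step's meta data (a StepMetaInterface)
        sdi - the step's temporary data (a StepDataInterface)
        Returns:
        true if processRow should be called again; false when no more rows can be processed or an error occurred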
        Throws:
        org.pentaho.di.core.exception.KettleException - if an error occurs
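        The following simulation illustrates the processRow contract: the engine calls the method repeatedly, it returns true while rows keep arriving, and it returns false once input is exhausted and output is marked done. The input queue, output list, and scoring function are hypothetical stand-ins for getRow(), putRow(), and the Weka model. Buffering rows and flushing them through a method named after outputBatchRows() is an assumption about that method's role, since this page does not describe it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical simulation of the processRow() contract (not the BaseStep
 * API). Rows are buffered and scored in batches, which is an assumption
 * about what outputBatchRows() is for.
 */
final class BatchingStepSketch {
    private final Deque<Object[]> input;                      // stands in for getRow()
    private final List<Object[]> output = new ArrayList<>();  // stands in for putRow()
    private final List<Object[]> batch = new ArrayList<>();
    private final int batchSize;
    private final Function<Object[], Object> model;           // hypothetical scorer
    private boolean outputDone;

    BatchingStepSketch(Deque<Object[]> input, int batchSize, Function<Object[], Object> model) {
        this.input = input;
        this.batchSize = batchSize;
        this.model = model;
    }

    boolean processRow() {
        Object[] row = input.poll();   // null plays the role of "no more rows"
        if (row == null) {
            outputBatchRows();         // flush any partial batch
            outputDone = true;         // analogous to setOutputDone()
            return false;
        }
        batch.add(row);
        if (batch.size() >= batchSize) {
            outputBatchRows();
        }
        return true;
    }

    /** Scores the buffered rows and emits each with its prediction appended. */
    void outputBatchRows() {
        for (Object[] r : batch) {
            Object[] out = Arrays.copyOf(r, r.length + 1);
            out[r.length] = model.apply(r);
            output.add(out);
        }
        batch.clear();
    }

    List<Object[]> output() { return output; }

    boolean isOutputDone() { return outputDone; }

    public static void main(String[] args) {
        Deque<Object[]> in = new ArrayDeque<>();
        for (int i = 1; i <= 5; i++) {
            in.add(new Object[] {(double) i});
        }
        BatchingStepSketch step = new BatchingStepSketch(in, 2,
            r -> ((Double) r[0]) > 2.5 ? "high" : "low");
        while (step.processRow()) {
            // the transformation engine would keep calling processRow here
        }
        step.output().forEach(r -> System.out.println(Arrays.toString(r)));
    }
}
```

        The flush on the false path keeps a final partial batch from being dropped when input ends.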
      • init

        public boolean init(org.pentaho.di.trans.step.StepMetaInterface smi,
                            org.pentaho.di.trans.step.StepDataInterface sdi)
        Initialize the step.
        Specified by:
        init in interface org.pentaho.di.trans.step.StepInterface
        Overrides:
        init in class org.pentaho.di.trans.step.BaseStep
        Parameters:
        smi - the step's meta data (a StepMetaInterface)
        sdi - the step's temporary data (a StepDataInterface)
        Returns:
        true if initialization succeeded; false otherwise
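        A minimal sketch of the init contract, assuming that one-time setup consists of deserialising a model file: failures are reported through the boolean return value rather than thrown, and Kettle treats false as an initialization failure. InitSketch and its Path argument are hypothetical; the real step receives its settings through smi and sdi.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch of the init() contract: do one-time setup (here,
 * deserialising a model file) and report success as a boolean.
 */
final class InitSketch {
    private Object model; // Weka models are Serializable Java objects

    boolean init(Path modelFile) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(modelFile.toFile()))) {
            model = in.readObject();
            return true;  // setup succeeded; rows may now be processed
        } catch (IOException | ClassNotFoundException e) {
            System.err.println("Could not load model: " + e.getMessage());
            return false; // signals an initialization failure to the caller
        }
    }

    Object model() { return model; }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("model", ".ser");
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file.toFile()))) {
            out.writeObject("placeholder model"); // stands in for a trained model
        }
        System.out.println(new InitSketch().init(file));
        System.out.println(new InitSketch().init(file.resolveSibling("missing-model.ser")));
        Files.delete(file);
    }
}
```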