Class WekaScoring

java.lang.Object
org.pentaho.di.trans.step.BaseStep
org.pentaho.di.scoring.WekaScoring
All Implemented Interfaces:
org.pentaho.di.core.ExtensionDataInterface, org.pentaho.di.core.logging.HasLogChannelInterface, org.pentaho.di.core.logging.LoggingObjectInterface, org.pentaho.di.core.logging.LoggingObjectLifecycleInterface, org.pentaho.di.core.variables.VariableSpace, org.pentaho.di.trans.step.StepInterface

public class WekaScoring extends org.pentaho.di.trans.step.BaseStep implements org.pentaho.di.trans.step.StepInterface
Applies a pre-built weka model (classifier or clusterer) to incoming rows and appends predictions. Predictions can be a label (classification/clustering), a number (regression), or a probability distribution over classes/clusters.

Attributes that the Weka model was constructed from are automatically mapped to incoming Kettle fields on the basis of name and type. Any attributes that cannot be mapped due to type mismatch or not being present in the incoming fields receive missing values when incoming Kettle rows are converted to Weka's Instance format. Similarly, any values for string fields that have not been seen during the training of the Weka model are converted to missing values.

Author:
Mark Hall (mhall{[at]}pentaho{[dot]}org)
  • Field Summary

    Fields inherited from class org.pentaho.di.trans.step.BaseStep

    deadLockCounter, extensionDataMap, first, linesInput, linesOutput, linesRead, linesRejected, linesSkipped, linesUpdated, linesWritten, log, metaStore, repository, rowListeners, safeStopped, terminator, terminator_rows, variables
  • Constructor Summary

    Constructors
    Constructor
    Description
    WekaScoring(org.pentaho.di.trans.step.StepMeta stepMeta, org.pentaho.di.trans.step.StepDataInterface stepDataInterface, int copyNr, org.pentaho.di.trans.TransMeta transMeta, org.pentaho.di.trans.Trans trans)
    Creates a new WekaScoring instance.
  • Method Summary

    Modifier and Type
    Method
    Description
    boolean
    init(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi)
    Initialize the step.
    protected void
     
    boolean
    processRow(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi)
    Process an incoming row of data.

    Methods inherited from class org.pentaho.di.trans.step.BaseStep

    addResultFile, addRowListener, addRowSetToInputRowSets, addRowSetToOutputRowSets, addStepListener, batchComplete, beforeStartProcessing, buildLog, canProcessOneRow, checkFeedback, cleanup, clearInputRowSets, clearOutputRowSets, closeQuietly, copyVariablesFrom, decrementLinesRead, decrementLinesWritten, dispatch, dispose, environmentSubstitute, environmentSubstitute, environmentSubstitute, fieldSubstitute, findInputRowSet, findInputRowSet, findOutputRowSet, findOutputRowSet, getBooleanValueOfVariable, getClusterSize, getContainerObjectId, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getDispatcher, getErrorRowMeta, getErrors, getExtensionDataMap, getFilename, getFirstInputRowSet, getInputRowMeta, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesSkipped, getLinesUpdated, getLinesWritten, getLogChannel, getLogChannelId, getLogFields, getLogLevel, getMetaStore, getNextClassNr, getObjectCopy, getObjectId, getObjectName, getObjectRevision, getObjectType, getOutputRowSets, getParent, getParentVariableSpace, getPartitionID, getPartitionTargets, getPreviewRowMeta, getProcessed, getRegistrationDate, getRemoteInputSteps, getRemoteOutputSteps, getRepartitioning, getRepository, getRepositoryDirectory, getResultFiles, getRow, getRowFrom, getRowHandler, getRowListeners, getRuntime, getServerSockets, getSlaveNr, getSocketRepository, getStatus, getStatusDescription, getStepDataInterface, getStepID, getStepListeners, getStepMeta, getStepMetaInterface, getStepname, getTrans, getTransMeta, getTypeId, getUniqueStepCountAcrossSlaves, getUniqueStepNrAcrossSlaves, getVariable, getVariable, handleGetRowFrom, handlePutRowTo, identifyErrorOutput, incrementLinesInput, incrementLinesOutput, incrementLinesRead, incrementLinesRejected, incrementLinesSkipped, incrementLinesUpdated, incrementLinesWritten, initBeforeStart, initializeVariablesFrom, injectVariables, isBasic, isDebug, isDetailed, isDistributed, isForcingSeparateLogging, isGatheringMetrics, isInitialising, isMapping, isPartitioned, isPaused, isRowLevel, isRunning, isSafeStopped, isStopped, isUsingThreadPriorityManagment, listVariables, logBasic, logBasic, logDebug, logDebug, logDetailed, logDetailed, logError, logError, logError, logMinimal, logMinimal, logRowlevel, logRowlevel, logSummary, markStart, markStop, openRemoteInputStepSocketsOnce, openRemoteOutputStepSocketsOnce, outputIsDone, pauseRunning, putError, putRow, putRowTo, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, safeModeChecking, safeModeChecking, setCarteObjectId, setCopy, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setDistributed, setErrorRowMeta, setErrors, setForcingSeparateLogging, setGatheringMetrics, setInputRowMeta, setInputRowSets, setInternalVariables, setLinesInput, setLinesOutput, setLinesRead, setLinesRejected, setLinesSkipped, setLinesUpdated, setLinesWritten, setLogLevel, setMetaStore, setOutputDone, setOutputRowSets, setParentVariableSpace, setPartitioned, setPartitionID, setPartitionTargets, setPaused, setPaused, setPreviewRowMeta, setRepartitioning, setRepository, setRowHandler, setRunning, setSafeStopped, setServerSockets, setSocketRepository, setStepDataInterface, setStepListeners, setStepMeta, setStepMetaInterface, setStepname, setStopped, setTransMeta, setUsingThreadPriorityManagment, setVariable, shareVariablesWith, stopAll, stopRunning, stopRunning, swapFirstInputRowSetIfExists, toString, verifyInputDeadLock, waitUntilTransformationIsStarted

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

    Methods inherited from interface org.pentaho.di.core.logging.LoggingObjectLifecycleInterface

    callAfterLog, callBeforeLog

    Methods inherited from interface org.pentaho.di.trans.step.StepInterface

    addRowListener, addRowSetToInputRowSets, addRowSetToOutputRowSets, addStepListener, afterFinishProcessing, batchComplete, beforeStartProcessing, canProcessOneRow, cleanup, dispose, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getErrors, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesUpdated, getLinesWritten, getLogChannel, getMetaStore, getOutputRowSets, getPartitionID, getProcessed, getRepository, getResultFiles, getRow, getRowListeners, getRuntime, getStatus, getStepID, getStepMeta, getStepname, getTrans, identifyErrorOutput, initBeforeStart, isMapping, isPartitioned, isPaused, isRunning, isSafeStopped, isStopped, isUsingThreadPriorityManagment, markStart, markStop, pauseRunning, putRow, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setErrors, setLinesRejected, setMetaStore, setOutputDone, setPartitioned, setPartitionID, setRepartitioning, setRepository, setRunning, setSafeStopped, setStopped, setUsingThreadPriorityManagment, stopAll, stopRunning, subStatuses

    Methods inherited from interface org.pentaho.di.core.variables.VariableSpace

    copyVariablesFrom, environmentSubstitute, environmentSubstitute, environmentSubstitute, fieldSubstitute, getBooleanValueOfVariable, getParentVariableSpace, getVariable, getVariable, initializeVariablesFrom, injectVariables, listVariables, setParentVariableSpace, setVariable, shareVariablesWith
  • Constructor Details

    • WekaScoring

      public WekaScoring(org.pentaho.di.trans.step.StepMeta stepMeta, org.pentaho.di.trans.step.StepDataInterface stepDataInterface, int copyNr, org.pentaho.di.trans.TransMeta transMeta, org.pentaho.di.trans.Trans trans)
      Creates a new WekaScoring instance.
      Parameters:
      stepMeta - holds the step's meta data
      stepDataInterface - holds the step's temporary data
      copyNr - the number assigned to the step
      transMeta - meta data for the transformation
      trans - a Trans value
  • Method Details

    • processRow

      public boolean processRow(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi) throws org.pentaho.di.core.exception.KettleException
      Process an incoming row of data.
      Specified by:
      processRow in interface org.pentaho.di.trans.step.StepInterface
      Overrides:
      processRow in class org.pentaho.di.trans.step.BaseStep
      Parameters:
      smi - a StepMetaInterface value
      sdi - a StepDataInterface value
      Returns:
      a boolean value
      Throws:
      org.pentaho.di.core.exception.KettleException - if an error occurs
    • outputBatchRows

      protected void outputBatchRows() throws Exception
      Throws:
      Exception
    • init

      public boolean init(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi)
      Initialize the step.
      Specified by:
      init in interface org.pentaho.di.trans.step.StepInterface
      Overrides:
      init in class org.pentaho.di.trans.step.BaseStep
      Parameters:
      smi - a StepMetaInterface value
      sdi - a StepDataInterface value
      Returns:
      a boolean value