Class WekaForecasting

java.lang.Object
org.pentaho.di.trans.step.BaseStep
org.pentaho.di.forecasting.WekaForecasting
All Implemented Interfaces:
org.pentaho.di.core.ExtensionDataInterface, org.pentaho.di.core.logging.HasLogChannelInterface, org.pentaho.di.core.logging.LoggingObjectInterface, org.pentaho.di.core.logging.LoggingObjectLifecycleInterface, org.pentaho.di.core.variables.VariableSpace, org.pentaho.di.trans.step.StepInterface

public class WekaForecasting extends org.pentaho.di.trans.step.BaseStep implements org.pentaho.di.trans.step.StepInterface
Applies a pre-built Weka forecasting model to forecast future time steps. Incoming rows are treated as historical data for priming the model, i.e. the priming process ensures that all of the forecaster's lagged and derived field values are filled in from the most recent historical data. The forecaster can then be applied in a closed-loop manner to forecast future time steps. The closed-loop process takes the values forecasted for the next time step and feeds them back into the model in order to forecast the subsequent time step, and so on. The priming data is therefore expected to extend up to, and include, the current time step (i.e. the step immediately prior to the first future forecast step). IMPORTANT: priming rows are assumed to be equally spaced in time and sorted in ascending order of time. If a time stamp is included in the data, the step checks that this is the case. It is also assumed that the priming data (and overlay data, if provided) have the same time interval and periodicity as the data used to train the model. If this is not the case, the results may be nonsensical; this is not checked.
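
The priming and closed-loop steps described above can be illustrated with a small, self-contained sketch. It does not use the Weka time series API: the LagModel interface, the two-value lag window and the forecast method are hypothetical stand-ins chosen only to show how each forecast is fed back in as input for the next step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy illustration of priming followed by closed-loop forecasting. */
class ClosedLoopSketch {

  /** Hypothetical stand-in for a trained forecaster that uses the last two values. */
  interface LagModel {
    double predict(double lag1, double lag2);
  }

  /**
   * Primes the lag window from the end of the history, then forecasts
   * numSteps ahead, feeding each forecast back in as the newest lag value.
   */
  static List<Double> forecast(LagModel model, List<Double> history, int numSteps) {
    if (history.size() < 2) {
      throw new IllegalArgumentException("need at least two priming values");
    }
    // Priming: fill the lag slots with the most recent historical values.
    double lag1 = history.get(history.size() - 1);
    double lag2 = history.get(history.size() - 2);

    List<Double> forecasts = new ArrayList<>();
    for (int step = 0; step < numSteps; step++) {
      double next = model.predict(lag1, lag2);
      forecasts.add(next);
      // Closed loop: the forecast is treated as history for the next step.
      lag2 = lag1;
      lag1 = next;
    }
    return forecasts;
  }

  public static void main(String[] args) {
    // Toy linear-trend model: next = lag1 + (lag1 - lag2).
    LagModel trend = (lag1, lag2) -> 2 * lag1 - lag2;
    List<Double> history = Arrays.asList(10.0, 12.0, 14.0);
    System.out.println(forecast(trend, history, 3)); // prints [16.0, 18.0, 20.0]
  }
}
```

Because every step after the first is computed from earlier forecasts rather than observed values, any gap between the end of the priming data and the first forecast step shifts the whole forecast, which is why the priming data should run up to the current time step.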

If the forecasting model has been trained with "overlay" data, i.e. fields that are not forecasted or derived automatically, then the incoming data stream needs to contain these fields - not only for the priming data, but *also* for the time steps that are to be forecasted. Overlay data for future time steps is assumed to be indicated by rows in which all of the target fields to be forecasted are missing (null). The values of the overlay fields must, of course, be non-missing in these rows. The number of "overlay" rows determines the number of steps forecasted by the forecaster, and overrides any value specified in the "numSteps" parameter.
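
As a rough sketch of the overlay convention described above, the following standalone code separates priming rows from overlay rows and derives the number of steps to forecast. The class, the Split holder and the method names are hypothetical, and classifying each row independently of its position in the stream is an assumption of this sketch rather than a statement about how the step itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of separating priming rows from future overlay rows. */
class OverlaySplitSketch {

  /** Result holder: historical priming rows and future overlay rows. */
  static final class Split {
    final List<Object[]> primeRows = new ArrayList<>();
    final List<Object[]> overlayRows = new ArrayList<>();
  }

  /** True if every target field in the row is null (missing). */
  static boolean allTargetsMissing(Object[] row, int[] targetIndexes) {
    for (int idx : targetIndexes) {
      if (row[idx] != null) {
        return false;
      }
    }
    return true;
  }

  /** Rows whose targets are all missing are overlay rows; the rest prime the model. */
  static Split split(List<Object[]> rows, int[] targetIndexes) {
    Split result = new Split();
    for (Object[] row : rows) {
      if (allTargetsMissing(row, targetIndexes)) {
        result.overlayRows.add(row);
      } else {
        result.primeRows.add(row);
      }
    }
    return result;
  }

  /** Overlay rows, when present, take precedence over the configured step count. */
  static int stepsToForecast(Split split, int configuredNumSteps) {
    return split.overlayRows.isEmpty() ? configuredNumSteps : split.overlayRows.size();
  }

  public static void main(String[] args) {
    List<Object[]> rows = new ArrayList<>();
    rows.add(new Object[] {100.0, "promo"}); // target present: priming row
    rows.add(new Object[] {110.0, "none"});  // target present: priming row
    rows.add(new Object[] {null, "promo"});  // target missing: overlay row
    Split s = split(rows, new int[] {0});
    System.out.println("priming rows: " + s.primeRows.size());
    System.out.println("steps to forecast: " + stepsToForecast(s, 5));
  }
}
```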

When a forecast is generated, confidence intervals will be included for any steps for which they were estimated when training the model.

Attributes that the Weka model was constructed from are automatically mapped to incoming Kettle fields on the basis of name and type. Any attributes that cannot be mapped due to type mismatch or not being present in the incoming fields receive missing values when incoming Kettle rows are converted to Weka's Instance format. Similarly, any values for string fields that have not been seen during the training of the Weka model are converted to missing values.
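
The mapping rules above can be sketched in standalone Java as follows. This is an illustration only, not the step's actual code: the Field and Type classes are hypothetical, null stands in for a missing value, and the set of string labels seen during training is passed in explicitly as an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy illustration of mapping model attributes to incoming fields by name and type. */
class FieldMappingSketch {

  enum Type { NUMERIC, STRING }

  /** Hypothetical description of a model attribute or an incoming field. */
  static final class Field {
    final String name;
    final Type type;

    Field(String name, Type type) {
      this.name = name;
      this.type = type;
    }
  }

  /** For each model attribute, the index of the matching incoming field, or -1 if unmapped. */
  static int[] mapByNameAndType(List<Field> modelAttributes, List<Field> incoming) {
    int[] mapping = new int[modelAttributes.size()];
    for (int i = 0; i < mapping.length; i++) {
      mapping[i] = -1; // absent or type-mismatched fields stay unmapped
      Field att = modelAttributes.get(i);
      for (int j = 0; j < incoming.size(); j++) {
        Field f = incoming.get(j);
        if (f.name.equals(att.name) && f.type == att.type) {
          mapping[i] = j;
          break;
        }
      }
    }
    return mapping;
  }

  /** Converts a row to the model's attribute order; null represents a missing value. */
  static Object[] convertRow(Object[] row, int[] mapping, List<Field> modelAttributes,
      Map<String, Set<String>> labelsSeenInTraining) {
    Object[] out = new Object[mapping.length];
    for (int i = 0; i < mapping.length; i++) {
      if (mapping[i] < 0) {
        continue; // unmapped attribute: leave as missing
      }
      Object value = row[mapping[i]];
      Field att = modelAttributes.get(i);
      if (att.type == Type.STRING && value != null) {
        Set<String> labels = labelsSeenInTraining.get(att.name);
        if (labels == null || !labels.contains(value)) {
          value = null; // string value not seen during training: treat as missing
        }
      }
      out[i] = value;
    }
    return out;
  }

  public static void main(String[] args) {
    List<Field> model = Arrays.asList(new Field("sales", Type.NUMERIC),
        new Field("region", Type.STRING), new Field("temp", Type.NUMERIC));
    List<Field> incoming = Arrays.asList(new Field("region", Type.STRING),
        new Field("sales", Type.NUMERIC), new Field("temp", Type.STRING));
    int[] mapping = mapByNameAndType(model, incoming);
    Map<String, Set<String>> labels = new java.util.HashMap<>();
    labels.put("region", new HashSet<>(Arrays.asList("north", "south")));
    Object[] converted = convertRow(new Object[] {"east", 5.0, "hot"}, mapping, model, labels);
    System.out.println(Arrays.toString(mapping) + " " + Arrays.toString(converted));
  }
}
```

In the example, "temp" is left unmapped because the incoming field has a different type, and the region value "east" becomes missing because it is not among the labels supplied as seen during training.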

Version:
$Revision$
Author:
Mark Hall (mhall{[at]}pentaho{[dot]}com)
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
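    protected List<Object[]>
    m_bufferedPrimeData
    protected List<String>
    m_fieldsToForecast
    protected boolean
    m_isIncrementallyPrimeable
    protected boolean
    m_isUsingOverlayData
    protected weka.classifiers.timeseries.core.TSLagMaker
    m_modelLagMaker
    protected List<Object[]>
    m_overlayData
    protected boolean
    m_rebuildModel
    Whether to rebuild the model on the incoming data before forecasting.
    protected String
    m_timeStampName
    protected int
    m_timeStampRowIndex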
     

    Fields inherited from class org.pentaho.di.trans.step.BaseStep

    deadLockCounter, extensionDataMap, first, linesInput, linesOutput, linesRead, linesRejected, linesSkipped, linesUpdated, linesWritten, log, metaStore, repository, rowListeners, safeStopped, terminator, terminator_rows, variables
  • Constructor Summary

    Constructors
    Constructor
    Description
    WekaForecasting(org.pentaho.di.trans.step.StepMeta stepMeta, org.pentaho.di.trans.step.StepDataInterface stepDataInterface, int copyNr, org.pentaho.di.trans.TransMeta transMeta, org.pentaho.di.trans.Trans trans)
    Creates a new WekaForecasting instance.
  • Method Summary

    Modifier and Type
    Method
    Description
    boolean
    init(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi)
    Initialize the step.
    boolean
    processRow(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi)
    Process an incoming row of data.
    void
    run()
    Run is where the action happens!

    Methods inherited from class org.pentaho.di.trans.step.BaseStep

    addResultFile, addRowListener, addRowSetToInputRowSets, addRowSetToOutputRowSets, addStepListener, batchComplete, beforeStartProcessing, buildLog, canProcessOneRow, checkFeedback, cleanup, clearInputRowSets, clearOutputRowSets, closeQuietly, copyVariablesFrom, decrementLinesRead, decrementLinesWritten, dispatch, dispose, environmentSubstitute, environmentSubstitute, environmentSubstitute, fieldSubstitute, findInputRowSet, findInputRowSet, findOutputRowSet, findOutputRowSet, getBooleanValueOfVariable, getClusterSize, getContainerObjectId, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getDispatcher, getErrorRowMeta, getErrors, getExtensionDataMap, getFilename, getFirstInputRowSet, getInputRowMeta, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesSkipped, getLinesUpdated, getLinesWritten, getLogChannel, getLogChannelId, getLogFields, getLogLevel, getMetaStore, getNextClassNr, getObjectCopy, getObjectId, getObjectName, getObjectRevision, getObjectType, getOutputRowSets, getParent, getParentVariableSpace, getPartitionID, getPartitionTargets, getPreviewRowMeta, getProcessed, getRegistrationDate, getRemoteInputSteps, getRemoteOutputSteps, getRepartitioning, getRepository, getRepositoryDirectory, getResultFiles, getRow, getRowFrom, getRowHandler, getRowListeners, getRuntime, getServerSockets, getSlaveNr, getSocketRepository, getStatus, getStatusDescription, getStepDataInterface, getStepID, getStepListeners, getStepMeta, getStepMetaInterface, getStepname, getTrans, getTransMeta, getTypeId, getUniqueStepCountAcrossSlaves, getUniqueStepNrAcrossSlaves, getVariable, getVariable, handleGetRowFrom, handlePutRowTo, identifyErrorOutput, incrementLinesInput, incrementLinesOutput, incrementLinesRead, incrementLinesRejected, incrementLinesSkipped, incrementLinesUpdated, incrementLinesWritten, initBeforeStart, initializeVariablesFrom, injectVariables, isBasic, isDebug, isDetailed, isDistributed, isForcingSeparateLogging, 
isGatheringMetrics, isInitialising, isMapping, isPartitioned, isPaused, isRowLevel, isRunning, isSafeStopped, isStopped, isUsingThreadPriorityManagment, listVariables, logBasic, logBasic, logDebug, logDebug, logDetailed, logDetailed, logError, logError, logError, logMinimal, logMinimal, logRowlevel, logRowlevel, logSummary, markStart, markStop, openRemoteInputStepSocketsOnce, openRemoteOutputStepSocketsOnce, outputIsDone, pauseRunning, putError, putRow, putRowTo, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, safeModeChecking, safeModeChecking, setCarteObjectId, setCopy, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setDistributed, setErrorRowMeta, setErrors, setForcingSeparateLogging, setGatheringMetrics, setInputRowMeta, setInputRowSets, setInternalVariables, setLinesInput, setLinesOutput, setLinesRead, setLinesRejected, setLinesSkipped, setLinesUpdated, setLinesWritten, setLogLevel, setMetaStore, setOutputDone, setOutputRowSets, setParentVariableSpace, setPartitioned, setPartitionID, setPartitionTargets, setPaused, setPaused, setPreviewRowMeta, setRepartitioning, setRepository, setRowHandler, setRunning, setSafeStopped, setServerSockets, setSocketRepository, setStepDataInterface, setStepListeners, setStepMeta, setStepMetaInterface, setStepname, setStopped, setTransMeta, setUsingThreadPriorityManagment, setVariable, shareVariablesWith, stopAll, stopRunning, stopRunning, swapFirstInputRowSetIfExists, toString, verifyInputDeadLock, waitUntilTransformationIsStarted

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

    Methods inherited from interface org.pentaho.di.core.logging.LoggingObjectLifecycleInterface

    callAfterLog, callBeforeLog

    Methods inherited from interface org.pentaho.di.trans.step.StepInterface

    addRowListener, addRowSetToInputRowSets, addRowSetToOutputRowSets, addStepListener, afterFinishProcessing, batchComplete, beforeStartProcessing, canProcessOneRow, cleanup, dispose, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getErrors, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesUpdated, getLinesWritten, getLogChannel, getMetaStore, getOutputRowSets, getPartitionID, getProcessed, getRepository, getResultFiles, getRow, getRowListeners, getRuntime, getStatus, getStepID, getStepMeta, getStepname, getTrans, identifyErrorOutput, initBeforeStart, isMapping, isPartitioned, isPaused, isRunning, isSafeStopped, isStopped, isUsingThreadPriorityManagment, markStart, markStop, pauseRunning, putRow, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setErrors, setLinesRejected, setMetaStore, setOutputDone, setPartitioned, setPartitionID, setRepartitioning, setRepository, setRunning, setSafeStopped, setStopped, setUsingThreadPriorityManagment, stopAll, stopRunning, subStatuses

    Methods inherited from interface org.pentaho.di.core.variables.VariableSpace

    copyVariablesFrom, environmentSubstitute, environmentSubstitute, environmentSubstitute, fieldSubstitute, getBooleanValueOfVariable, getParentVariableSpace, getVariable, getVariable, initializeVariablesFrom, injectVariables, listVariables, setParentVariableSpace, setVariable, shareVariablesWith
  • Field Details

    • m_overlayData

      protected List<Object[]> m_overlayData
    • m_bufferedPrimeData

      protected List<Object[]> m_bufferedPrimeData
    • m_isIncrementallyPrimeable

      protected boolean m_isIncrementallyPrimeable
    • m_isUsingOverlayData

      protected boolean m_isUsingOverlayData
    • m_rebuildModel

      protected boolean m_rebuildModel
      Whether to rebuild the model on the incoming data before forecasting.
    • m_modelLagMaker

      protected weka.classifiers.timeseries.core.TSLagMaker m_modelLagMaker
    • m_timeStampName

      protected String m_timeStampName
    • m_timeStampRowIndex

      protected int m_timeStampRowIndex
    • m_fieldsToForecast

      protected List<String> m_fieldsToForecast
  • Constructor Details

    • WekaForecasting

      public WekaForecasting(org.pentaho.di.trans.step.StepMeta stepMeta, org.pentaho.di.trans.step.StepDataInterface stepDataInterface, int copyNr, org.pentaho.di.trans.TransMeta transMeta, org.pentaho.di.trans.Trans trans)
      Creates a new WekaForecasting instance.
      Parameters:
      stepMeta - holds the step's meta data
      stepDataInterface - holds the step's temporary data
      copyNr - the copy number of this step instance
      transMeta - meta data for the transformation
      trans - the transformation (Trans) that this step belongs to
  • Method Details

    • processRow

      public boolean processRow(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi) throws org.pentaho.di.core.exception.KettleException
      Process an incoming row of data.
      Specified by:
      processRow in interface org.pentaho.di.trans.step.StepInterface
      Overrides:
      processRow in class org.pentaho.di.trans.step.BaseStep
      Parameters:
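      smi - the step's meta data (a StepMetaInterface value)
      sdi - the step's temporary data (a StepDataInterface value)
      Returns:
      true if the step should be called again to process further rows, false if it is done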
      Throws:
      org.pentaho.di.core.exception.KettleException - if an error occurs
    • init

      public boolean init(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi)
      Initialize the step.
      Specified by:
      init in interface org.pentaho.di.trans.step.StepInterface
      Overrides:
      init in class org.pentaho.di.trans.step.BaseStep
      Parameters:
      smi - the step's meta data (a StepMetaInterface value)
      sdi - the step's temporary data (a StepDataInterface value)
      Returns:
      true if initialization completed successfully, false otherwise
    • run

      public void run()
      Run is where the action happens!