public class WekaForecasting
extends org.pentaho.di.trans.step.BaseStep
implements org.pentaho.di.trans.step.StepInterface
If the forecasting model has been trained with "overlay" data, i.e. fields that are not forecasted or derived automatically, then the incoming data stream needs to contain these fields - not only for the priming data, but *also* for time steps that are to be forecasted. We assume that overlay data for future time steps to be forecasted is indicated by the presence of rows that contain all missing (null) values for the target fields to be forecasted. Of course, the values of the overlay fields need to be non-missing in this data. The number of "overlay" rows determines the number of steps forecasted by the forecaster, and overides any value specified in the "numSteps" parameter.
When a forecast is generated, confidence intervals will be included for any steps for which they were estimated when training the model.
Attributes that the Weka model was constructed from are automatically mapped to incoming Kettle fields on the basis of name and type. Any attributes that cannot be mapped due to type mismatch or not being present in the incoming fields receive missing values when incoming Kettle rows are converted to Weka's Instance format. Similarly, any values for string fields that have not been seen during the training of the Weka model are converted to missing values.
Constructor and Description |
---|
WekaForecasting(org.pentaho.di.trans.step.StepMeta stepMeta,
org.pentaho.di.trans.step.StepDataInterface stepDataInterface,
int copyNr,
org.pentaho.di.trans.TransMeta transMeta,
org.pentaho.di.trans.Trans trans)
Creates a new
WekaForecasting instance. |
Modifier and Type | Method and Description |
---|---|
boolean |
init(org.pentaho.di.trans.step.StepMetaInterface smi,
org.pentaho.di.trans.step.StepDataInterface sdi)
Initialize the step.
|
boolean |
processRow(org.pentaho.di.trans.step.StepMetaInterface smi,
org.pentaho.di.trans.step.StepDataInterface sdi)
Process an incoming row of data.
|
void |
run()
Run is where the action happens!
|
addResultFile, addRowListener, addStepListener, batchComplete, buildLog, canProcessOneRow, cleanup, closeQuietly, copyVariablesFrom, decrementLinesRead, decrementLinesWritten, dispatch, dispose, environmentSubstitute, environmentSubstitute, fieldSubstitute, findInputRowSet, findInputRowSet, findOutputRowSet, findOutputRowSet, getBooleanValueOfVariable, getClusterSize, getContainerObjectId, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getDispatcher, getErrorRowMeta, getErrors, getExtensionDataMap, getFilename, getInputRowMeta, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesSkipped, getLinesUpdated, getLinesWritten, getLogChannel, getLogChannelId, getLogFields, getLogLevel, getMetaStore, getNextClassNr, getObjectCopy, getObjectId, getObjectName, getObjectRevision, getObjectType, getOutputRowSets, getParent, getParentVariableSpace, getPartitionID, getPartitionTargets, getPreviewRowMeta, getProcessed, getRegistrationDate, getRemoteInputSteps, getRemoteOutputSteps, getRepartitioning, getRepository, getRepositoryDirectory, getResultFiles, getRow, getRowFrom, getRowListeners, getRuntime, getServerSockets, getSlaveNr, getSocketRepository, getStatus, getStatusDescription, getStepDataInterface, getStepID, getStepListeners, getStepMeta, getStepMetaInterface, getStepname, getTrans, getTransMeta, getTypeId, getUniqueStepCountAcrossSlaves, getUniqueStepNrAcrossSlaves, getVariable, getVariable, identifyErrorOutput, incrementLinesInput, incrementLinesOutput, incrementLinesRead, incrementLinesRejected, incrementLinesSkipped, incrementLinesUpdated, incrementLinesWritten, initBeforeStart, initializeVariablesFrom, injectVariables, isBasic, isDebug, isDetailed, isDistributed, isForcingSeparateLogging, isGatheringMetrics, isInitialising, isMapping, isPartitioned, isPaused, isRowLevel, isRunning, isStopped, isUsingThreadPriorityManagment, listVariables, logBasic, logBasic, logDebug, logDebug, logDetailed, logDetailed, logError, logError, logError, logMinimal, logMinimal, logRowlevel, logRowlevel, logSummary, markStart, markStop, outputIsDone, pauseRunning, putError, putRow, putRowTo, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, safeModeChecking, setCarteObjectId, setCopy, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setDistributed, setErrorRowMeta, setErrors, setForcingSeparateLogging, setGatheringMetrics, setInputRowMeta, setInputRowSets, setInternalVariables, setLinesInput, setLinesOutput, setLinesRead, setLinesRejected, setLinesSkipped, setLinesUpdated, setLinesWritten, setLogLevel, setMetaStore, setOutputDone, setOutputRowSets, setParentVariableSpace, setPartitioned, setPartitionID, setPartitionTargets, setPaused, setPaused, setPreviewRowMeta, setRepartitioning, setRepository, setRunning, setServerSockets, setSocketRepository, setStepDataInterface, setStepListeners, setStepMeta, setStepMetaInterface, setStepname, setStopped, setTransMeta, setUsingThreadPriorityManagment, setVariable, shareVariablesWith, stopAll, stopRunning, stopRunning, toString
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
addRowListener, addStepListener, batchComplete, canProcessOneRow, cleanup, dispose, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getErrors, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesUpdated, getLinesWritten, getLogChannel, getMetaStore, getOutputRowSets, getPartitionID, getProcessed, getRepository, getResultFiles, getRow, getRowListeners, getRuntime, getStatus, getStepID, getStepMeta, getStepname, getTrans, identifyErrorOutput, initBeforeStart, isMapping, isPartitioned, isPaused, isRunning, isStopped, isUsingThreadPriorityManagment, markStart, markStop, pauseRunning, putRow, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setErrors, setLinesRejected, setMetaStore, setOutputDone, setPartitioned, setPartitionID, setRepartitioning, setRepository, setRunning, setStopped, setUsingThreadPriorityManagment, stopAll, stopRunning
copyVariablesFrom, environmentSubstitute, environmentSubstitute, fieldSubstitute, getBooleanValueOfVariable, getParentVariableSpace, getVariable, getVariable, initializeVariablesFrom, injectVariables, listVariables, setParentVariableSpace, setVariable, shareVariablesWith
public WekaForecasting(org.pentaho.di.trans.step.StepMeta stepMeta, org.pentaho.di.trans.step.StepDataInterface stepDataInterface, int copyNr, org.pentaho.di.trans.TransMeta transMeta, org.pentaho.di.trans.Trans trans)
WekaForecasting
instance.stepMeta
- holds the step's meta datastepDataInterface
- holds the step's temporary datacopyNr
- the number assigned to the steptransMeta
- meta data for the transformationtrans
- a Trans
valuepublic boolean processRow(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi) throws org.pentaho.di.core.exception.KettleException
processRow
in interface org.pentaho.di.trans.step.StepInterface
processRow
in class org.pentaho.di.trans.step.BaseStep
smi
- a StepMetaInterface
valuesdi
- a StepDataInterface
valueboolean
valueorg.pentaho.di.core.exception.KettleException
- if an error occurspublic boolean init(org.pentaho.di.trans.step.StepMetaInterface smi, org.pentaho.di.trans.step.StepDataInterface sdi)
init
in interface org.pentaho.di.trans.step.StepInterface
init
in class org.pentaho.di.trans.step.BaseStep
smi
- a StepMetaInterface
valuesdi
- a StepDataInterface
valueboolean
valuepublic void run()