org.pentaho.di.trans.steps.reservoirsampling
Class ReservoirSampling

java.lang.Object
  extended by org.pentaho.di.trans.step.BaseStep
      extended by org.pentaho.di.trans.steps.reservoirsampling.ReservoirSampling
All Implemented Interfaces:
HasLogChannelInterface, LoggingObjectInterface, VariableSpace, StepInterface

public class ReservoirSampling
extends BaseStep
implements StepInterface


Field Summary
 
Fields inherited from class org.pentaho.di.trans.step.BaseStep
first, linesInput, linesOutput, linesRead, linesRejected, linesSkipped, linesUpdated, linesWritten, terminator, terminator_rows
 
Constructor Summary
ReservoirSampling(StepMeta stepMeta, StepDataInterface stepDataInterface, int copyNr, TransMeta transMeta, Trans trans)
          Creates a new ReservoirSampling instance.
 
Method Summary
 boolean init(StepMetaInterface smi, StepDataInterface sdi)
          Initialize the step.
 boolean processRow(StepMetaInterface smi, StepDataInterface sdi)
          Process an incoming row of data.
 void run()
          Run is where the action happens!
 
Methods inherited from class org.pentaho.di.trans.step.BaseStep
addResultFile, addRowListener, addStepListener, batchComplete, buildLog, canProcessOneRow, cleanup, closeQuietly, copyVariablesFrom, decrementLinesRead, decrementLinesWritten, dispatch, dispose, environmentSubstitute, environmentSubstitute, findInputRowSet, findInputRowSet, findOutputRowSet, findOutputRowSet, getBooleanValueOfVariable, getClusterSize, getContainerObjectId, getCopy, getDispatcher, getErrorRowMeta, getErrors, getFilename, getInputRowMeta, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesSkipped, getLinesUpdated, getLinesWritten, getLogChannel, getLogChannelId, getLogFields, getLogLevel, getNextClassNr, getObjectCopy, getObjectId, getObjectName, getObjectRevision, getObjectType, getOutputRowSets, getParent, getParentVariableSpace, getPartitionID, getPartitionTargets, getPreviewRowMeta, getProcessed, getRegistrationDate, getRemoteInputSteps, getRemoteOutputSteps, getRepartitioning, getRepositoryDirectory, getResultFiles, getRow, getRowFrom, getRowListeners, getRuntime, getServerSockets, getSlaveNr, getSocketRepository, getStatus, getStatusDescription, getStepDataInterface, getStepID, getStepListeners, getStepMeta, getStepMetaInterface, getStepname, getTrans, getTransMeta, getTypeId, getUniqueStepCountAcrossSlaves, getUniqueStepNrAcrossSlaves, getVariable, getVariable, identifyErrorOutput, incrementLinesInput, incrementLinesOutput, incrementLinesRead, incrementLinesRejected, incrementLinesSkipped, incrementLinesUpdated, incrementLinesWritten, initBeforeStart, initializeVariablesFrom, injectVariables, isBasic, isDebug, isDetailed, isDistributed, isInitialising, isMapping, isPartitioned, isPaused, isRowLevel, isRunning, isStopped, isUsingThreadPriorityManagment, listVariables, logBasic, logBasic, logDebug, logDebug, logDetailed, logDetailed, logError, logError, logError, logMinimal, logMinimal, logRowlevel, logRowlevel, logSummary, markStart, markStop, outputIsDone, pauseRunning, putError, putRow, putRowTo, 
removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, safeModeChecking, setCarteObjectId, setCopy, setDistributed, setErrorRowMeta, setErrors, setInputRowMeta, setInputRowSets, setInternalVariables, setLinesInput, setLinesOutput, setLinesRead, setLinesRejected, setLinesSkipped, setLinesUpdated, setLinesWritten, setLogLevel, setOutputDone, setOutputRowSets, setParentVariableSpace, setPartitioned, setPartitionID, setPartitionTargets, setPaused, setPaused, setPreviewRowMeta, setRepartitioning, setRunning, setServerSockets, setSocketRepository, setStepDataInterface, setStepListeners, setStepMeta, setStepMetaInterface, setStepname, setStopped, setTransMeta, setUsingThreadPriorityManagment, setVariable, shareVariablesWith, stopAll, stopRunning, stopRunning, toString
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.pentaho.di.trans.step.StepInterface
addRowListener, addStepListener, batchComplete, canProcessOneRow, cleanup, dispose, getCopy, getErrors, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesUpdated, getLinesWritten, getLogChannel, getOutputRowSets, getPartitionID, getProcessed, getResultFiles, getRow, getRowListeners, getRuntime, getStatus, getStepID, getStepMeta, getStepname, getTrans, identifyErrorOutput, initBeforeStart, isMapping, isPartitioned, isPaused, isRunning, isStopped, isUsingThreadPriorityManagment, markStart, markStop, pauseRunning, putRow, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, setErrors, setLinesRejected, setOutputDone, setPartitioned, setPartitionID, setRepartitioning, setRunning, setStopped, setUsingThreadPriorityManagment, stopAll, stopRunning
 
Methods inherited from interface org.pentaho.di.core.variables.VariableSpace
copyVariablesFrom, environmentSubstitute, environmentSubstitute, getBooleanValueOfVariable, getParentVariableSpace, getVariable, getVariable, initializeVariablesFrom, injectVariables, listVariables, setParentVariableSpace, setVariable, shareVariablesWith
 

Constructor Detail

ReservoirSampling

public ReservoirSampling(StepMeta stepMeta,
                         StepDataInterface stepDataInterface,
                         int copyNr,
                         TransMeta transMeta,
                         Trans trans)
Creates a new ReservoirSampling instance.

Implements the reservoir sampling algorithm "R" as described by Jeffrey Scott Vitter. (The algorithm itself is implemented in ReservoirSamplingData.java.)

For more information see:

Vitter, J. S. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, Vol. 11, No. 1, March 1985. Pages 37-57.
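For orientation, the idea behind algorithm "R" can be sketched in a few lines of plain Java. This is a minimal illustration under stated assumptions, not the code in ReservoirSamplingData.java: the class name ReservoirSketch, the method sample, and its parameters are hypothetical and exist only for this example. The first k items fill the reservoir; after that, the n-th item replaces a randomly chosen slot with probability k/n.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of algorithm "R"; not part of the Kettle API.
final class ReservoirSketch {

    /** Returns up to k items drawn from the stream, each kept with (approximately) equal probability. */
    static <T> List<T> sample(Iterator<T> stream, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0; // number of items read so far
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill phase: the first k items always enter the reservoir.
                reservoir.add(item);
            } else {
                // Replacement phase: draw j from [0, seen) and overwrite
                // slot j only if it falls inside the reservoir (j < k).
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // Prints three of the ten values; which ones depends on the seed.
        System.out.println(sample(rows.iterator(), 3, new Random(7)));
    }
}
```

Because a later item can still displace an earlier one, the sample is only final once the whole stream has been read. If the stream holds fewer than k items, all of them are returned.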

Parameters:
stepMeta - holds the step's metadata
stepDataInterface - holds the step's temporary data
copyNr - the copy number assigned to this step instance
transMeta - metadata for the transformation
trans - the Trans (running transformation) that this step belongs to
Method Detail

processRow

public boolean processRow(StepMetaInterface smi,
                          StepDataInterface sdi)
                   throws KettleException
Process an incoming row of data.

Specified by:
processRow in interface StepInterface
Overrides:
processRow in class BaseStep
Parameters:
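smi - the StepMetaInterface holding the step's metadata
sdi - the StepDataInterface holding the step's temporary data
Returns:
true if the step should be called again for further rows; false when no more rows can be processed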
Throws:
KettleException - if an error occurs

init

public boolean init(StepMetaInterface smi,
                    StepDataInterface sdi)
Initialize the step.

Specified by:
init in interface StepInterface
Overrides:
init in class BaseStep
Parameters:
smi - the StepMetaInterface holding the step's metadata
sdi - the StepDataInterface holding the step's temporary data
Returns:
true if initialization succeeded; false otherwise

run

public void run()
Run is where the action happens!
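
The relationship between init, processRow, and run can be pictured as a simple driver loop. The sketch below is a self-contained illustration only: RowStep, BufferingStep, and StepLoopSketch are hypothetical names, not Kettle classes, and the real BaseStep machinery (row sets, logging, error handling) is left out. It assumes the usual convention that processRow is called repeatedly until it returns false, and that a step whose result depends on the whole input emits its rows only at end of input.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of the init / processRow / run contract; not the Kettle API.
final class StepLoopSketch {

    /** Stand-in for a step; deliberately much smaller than StepInterface. */
    interface RowStep {
        boolean init();
        boolean processRow();
    }

    /** Buffers every input row and releases the buffer only at end of input. */
    static final class BufferingStep implements RowStep {
        private final Deque<String> input;
        private final List<String> buffer = new ArrayList<>();
        final List<String> output = new ArrayList<>();
        int calls = 0; // how many times processRow() was invoked

        BufferingStep(List<String> rows) {
            this.input = new ArrayDeque<>(rows);
        }

        @Override
        public boolean init() {
            return true; // nothing to set up in this sketch
        }

        @Override
        public boolean processRow() {
            calls++;
            String row = input.poll(); // null signals that the input is exhausted
            if (row == null) {
                output.addAll(buffer); // emit only once all input has been seen
                return false;          // tell the driver to stop calling
            }
            buffer.add(row);
            return true;               // ask to be called again
        }
    }

    /** Driver loop: initialise once, then call processRow() until it returns false. */
    static void run(RowStep step) {
        if (!step.init()) {
            return;
        }
        while (step.processRow()) {
            // each iteration handles one incoming row
        }
    }
}
```

With three input rows, the driver calls processRow four times: three calls that each consume a row and return true, and a final call that finds the input empty, emits the buffer, and returns false.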