public class ReservoirSampling extends BaseStep implements StepInterface
first, linesInput, linesOutput, linesRead, linesRejected, linesSkipped, linesUpdated, linesWritten, terminator, terminator_rows
Constructor and Description |
---|
ReservoirSampling(StepMeta stepMeta,
StepDataInterface stepDataInterface,
int copyNr,
TransMeta transMeta,
Trans trans)
Creates a new
ReservoirSampling instance. |
Modifier and Type | Method and Description |
---|---|
boolean |
init(StepMetaInterface smi,
StepDataInterface sdi)
Initialize the step.
|
boolean |
processRow(StepMetaInterface smi,
StepDataInterface sdi)
Process an incoming row of data.
|
void |
run()
Run is where the action happens!
|
addResultFile, addRowListener, addStepListener, batchComplete, buildLog, canProcessOneRow, cleanup, closeQuietly, copyVariablesFrom, decrementLinesRead, decrementLinesWritten, dispatch, dispose, environmentSubstitute, environmentSubstitute, fieldSubstitute, findInputRowSet, findInputRowSet, findOutputRowSet, findOutputRowSet, getBooleanValueOfVariable, getClusterSize, getContainerObjectId, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getDispatcher, getErrorRowMeta, getErrors, getExtensionDataMap, getFilename, getInputRowMeta, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesSkipped, getLinesUpdated, getLinesWritten, getLogChannel, getLogChannelId, getLogFields, getLogLevel, getMetaStore, getNextClassNr, getObjectCopy, getObjectId, getObjectName, getObjectRevision, getObjectType, getOutputRowSets, getParent, getParentVariableSpace, getPartitionID, getPartitionTargets, getPreviewRowMeta, getProcessed, getRegistrationDate, getRemoteInputSteps, getRemoteOutputSteps, getRepartitioning, getRepository, getRepositoryDirectory, getResultFiles, getRow, getRowFrom, getRowListeners, getRuntime, getServerSockets, getSlaveNr, getSocketRepository, getStatus, getStatusDescription, getStepDataInterface, getStepID, getStepListeners, getStepMeta, getStepMetaInterface, getStepname, getTrans, getTransMeta, getTypeId, getUniqueStepCountAcrossSlaves, getUniqueStepNrAcrossSlaves, getVariable, getVariable, identifyErrorOutput, incrementLinesInput, incrementLinesOutput, incrementLinesRead, incrementLinesRejected, incrementLinesSkipped, incrementLinesUpdated, incrementLinesWritten, initBeforeStart, initializeVariablesFrom, injectVariables, isBasic, isDebug, isDetailed, isDistributed, isForcingSeparateLogging, isGatheringMetrics, isInitialising, isMapping, isPartitioned, isPaused, isRowLevel, isRunning, isStopped, isUsingThreadPriorityManagment, listVariables, logBasic, logBasic, logDebug, logDebug, logDetailed, logDetailed, logError, logError, logError, logMinimal, logMinimal, logRowlevel, logRowlevel, logSummary, markStart, markStop, outputIsDone, pauseRunning, putError, putRow, putRowTo, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, safeModeChecking, setCarteObjectId, setCopy, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setDistributed, setErrorRowMeta, setErrors, setForcingSeparateLogging, setGatheringMetrics, setInputRowMeta, setInputRowSets, setInternalVariables, setLinesInput, setLinesOutput, setLinesRead, setLinesRejected, setLinesSkipped, setLinesUpdated, setLinesWritten, setLogLevel, setMetaStore, setOutputDone, setOutputRowSets, setParentVariableSpace, setPartitioned, setPartitionID, setPartitionTargets, setPaused, setPaused, setPreviewRowMeta, setRepartitioning, setRepository, setRunning, setServerSockets, setSocketRepository, setStepDataInterface, setStepListeners, setStepMeta, setStepMetaInterface, setStepname, setStopped, setTransMeta, setUsingThreadPriorityManagment, setVariable, shareVariablesWith, stopAll, stopRunning, stopRunning, toString
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
addRowListener, addStepListener, batchComplete, canProcessOneRow, cleanup, dispose, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getErrors, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesUpdated, getLinesWritten, getLogChannel, getMetaStore, getOutputRowSets, getPartitionID, getProcessed, getRepository, getResultFiles, getRow, getRowListeners, getRuntime, getStatus, getStepID, getStepMeta, getStepname, getTrans, identifyErrorOutput, initBeforeStart, isMapping, isPartitioned, isPaused, isRunning, isStopped, isUsingThreadPriorityManagment, markStart, markStop, pauseRunning, putRow, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setErrors, setLinesRejected, setMetaStore, setOutputDone, setPartitioned, setPartitionID, setRepartitioning, setRepository, setRunning, setStopped, setUsingThreadPriorityManagment, stopAll, stopRunning
copyVariablesFrom, environmentSubstitute, environmentSubstitute, fieldSubstitute, getBooleanValueOfVariable, getParentVariableSpace, getVariable, getVariable, initializeVariablesFrom, injectVariables, listVariables, setParentVariableSpace, setVariable, shareVariablesWith
public ReservoirSampling(StepMeta stepMeta, StepDataInterface stepDataInterface, int copyNr, TransMeta transMeta, Trans trans)
ReservoirSampling
instance. Implements the reservoir sampling algorithm "R" by Jeffrey Scott Vitter. (algorithm is implemented in ReservoirSamplingData.java
For more information see:
Vitter, J. S. Random Sampling with a Reservoir. ACM
Transactions on Mathematical Software, Vol. 11, No. 1,
March 1985. Pages 37-57.
stepMeta
- holds the step's meta datastepDataInterface
- holds the step's temporary datacopyNr
- the number assigned to the steptransMeta
- meta data for the transformationtrans
- a Trans
valuepublic boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws org.pentaho.di.core.exception.KettleException
processRow
in interface StepInterface
processRow
in class BaseStep
smi
- a StepMetaInterface
valuesdi
- a StepDataInterface
valueboolean
valueorg.pentaho.di.core.exception.KettleException
- if an error occurspublic boolean init(StepMetaInterface smi, StepDataInterface sdi)
init
in interface StepInterface
init
in class BaseStep
smi
- a StepMetaInterface
valuesdi
- a StepDataInterface
valueboolean
valuepublic void run()