Class ReservoirSampling
java.lang.Object
org.pentaho.di.trans.step.BaseStep
org.pentaho.di.trans.steps.reservoirsampling.ReservoirSampling
- All Implemented Interfaces:
org.pentaho.di.core.ExtensionDataInterface,HasLogChannelInterface,org.pentaho.di.core.logging.LoggingObjectInterface,org.pentaho.di.core.logging.LoggingObjectLifecycleInterface,org.pentaho.di.core.variables.VariableSpace,StepInterface
-
Field Summary
Fields inherited from class org.pentaho.di.trans.step.BaseStep
deadLockCounter, extensionDataMap, first, linesInput, linesOutput, linesRead, linesRejected, linesSkipped, linesUpdated, linesWritten, log, metaStore, repository, rowListeners, safeStopped, terminator, terminator_rows, variables -
Constructor Summary
ConstructorsConstructorDescriptionReservoirSampling(StepMeta stepMeta, StepDataInterface stepDataInterface, int copyNr, TransMeta transMeta, Trans trans) Creates a newReservoirSamplinginstance. -
Method Summary
Modifier and TypeMethodDescriptionbooleaninit(StepMetaInterface smi, StepDataInterface sdi) Initialize the step.booleanprocessRow(StepMetaInterface smi, StepDataInterface sdi) Process an incoming row of data.voidrun()Run is where the action happens!Methods inherited from class org.pentaho.di.trans.step.BaseStep
addResultFile, addRowListener, addRowSetToInputRowSets, addRowSetToOutputRowSets, addStepListener, batchComplete, beforeStartProcessing, buildLog, canProcessOneRow, checkFeedback, cleanup, clearInputRowSets, clearOutputRowSets, closeQuietly, copyVariablesFrom, decrementLinesRead, decrementLinesWritten, dispatch, dispose, environmentSubstitute, environmentSubstitute, environmentSubstitute, fieldSubstitute, findInputRowSet, findInputRowSet, findOutputRowSet, findOutputRowSet, getBooleanValueOfVariable, getClusterSize, getContainerObjectId, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getDispatcher, getErrorRowMeta, getErrors, getExtensionDataMap, getFilename, getFirstInputRowSet, getInputRowMeta, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesSkipped, getLinesUpdated, getLinesWritten, getLogChannel, getLogChannelId, getLogFields, getLogLevel, getMetaStore, getNextClassNr, getObjectCopy, getObjectId, getObjectName, getObjectRevision, getObjectType, getOutputRowSets, getParent, getParentVariableSpace, getPartitionID, getPartitionTargets, getPreviewRowMeta, getProcessed, getRegistrationDate, getRemoteInputSteps, getRemoteOutputSteps, getRepartitioning, getRepository, getRepositoryDirectory, getResultFiles, getRow, getRowFrom, getRowHandler, getRowListeners, getRuntime, getServerSockets, getSlaveNr, getSocketRepository, getStatus, getStatusDescription, getStepDataInterface, getStepID, getStepListeners, getStepMeta, getStepMetaInterface, getStepname, getTrans, getTransMeta, getTypeId, getUniqueStepCountAcrossSlaves, getUniqueStepNrAcrossSlaves, getVariable, getVariable, handleGetRowFrom, handlePutRowTo, identifyErrorOutput, incrementLinesInput, incrementLinesOutput, incrementLinesRead, incrementLinesRejected, incrementLinesSkipped, incrementLinesUpdated, incrementLinesWritten, initBeforeStart, initializeVariablesFrom, injectVariables, isBasic, isDebug, isDetailed, isDistributed, isForcingSeparateLogging, isGatheringMetrics, isInitialising, isMapping, isPartitioned, isPaused, isRowLevel, isRunning, isSafeStopped, isStopped, isUsingThreadPriorityManagment, listVariables, logBasic, logBasic, logDebug, logDebug, logDetailed, logDetailed, logError, logError, logError, logMinimal, logMinimal, logRowlevel, logRowlevel, logSummary, markStart, markStop, openRemoteInputStepSocketsOnce, openRemoteOutputStepSocketsOnce, outputIsDone, pauseRunning, putError, putRow, putRowTo, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, safeModeChecking, safeModeChecking, setCarteObjectId, setCopy, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setDistributed, setErrorRowMeta, setErrors, setForcingSeparateLogging, setGatheringMetrics, setInputRowMeta, setInputRowSets, setInternalVariables, setLinesInput, setLinesOutput, setLinesRead, setLinesRejected, setLinesSkipped, setLinesUpdated, setLinesWritten, setLogLevel, setMetaStore, setOutputDone, setOutputRowSets, setParentVariableSpace, setPartitioned, setPartitionID, setPartitionTargets, setPaused, setPaused, setPreviewRowMeta, setRepartitioning, setRepository, setRowHandler, setRunning, setSafeStopped, setServerSockets, setSocketRepository, setStepDataInterface, setStepListeners, setStepMeta, setStepMetaInterface, setStepname, setStopped, setTransMeta, setUsingThreadPriorityManagment, setVariable, shareVariablesWith, stopAll, stopRunning, stopRunning, swapFirstInputRowSetIfExists, toString, verifyInputDeadLock, waitUntilTransformationIsStartedMethods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, waitMethods inherited from interface org.pentaho.di.core.logging.LoggingObjectLifecycleInterface
callAfterLog, callBeforeLogMethods inherited from interface org.pentaho.di.trans.step.StepInterface
addRowListener, addRowSetToInputRowSets, addRowSetToOutputRowSets, addStepListener, afterFinishProcessing, batchComplete, beforeStartProcessing, canProcessOneRow, cleanup, dispose, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getErrors, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesUpdated, getLinesWritten, getLogChannel, getMetaStore, getOutputRowSets, getPartitionID, getProcessed, getRepository, getResultFiles, getRow, getRowListeners, getRuntime, getStatus, getStepID, getStepMeta, getStepname, getTrans, identifyErrorOutput, initBeforeStart, isMapping, isPartitioned, isPaused, isRunning, isSafeStopped, isStopped, isUsingThreadPriorityManagment, markStart, markStop, pauseRunning, putRow, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setErrors, setLinesRejected, setMetaStore, setOutputDone, setPartitioned, setPartitionID, setRepartitioning, setRepository, setRunning, setSafeStopped, setStopped, setUsingThreadPriorityManagment, stopAll, stopRunning, subStatusesMethods inherited from interface org.pentaho.di.core.variables.VariableSpace
copyVariablesFrom, environmentSubstitute, environmentSubstitute, environmentSubstitute, fieldSubstitute, getBooleanValueOfVariable, getParentVariableSpace, getVariable, getVariable, initializeVariablesFrom, injectVariables, listVariables, setParentVariableSpace, setVariable, shareVariablesWith
-
Constructor Details
-
ReservoirSampling
public ReservoirSampling(StepMeta stepMeta, StepDataInterface stepDataInterface, int copyNr, TransMeta transMeta, Trans trans) Creates a newReservoirSamplinginstance.Implements the reservoir sampling algorithm "R" by Jeffrey Scott Vitter. (algorithm is implemented in ReservoirSamplingData.java
For more information see:
Vitter, J. S. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, Vol. 11, No. 1, March 1985. Pages 37-57.- Parameters:
stepMeta- holds the step's meta datastepDataInterface- holds the step's temporary datacopyNr- the number assigned to the steptransMeta- meta data for the transformationtrans- aTransvalue
-
-
Method Details
-
processRow
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws org.pentaho.di.core.exception.KettleException Process an incoming row of data.- Specified by:
processRowin interfaceStepInterface- Overrides:
processRowin classBaseStep- Parameters:
smi- aStepMetaInterfacevaluesdi- aStepDataInterfacevalue- Returns:
- a
booleanvalue - Throws:
org.pentaho.di.core.exception.KettleException- if an error occurs
-
init
Initialize the step.- Specified by:
initin interfaceStepInterface- Overrides:
initin classBaseStep- Parameters:
smi- aStepMetaInterfacevaluesdi- aStepDataInterfacevalue- Returns:
- a
booleanvalue
-
run
public void run()Run is where the action happens!
-