Class ReservoirSamplingData
java.lang.Object
  org.pentaho.di.trans.step.BaseStepData
    org.pentaho.di.trans.steps.reservoirsampling.ReservoirSamplingData
All Implemented Interfaces:
StepDataInterface
public class ReservoirSamplingData extends BaseStepData implements StepDataInterface
Holds temporary data (i.e. sampled rows). Implements the reservoir sampling algorithm "R" by Jeffrey Scott Vitter.
For more information see:
Vitter, J. S. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, Vol. 11, No. 1, March 1985. Pages 37-57.
Version:
1.0
Author:
Mark Hall (mhall{[at]}pentaho.org)
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static class    ReservoirSamplingData.PROC_MODE
-
Nested classes/interfaces inherited from class org.pentaho.di.trans.step.BaseStepData
BaseStepData.StepExecutionStatus
-
Field Summary
Fields
Modifier and Type    Field    Description
protected int    m_currentRow
protected int    m_k
protected org.pentaho.di.core.row.RowMetaInterface    m_outputRowMeta
protected Random    m_random
protected List<Object[]>    m_sample
protected ReservoirSamplingData.PROC_MODE    m_state
-
Constructor Summary
Constructors
Constructor    Description
ReservoirSamplingData()
-
Method Summary
Methods
Modifier and Type    Method    Description
void    cleanUp()
org.pentaho.di.core.row.RowMetaInterface    getOutputRowMeta()    Get the output metadata.
ReservoirSamplingData.PROC_MODE    getProcessingMode()    Determine the current operational state of the Reservoir Sampling step.
List<Object[]>    getSample()    Gets the sample as a list of rows.
void    initialize(int sampleSize, int seed)    Initialize this data object.
void    processRow(Object[] row)    Here is where the action happens.
void    setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)    Set the metadata for the output format.
void    setProcessingMode(ReservoirSamplingData.PROC_MODE state)    Set this component to sample, pass through, or be disabled.
-
Methods inherited from class org.pentaho.di.trans.step.BaseStepData
getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, isStopped, setStatus
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.pentaho.di.trans.step.StepDataInterface
getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, setStatus
-
Field Detail
-
m_outputRowMeta
protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
-
m_k
protected int m_k
-
m_currentRow
protected int m_currentRow
-
m_random
protected Random m_random
-
m_state
protected ReservoirSamplingData.PROC_MODE m_state
-
Method Detail
-
setOutputRowMeta
public void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
Set the metadata for the output format.
Parameters:
rmi - a RowMetaInterface value
-
getOutputRowMeta
public org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
Get the output metadata.
Returns:
a RowMetaInterface value
-
getSample
public List<Object[]> getSample()
Gets the sample as a list of rows.
Returns:
the sampled rows
-
initialize
public void initialize(int sampleSize, int seed)
Initialize this data object.
Parameters:
sampleSize - the number of rows to sample
seed - the seed for the random number generator
-
getProcessingMode
public ReservoirSamplingData.PROC_MODE getProcessingMode()
Determine the current operational state of the Reservoir Sampling step: Sampling, PassThrough (do not wait until the end; pass rows through on the fly), or Disabled.
Returns:
current operational state
-
setProcessingMode
public void setProcessingMode(ReservoirSamplingData.PROC_MODE state)
Set this component to sample, pass through, or be disabled.
Parameters:
state - member of PROC_MODE enumeration indicating the desired operational state
-
processRow
public void processRow(Object[] row)
Here is where the action happens. Sampling is done using the "R" algorithm of Jeffrey Scott Vitter.
Parameters:
row - an incoming row
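For readers unfamiliar with algorithm "R", the following is a minimal sketch of the technique, not the Pentaho source. The class name ReservoirSketch and the field rowsSeen are hypothetical; the fields k, random and sample are assumed to play the roles that m_k, m_random and m_sample play in this class. The first k rows fill the reservoir; each later row replaces a uniformly chosen slot with probability k / (rows seen so far, counting the new row).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for illustration only; not the Pentaho implementation.
class ReservoirSketch {
    private final int k;                 // requested sample size
    private final Random random;         // seeded generator, so runs are repeatable
    private final List<Object[]> sample; // the reservoir
    private int rowsSeen = 0;            // rows processed so far (int, so limited to about 2^31 rows)

    ReservoirSketch(int sampleSize, int seed) {
        this.k = sampleSize;
        this.random = new Random(seed);
        this.sample = new ArrayList<>(Math.max(sampleSize, 0));
    }

    // Algorithm R: keep the first k rows, then replace a random slot
    // with probability k / (rowsSeen + 1) for every later row.
    void processRow(Object[] row) {
        if (rowsSeen < k) {
            sample.add(row);
        } else {
            // Uniform index in [0, rowsSeen]; it falls inside the reservoir
            // with probability k / (rowsSeen + 1).
            int j = random.nextInt(rowsSeen + 1);
            if (j < k) {
                sample.set(j, row);
            }
        }
        rowsSeen++;
    }

    List<Object[]> getSample() {
        return sample;
    }
}
```

Under these assumptions, feeding the same rows to two instances built with the same seed yields the same sample, and a stream shorter than k is returned in full.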
-
cleanUp
public void cleanUp()
-