public class ReservoirSamplingData extends BaseStepData implements StepDataInterface
For more information see:
Vitter, J. S. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, Vol. 11, No. 1, March 1985, pages 37-57.
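Vitter's paper analyses several reservoir algorithms. The simplest, Algorithm R, is sketched below in Java to illustrate the idea. It is a generic sketch under stated assumptions: the names AlgorithmR and sample are hypothetical, and this is not the code behind ReservoirSamplingData, which may use a different variant. The first k items fill the reservoir. After that, item i (0-based) replaces a uniformly chosen slot with probability k/(i+1), so every item of an n-item stream ends up in the final sample with probability k/n.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of Algorithm R; not taken from the Kettle source.
final class AlgorithmR {
    // Returns a uniform random sample of at most k items from the stream.
    static <T> List<T> sample(Iterable<T> stream, int k, long seed) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        Random random = new Random(seed);
        int i = 0; // 0-based index of the current item
        for (T item : stream) {
            if (i < k) {
                reservoir.add(item);               // fill phase: keep the first k items
            } else {
                int j = random.nextInt(i + 1);     // uniform in [0, i]
                if (j < k) {
                    reservoir.set(j, item);        // happens with probability k / (i + 1)
                }
            }
            i++;
        }
        return reservoir;
    }
}
```

If the stream has fewer than k items, the whole stream is returned in its original order. With a fixed seed, the same input always yields the same sample.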
Modifier and Type | Class and Description |
---|---|
static class | ReservoirSamplingData.PROC_MODE |
Nested classes/interfaces inherited from class BaseStepData: BaseStepData.StepExecutionStatus
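PROC_MODE selects whether the step samples rows, passes them through, or is disabled (see setProcessingMode). The sketch below shows one way a caller might branch on such a mode. It assumes a hypothetical enum ProcMode with constants SAMPLING, PASSTHROUGH and DISABLED and a hypothetical ModeRouter helper. The real constant names are not listed on this page, and treating DISABLED as "discard the row" is an assumption.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical mirror of ReservoirSamplingData.PROC_MODE; constant names are assumed.
enum ProcMode { SAMPLING, PASSTHROUGH, DISABLED }

final class ModeRouter {
    // Decides what happens to one incoming row under the given mode.
    static void route(ProcMode mode, Object[] row,
                      Consumer<Object[]> sampler, List<Object[]> output) {
        switch (mode) {
            case SAMPLING:
                sampler.accept(row);   // offer the row to the reservoir
                break;
            case PASSTHROUGH:
                output.add(row);       // forward the row unchanged
                break;
            case DISABLED:
                break;                 // assumed: the row is discarded
        }
    }
}
```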
Modifier and Type | Field and Description |
---|---|
protected int | m_currentRow |
protected int | m_k |
protected RowMetaInterface | m_outputRowMeta |
protected Random | m_random |
protected List<Object[]> | m_sample |
protected ReservoirSamplingData.PROC_MODE | m_state |
Constructor and Description |
---|
ReservoirSamplingData() |
Modifier and Type | Method and Description |
---|---|
void | cleanUp() |
RowMetaInterface | getOutputRowMeta() - Get the output meta data. |
ReservoirSamplingData.PROC_MODE | getProcessingMode() - Determine the current operational state of the Reservoir Sampling step. |
List<Object[]> | getSample() - Get the sample as a list of rows. |
void | initialize(int sampleSize, int seed) - Initialize this data object. |
void | processRow(Object[] row) - Process an incoming row, updating the reservoir sample. |
void | setOutputRowMeta(RowMetaInterface rmi) - Set the meta data for the output format. |
void | setProcessingMode(ReservoirSamplingData.PROC_MODE state) - Set this component to sample, pass through or be disabled. |
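Methods inherited from class BaseStepData: getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, isStopped, setStatus
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface StepDataInterface: getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, setStatus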
protected RowMetaInterface m_outputRowMeta
protected int m_k
protected int m_currentRow
protected Random m_random
protected ReservoirSamplingData.PROC_MODE m_state
public void setOutputRowMeta(RowMetaInterface rmi)
Set the meta data for the output format.
Parameters:
rmi - a RowMetaInterface value
public RowMetaInterface getOutputRowMeta()
Get the output meta data.
Returns:
a RowMetaInterface value
public List<Object[]> getSample()
Get the sample as a list of rows.
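public void initialize(int sampleSize, int seed)
Initialize this data object.
Parameters:
sampleSize - the number of rows to sample
seed - the seed for the random number generator
public ReservoirSamplingData.PROC_MODE getProcessingMode()
Determine the current operational state of the Reservoir Sampling step.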
public void setProcessingMode(ReservoirSamplingData.PROC_MODE state)
Set this component to sample, pass through or be disabled.
Parameters:
state - member of the PROC_MODE enumeration indicating the desired operational state
public void processRow(Object[] row)
Process an incoming row, updating the reservoir sample.
Parameters:
row - an incoming row
public void cleanUp()
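The methods above suggest a row-at-a-time lifecycle: initialize(sampleSize, seed) once, processRow(row) for every incoming row, getSample() when the input is exhausted, then cleanUp(). The sketch below mirrors that lifecycle in a hypothetical ReservoirSketch class whose fields correspond to m_k, m_currentRow, m_random and m_sample. The Algorithm R update rule, clamping a negative sampleSize to zero, and what cleanUp releases are all assumptions, not the documented behavior of ReservoirSamplingData.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for ReservoirSamplingData; the body is an assumed
// Algorithm R implementation, not the Kettle source.
final class ReservoirSketch {
    private int k;                 // mirrors m_k: reservoir capacity
    private int currentRow;        // mirrors m_currentRow: rows seen so far
    private Random random;         // mirrors m_random: seeded generator
    private List<Object[]> sample; // mirrors m_sample: the reservoir

    void initialize(int sampleSize, int seed) {
        k = Math.max(sampleSize, 0);   // assumption: negative sizes treated as 0
        currentRow = 0;
        random = new Random(seed);
        sample = new ArrayList<>(k);
    }

    void processRow(Object[] row) {
        if (currentRow < k) {
            sample.add(row);                          // fill the reservoir first
        } else {
            int j = random.nextInt(currentRow + 1);   // uniform in [0, currentRow]
            if (j < k) {
                sample.set(j, row);                   // keep with probability k / (currentRow + 1)
            }
        }
        currentRow++;
    }

    List<Object[]> getSample() {
        return sample;
    }

    void cleanUp() {
        sample = null;   // assumption: release references to buffered rows
        random = null;
    }
}
```

Because the generator is seeded, two runs with the same seed and the same row order produce the same sample, which makes sampling runs reproducible.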
Copyright © 2018 Hitachi Vantara. All rights reserved.