Class ReservoirSampling

  • All Implemented Interfaces:
    org.pentaho.di.core.ExtensionDataInterface, HasLogChannelInterface, org.pentaho.di.core.logging.LoggingObjectInterface, org.pentaho.di.core.logging.LoggingObjectLifecycleInterface, org.pentaho.di.core.variables.VariableSpace, StepInterface

    public class ReservoirSampling
    extends BaseStep
    implements StepInterface
    • Constructor Detail

      • ReservoirSampling

        public ReservoirSampling(StepMeta stepMeta,
                                 StepDataInterface stepDataInterface,
                                 int copyNr,
                                 TransMeta transMeta,
                                 Trans trans)
        Creates a new ReservoirSampling instance.

        Implements reservoir sampling using Algorithm R, as described by Jeffrey Scott Vitter. The algorithm itself is implemented in ReservoirSamplingData.java.

        For more information see:

        Vitter, J. S. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, Vol. 11, No. 1, March 1985. Pages 37-57.
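
To illustrate the idea behind Algorithm R, here is a minimal, self-contained sketch. It is not the code in ReservoirSamplingData.java; the class name ReservoirSketch, the method sample, and its parameters are hypothetical names chosen for this example, and the sketch assumes the stream holds fewer than Integer.MAX_VALUE items.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class; not part of the Kettle API.
public class ReservoirSketch {

    /**
     * Draws a uniform random sample of at most k items from a stream
     * whose length is not known in advance (Algorithm R).
     * Assumes the stream has fewer than Integer.MAX_VALUE items.
     */
    public static <T> List<T> sample(Iterable<T> stream, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0; // number of items read so far
        for (T item : stream) {
            seen++;
            if (reservoir.size() < k) {
                // Fill phase: the first k items go straight into the reservoir.
                reservoir.add(item);
            } else if (k > 0) {
                // Replacement phase: choose j uniformly from [0, seen).
                // The new item replaces slot j only when j < k, so it is
                // kept with probability k / seen.
                int j = rng.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }
}
```

A call such as ReservoirSketch.sample(rows, 10, new Random()) would return 10 items when rows has at least 10 elements, and all of the elements otherwise. Only the reservoir is held in memory, which is why the technique suits a row stream of unknown length.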

        Parameters:
        stepMeta - holds the step's meta data
        stepDataInterface - holds the step's temporary data
        copyNr - the copy number assigned to this step
        transMeta - meta data for the transformation
        trans - the transformation this step runs in
    • Method Detail

      • processRow

        public boolean processRow(StepMetaInterface smi,
                                  StepDataInterface sdi)
                           throws org.pentaho.di.core.exception.KettleException
        Process an incoming row of data.
        Specified by:
        processRow in interface StepInterface
        Overrides:
        processRow in class BaseStep
        Parameters:
        smi - the step's meta data
        sdi - the step's temporary data
        Returns:
        false if no more rows can be processed or an error occurred; true otherwise
        Throws:
        org.pentaho.di.core.exception.KettleException - if an error occurs
      • run

        public void run()
        Run is where the action happens!