org.pentaho.di.trans.steps.univariatestats
Class UnivariateStats

java.lang.Object
  extended by org.pentaho.di.trans.step.BaseStep
      extended by org.pentaho.di.trans.steps.univariatestats.UnivariateStats
All Implemented Interfaces:
HasLogChannelInterface, LoggingObjectInterface, VariableSpace, StepInterface

public class UnivariateStats
extends BaseStep
implements StepInterface

Calculate univariate statistics based on one column of the input data.

Calculates N, mean, standard deviation, minimum, maximum, median and arbitrary percentiles. Percentiles can be calculated using interpolation or a simple method. See The Engineering Statistics Handbook for details.

Version:
1.0
Author:
Mark Hall (mhall{[at]}pentaho.org)

Field Summary
 
Fields inherited from class org.pentaho.di.trans.step.BaseStep
first, linesInput, linesOutput, linesRead, linesRejected, linesSkipped, linesUpdated, linesWritten, terminator, terminator_rows
 
Constructor Summary
UnivariateStats(StepMeta stepMeta, StepDataInterface stepDataInterface, int copyNr, TransMeta transMeta, Trans trans)
          Creates a new UnivariateStats instance.
 
Method Summary
 boolean init(StepMetaInterface smi, StepDataInterface sdi)
          Initialize the step.
 boolean processRow(StepMetaInterface smi, StepDataInterface sdi)
          Process an incoming row of data.
 void run()
          Run is where the action happens!
 
Methods inherited from class org.pentaho.di.trans.step.BaseStep
addResultFile, addRowListener, addStepListener, batchComplete, buildLog, canProcessOneRow, cleanup, closeQuietly, copyVariablesFrom, decrementLinesRead, decrementLinesWritten, dispatch, dispose, environmentSubstitute, environmentSubstitute, findInputRowSet, findInputRowSet, findOutputRowSet, findOutputRowSet, getBooleanValueOfVariable, getClusterSize, getContainerObjectId, getCopy, getDispatcher, getErrorRowMeta, getErrors, getFilename, getInputRowMeta, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesSkipped, getLinesUpdated, getLinesWritten, getLogChannel, getLogChannelId, getLogFields, getLogLevel, getNextClassNr, getObjectCopy, getObjectId, getObjectName, getObjectRevision, getObjectType, getOutputRowSets, getParent, getParentVariableSpace, getPartitionID, getPartitionTargets, getPreviewRowMeta, getProcessed, getRegistrationDate, getRemoteInputSteps, getRemoteOutputSteps, getRepartitioning, getRepositoryDirectory, getResultFiles, getRow, getRowFrom, getRowListeners, getRuntime, getServerSockets, getSlaveNr, getSocketRepository, getStatus, getStatusDescription, getStepDataInterface, getStepID, getStepListeners, getStepMeta, getStepMetaInterface, getStepname, getTrans, getTransMeta, getTypeId, getUniqueStepCountAcrossSlaves, getUniqueStepNrAcrossSlaves, getVariable, getVariable, identifyErrorOutput, incrementLinesInput, incrementLinesOutput, incrementLinesRead, incrementLinesRejected, incrementLinesSkipped, incrementLinesUpdated, incrementLinesWritten, initBeforeStart, initializeVariablesFrom, injectVariables, isBasic, isDebug, isDetailed, isDistributed, isInitialising, isMapping, isPartitioned, isPaused, isRowLevel, isRunning, isStopped, isUsingThreadPriorityManagment, listVariables, logBasic, logBasic, logDebug, logDebug, logDetailed, logDetailed, logError, logError, logError, logMinimal, logMinimal, logRowlevel, logRowlevel, logSummary, markStart, markStop, outputIsDone, pauseRunning, putError, putRow, putRowTo, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, safeModeChecking, setCarteObjectId, setCopy, setDistributed, setErrorRowMeta, setErrors, setInputRowMeta, setInputRowSets, setInternalVariables, setLinesInput, setLinesOutput, setLinesRead, setLinesRejected, setLinesSkipped, setLinesUpdated, setLinesWritten, setLogLevel, setOutputDone, setOutputRowSets, setParentVariableSpace, setPartitioned, setPartitionID, setPartitionTargets, setPaused, setPaused, setPreviewRowMeta, setRepartitioning, setRunning, setServerSockets, setSocketRepository, setStepDataInterface, setStepListeners, setStepMeta, setStepMetaInterface, setStepname, setStopped, setTransMeta, setUsingThreadPriorityManagment, setVariable, shareVariablesWith, stopAll, stopRunning, stopRunning, toString
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.pentaho.di.trans.step.StepInterface
addRowListener, addStepListener, batchComplete, canProcessOneRow, cleanup, dispose, getCopy, getErrors, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesUpdated, getLinesWritten, getLogChannel, getOutputRowSets, getPartitionID, getProcessed, getResultFiles, getRow, getRowListeners, getRuntime, getStatus, getStepID, getStepMeta, getStepname, getTrans, identifyErrorOutput, initBeforeStart, isMapping, isPartitioned, isPaused, isRunning, isStopped, isUsingThreadPriorityManagment, markStart, markStop, pauseRunning, putRow, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, setErrors, setLinesRejected, setOutputDone, setPartitioned, setPartitionID, setRepartitioning, setRunning, setStopped, setUsingThreadPriorityManagment, stopAll, stopRunning
 
Methods inherited from interface org.pentaho.di.core.variables.VariableSpace
copyVariablesFrom, environmentSubstitute, environmentSubstitute, getBooleanValueOfVariable, getParentVariableSpace, getVariable, getVariable, initializeVariablesFrom, injectVariables, listVariables, setParentVariableSpace, setVariable, shareVariablesWith
 

Constructor Detail

UnivariateStats

public UnivariateStats(StepMeta stepMeta,
                       StepDataInterface stepDataInterface,
                       int copyNr,
                       TransMeta transMeta,
                       Trans trans)
Creates a new UnivariateStats instance.

Parameters:
stepMeta - holds the step's meta data
stepDataInterface - holds the step's temporary data
copyNr - the number assigned to the step
transMeta - meta data for the transformation
trans - a Trans value
Method Detail

processRow

public boolean processRow(StepMetaInterface smi,
                          StepDataInterface sdi)
                   throws KettleException
Process an incoming row of data.

Specified by:
processRow in interface StepInterface
Overrides:
processRow in class BaseStep
Parameters:
smi - a StepMetaInterface value
sdi - a StepDataInterface value
Returns:
a boolean value
Throws:
KettleException - if an error occurs

init

public boolean init(StepMetaInterface smi,
                    StepDataInterface sdi)
Initialize the step.

Specified by:
init in interface StepInterface
Overrides:
init in class BaseStep
Parameters:
smi - a StepMetaInterface value
sdi - a StepDataInterface value
Returns:
a boolean value

run

public void run()
Run is where the action happens!