Class UnivariateStats
java.lang.Object
org.pentaho.di.trans.step.BaseStep
org.pentaho.di.trans.steps.univariatestats.UnivariateStats
- All Implemented Interfaces:
org.pentaho.di.core.ExtensionDataInterface,HasLogChannelInterface,org.pentaho.di.core.logging.LoggingObjectInterface,org.pentaho.di.core.logging.LoggingObjectLifecycleInterface,org.pentaho.di.core.variables.VariableSpace,StepInterface
Calculate univariate statistics based on one column of the input data.
Calculates N, mean, standard deviation, minimum, maximum, median and arbitrary percentiles. Percentiles can be calculated using interpolation or a simple method. See The Engineering Statistics Handbook for details.
- Version:
- 1.0
- Author:
- Mark Hall (mhall{[at]}pentaho.org)
-
Field Summary
Fields inherited from class org.pentaho.di.trans.step.BaseStep
deadLockCounter, extensionDataMap, first, linesInput, linesOutput, linesRead, linesRejected, linesSkipped, linesUpdated, linesWritten, log, metaStore, repository, rowListeners, safeStopped, terminator, terminator_rows, variables -
Constructor Summary
ConstructorsConstructorDescriptionUnivariateStats(StepMeta stepMeta, StepDataInterface stepDataInterface, int copyNr, TransMeta transMeta, Trans trans) Creates a newUnivariateStatsinstance. -
Method Summary
Modifier and TypeMethodDescriptionbooleaninit(StepMetaInterface smi, StepDataInterface sdi) Initialize the step.booleanprocessRow(StepMetaInterface smi, StepDataInterface sdi) Process an incoming row of data.Methods inherited from class org.pentaho.di.trans.step.BaseStep
addResultFile, addRowListener, addRowSetToInputRowSets, addRowSetToOutputRowSets, addStepListener, batchComplete, beforeStartProcessing, buildLog, canProcessOneRow, checkFeedback, cleanup, clearInputRowSets, clearOutputRowSets, closeQuietly, copyVariablesFrom, decrementLinesRead, decrementLinesWritten, dispatch, dispose, environmentSubstitute, environmentSubstitute, environmentSubstitute, fieldSubstitute, findInputRowSet, findInputRowSet, findOutputRowSet, findOutputRowSet, getBooleanValueOfVariable, getClusterSize, getContainerObjectId, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getDispatcher, getErrorRowMeta, getErrors, getExtensionDataMap, getFilename, getFirstInputRowSet, getInputRowMeta, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesSkipped, getLinesUpdated, getLinesWritten, getLogChannel, getLogChannelId, getLogFields, getLogLevel, getMetaStore, getNextClassNr, getObjectCopy, getObjectId, getObjectName, getObjectRevision, getObjectType, getOutputRowSets, getParent, getParentVariableSpace, getPartitionID, getPartitionTargets, getPreviewRowMeta, getProcessed, getRegistrationDate, getRemoteInputSteps, getRemoteOutputSteps, getRepartitioning, getRepository, getRepositoryDirectory, getResultFiles, getRow, getRowFrom, getRowHandler, getRowListeners, getRuntime, getServerSockets, getSlaveNr, getSocketRepository, getStatus, getStatusDescription, getStepDataInterface, getStepID, getStepListeners, getStepMeta, getStepMetaInterface, getStepname, getTrans, getTransMeta, getTypeId, getUniqueStepCountAcrossSlaves, getUniqueStepNrAcrossSlaves, getVariable, getVariable, handleGetRowFrom, handlePutRowTo, identifyErrorOutput, incrementLinesInput, incrementLinesOutput, incrementLinesRead, incrementLinesRejected, incrementLinesSkipped, incrementLinesUpdated, incrementLinesWritten, initBeforeStart, initializeVariablesFrom, injectVariables, isBasic, isDebug, isDetailed, isDistributed, isForcingSeparateLogging, isGatheringMetrics, isInitialising, isMapping, isPartitioned, isPaused, isRowLevel, isRunning, isSafeStopped, isStopped, isUsingThreadPriorityManagment, listVariables, logBasic, logBasic, logDebug, logDebug, logDetailed, logDetailed, logError, logError, logError, logMinimal, logMinimal, logRowlevel, logRowlevel, logSummary, markStart, markStop, openRemoteInputStepSocketsOnce, openRemoteOutputStepSocketsOnce, outputIsDone, pauseRunning, putError, putRow, putRowTo, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, safeModeChecking, safeModeChecking, setCarteObjectId, setCopy, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setDistributed, setErrorRowMeta, setErrors, setForcingSeparateLogging, setGatheringMetrics, setInputRowMeta, setInputRowSets, setInternalVariables, setLinesInput, setLinesOutput, setLinesRead, setLinesRejected, setLinesSkipped, setLinesUpdated, setLinesWritten, setLogLevel, setMetaStore, setOutputDone, setOutputRowSets, setParentVariableSpace, setPartitioned, setPartitionID, setPartitionTargets, setPaused, setPaused, setPreviewRowMeta, setRepartitioning, setRepository, setRowHandler, setRunning, setSafeStopped, setServerSockets, setSocketRepository, setStepDataInterface, setStepListeners, setStepMeta, setStepMetaInterface, setStepname, setStopped, setTransMeta, setUsingThreadPriorityManagment, setVariable, shareVariablesWith, stopAll, stopRunning, stopRunning, swapFirstInputRowSetIfExists, toString, verifyInputDeadLock, waitUntilTransformationIsStartedMethods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, waitMethods inherited from interface org.pentaho.di.core.logging.LoggingObjectLifecycleInterface
callAfterLog, callBeforeLogMethods inherited from interface org.pentaho.di.trans.step.StepInterface
addRowListener, addRowSetToInputRowSets, addRowSetToOutputRowSets, addStepListener, afterFinishProcessing, batchComplete, beforeStartProcessing, canProcessOneRow, cleanup, dispose, getCopy, getCurrentInputRowSetNr, getCurrentOutputRowSetNr, getErrors, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesUpdated, getLinesWritten, getLogChannel, getMetaStore, getOutputRowSets, getPartitionID, getProcessed, getRepository, getResultFiles, getRow, getRowListeners, getRuntime, getStatus, getStepID, getStepMeta, getStepname, getTrans, identifyErrorOutput, initBeforeStart, isMapping, isPartitioned, isPaused, isRunning, isSafeStopped, isStopped, isUsingThreadPriorityManagment, markStart, markStop, pauseRunning, putRow, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, setCurrentInputRowSetNr, setCurrentOutputRowSetNr, setErrors, setLinesRejected, setMetaStore, setOutputDone, setPartitioned, setPartitionID, setRepartitioning, setRepository, setRunning, setSafeStopped, setStopped, setUsingThreadPriorityManagment, stopAll, stopRunning, subStatusesMethods inherited from interface org.pentaho.di.core.variables.VariableSpace
copyVariablesFrom, environmentSubstitute, environmentSubstitute, environmentSubstitute, fieldSubstitute, getBooleanValueOfVariable, getParentVariableSpace, getVariable, getVariable, initializeVariablesFrom, injectVariables, listVariables, setParentVariableSpace, setVariable, shareVariablesWith
-
Constructor Details
-
UnivariateStats
public UnivariateStats(StepMeta stepMeta, StepDataInterface stepDataInterface, int copyNr, TransMeta transMeta, Trans trans) Creates a newUnivariateStatsinstance.- Parameters:
stepMeta- holds the step's meta datastepDataInterface- holds the step's temporary datacopyNr- the number assigned to the steptransMeta- meta data for the transformationtrans- aTransvalue
-
-
Method Details
-
processRow
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws org.pentaho.di.core.exception.KettleException Process an incoming row of data.- Specified by:
processRowin interfaceStepInterface- Overrides:
processRowin classBaseStep- Parameters:
smi- aStepMetaInterfacevaluesdi- aStepDataInterfacevalue- Returns:
- a
booleanvalue - Throws:
org.pentaho.di.core.exception.KettleException- if an error occurs
-
init
Initialize the step.- Specified by:
initin interfaceStepInterface- Overrides:
initin classBaseStep- Parameters:
smi- aStepMetaInterfacevaluesdi- aStepDataInterfacevalue- Returns:
- a
booleanvalue
-