Package org.pentaho.di.arff
Class ArffOutputData
java.lang.Object
org.pentaho.di.trans.step.BaseStepData
org.pentaho.di.arff.ArffOutputData
- All Implemented Interfaces:
org.pentaho.di.trans.step.StepDataInterface
public class ArffOutputData
extends org.pentaho.di.trans.step.BaseStepData
implements org.pentaho.di.trans.step.StepDataInterface
Holds temporary data and provides routines for writing the ARFF file. This class
writes rows to a temporary file while at the same time collecting the values of
nominal attributes in an array of Maps. Once the last row has been processed,
the ARFF header is written and the temporary file is then appended to it.
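The two-pass strategy described above can be illustrated outside Kettle. The following is a minimal, self-contained sketch, not the plugin's implementation. `TwoPassArffSketch.write` is a hypothetical helper that takes plain string rows instead of Kettle rows and `ArffMeta` objects. It keeps one insertion-ordered `Set` per attribute in place of the class's array of Maps, treats every non-nominal attribute as numeric, writes UTF-8, and does no ARFF quoting.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a two-pass ARFF writer (not the plugin's code). */
class TwoPassArffSketch {

  public static void write(Path arffFile, String relation, String[] names,
      boolean[] nominal, List<String[]> rows) throws IOException {
    // One insertion-ordered set of seen values per attribute (used only for nominal ones).
    List<Set<String>> nominalVals = new ArrayList<>();
    for (int i = 0; i < names.length; i++) {
      nominalVals.add(new LinkedHashSet<>());
    }

    // Pass 1: stream the rows into a temporary file, recording nominal values.
    Path temp = Files.createTempFile("arff-data", ".tmp");
    try {
      try (Writer data = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
        for (String[] row : rows) {
          for (int i = 0; i < row.length; i++) {
            if (i > 0) {
              data.write(",");
            }
            String v = row[i];
            if (v == null) {
              data.write("?"); // ARFF missing value
            } else {
              if (nominal[i]) {
                nominalVals.get(i).add(v);
              }
              data.write(v);
            }
          }
          data.write("\n");
        }
      }

      // Pass 2: every nominal value is now known, so write the header first
      // and then append the buffered data section after it.
      try (OutputStream out = Files.newOutputStream(arffFile)) {
        StringBuilder header = new StringBuilder();
        header.append("@relation ").append(relation).append("\n\n");
        for (int i = 0; i < names.length; i++) {
          header.append("@attribute ").append(names[i]).append(' ');
          header.append(nominal[i]
              ? "{" + String.join(",", nominalVals.get(i)) + "}"
              : "numeric");
          header.append('\n');
        }
        header.append("\n@data\n");
        out.write(header.toString().getBytes(StandardCharsets.UTF_8));
        Files.copy(temp, out); // append the temporary data file
      }
    } finally {
      Files.deleteIfExists(temp);
    }
  }
}
```

Because the data section is streamed to disk, only the distinct nominal values are held in memory, yet the header can still list every value that occurs in the data.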
- Version:
- 1.0
- Author:
- Mark Hall (mhall{[at]}pentaho.org)
-
Nested Class Summary
Nested classes/interfaces inherited from class org.pentaho.di.trans.step.BaseStepData
org.pentaho.di.trans.step.BaseStepData.StepExecutionStatus
Field Summary
Fields
protected org.pentaho.dm.commons.ArffMeta[] m_arffMeta
protected OutputStream m_dataOut
protected boolean m_hasEncoding
protected File m_headerFile
protected OutputStream m_headerOut
protected byte[] m_leftCurly
protected byte[] m_missing
protected byte[] m_newLine
protected int[] m_outputFieldIndexes
protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
protected boolean m_outputSparseInstances - True if sparse data is to be output.
protected byte[] m_rightCurly
protected byte[] m_separator
protected byte[] m_spaceLeftCurly
protected File m_tempFile
protected int m_weightFieldIndex - Index of the field used to set the weight for each instance (-1 means equal weights).
Constructor Summary
Constructors
ArffOutputData()
Method Summary
void closeFiles() - Flush and close all files.
void finishOutput(String relationName, String encoding) - Writes the ARFF header and appends the temporary file.
boolean getHasEncoding() - Returns true if a specific character encoding is in use.
org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta() - Get the meta data for the output format.
void openFiles(String filename) - Open files ready to write to.
void setBinaryMissing(byte[] m) - Set the binary missing value to use.
void setBinaryNewLine(byte[] nl) - Set the binary line terminator to use.
void setBinarySeparator(byte[] s) - Set the binary separator to use.
void setHasEncoding(boolean e) - Set whether an encoding is in use.
void setOutputFieldIndexes(int[] outputFieldIndexes, org.pentaho.dm.commons.ArffMeta[] arffMeta) - Set the indexes of the fields to output to the ARFF file.
void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi) - Set the meta data for the output format.
void setOutputSparseInstances(boolean s) - Set whether to output instances in sparse format.
void setWeightFieldIndex(int index) - Set the index of the field whose values will be used to set the weight for each instance.
void writeRow(Object[] r, String encoding) - Convert and write a row of data to the ARFF file.
Methods inherited from class org.pentaho.di.trans.step.BaseStepData
getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, isStopped, setStatus
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.pentaho.di.trans.step.StepDataInterface
getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, setStatus
-
Field Details
-
m_outputRowMeta
protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
m_outputFieldIndexes
protected int[] m_outputFieldIndexes
m_outputSparseInstances
protected boolean m_outputSparseInstances
True if sparse data is to be output.
m_weightFieldIndex
protected int m_weightFieldIndex
Index of the field used to set the weight for each instance (-1 means equal weights).
m_arffMeta
protected org.pentaho.dm.commons.ArffMeta[] m_arffMeta
m_nominalVals
-
m_tempFile
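protected File m_tempFile
m_headerFile
protected File m_headerFile
m_dataOut
protected OutputStream m_dataOut
m_headerOut
protected OutputStream m_headerOut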
m_separator
protected byte[] m_separator
m_newLine
protected byte[] m_newLine
m_missing
protected byte[] m_missing
m_leftCurly
protected byte[] m_leftCurly
m_spaceLeftCurly
protected byte[] m_spaceLeftCurly
m_rightCurly
protected byte[] m_rightCurly
m_hasEncoding
protected boolean m_hasEncoding
-
-
Constructor Details
-
ArffOutputData
public ArffOutputData()
-
-
Method Details
-
getOutputRowMeta
public org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
Get the meta data for the output format.
- Returns:
- a RowMetaInterface value
-
setOutputRowMeta
public void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
Set the meta data for the output format.
- Parameters:
rmi - a RowMetaInterface value
-
setHasEncoding
public void setHasEncoding(boolean e)
Set whether an encoding is in use.
- Parameters:
e - true if an encoding is in use
-
getHasEncoding
public boolean getHasEncoding()
Returns true if a specific character encoding is in use.
- Returns:
- true if an encoding other than the default encoding is in use.
-
setBinaryNewLine
public void setBinaryNewLine(byte[] nl)
Set the binary line terminator to use.
- Parameters:
nl - the line terminator
-
setBinarySeparator
public void setBinarySeparator(byte[] s)
Set the binary separator to use.
- Parameters:
s - binary field separator
-
setBinaryMissing
public void setBinaryMissing(byte[] m)
Set the binary missing value to use.
- Parameters:
m - binary missing value
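The `setBinary*` setters take byte arrays rather than strings, and the class records whether a specific encoding is in use. One plausible reading, offered here as an assumption, is that tokens are encoded once, in the same character set as the data, and then written directly as bytes. A minimal sketch of that idea follows; `BinaryTokens` is a hypothetical name, and the comma separator, `\n` terminator and `?` missing marker are assumed defaults.

```java
import java.nio.charset.Charset;

/** Hypothetical holder for pre-encoded ARFF tokens (not part of the plugin). */
class BinaryTokens {
  public final byte[] separator;
  public final byte[] newLine;
  public final byte[] missing;

  /**
   * @param encoding character set name, or null/empty to fall back to the
   *                 platform default (the "no specific encoding" case)
   */
  public BinaryTokens(String encoding) {
    Charset cs = (encoding == null || encoding.isEmpty())
        ? Charset.defaultCharset()
        : Charset.forName(encoding);
    // Encode each token once, using the same character set as the data values,
    // so a row can be written straight to an OutputStream.
    separator = ",".getBytes(cs);
    newLine = "\n".getBytes(cs); // assumption: Unix-style line terminator
    missing = "?".getBytes(cs);  // ARFF missing-value marker
  }
}
```

With UTF-16BE, for example, the one-character separator becomes two bytes (`0x00 0x2C`), which is why the tokens must be encoded with the same character set as the values they sit between.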
-
setOutputFieldIndexes
public void setOutputFieldIndexes(int[] outputFieldIndexes, org.pentaho.dm.commons.ArffMeta[] arffMeta)
Set the indexes of the fields to output to the ARFF file.
- Parameters:
outputFieldIndexes - array of indexes
arffMeta - array of ArffMeta objects
-
setWeightFieldIndex
public void setWeightFieldIndex(int index)
Set the index of the field whose values will be used to set the weight for each instance.
- Parameters:
index - the index of the field to use to set instance-level weights
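In an ARFF file an instance weight is written in curly braces after the instance's last value (for example `sunny,85,?,{2.5}`); instances without one get weight 1. The sketch below shows how a weight field index, with -1 meaning equal weights, might be honored when formatting a dense row. `WeightSketch.formatRow` is hypothetical, and it assumes that the weight field is not itself among the output indexes and that a null weight is simply left out.

```java
/** Hypothetical sketch: one dense ARFF instance with an optional weight. */
class WeightSketch {

  /**
   * @param row           the incoming row values
   * @param outputIndexes indexes of the row fields to write as attributes
   * @param weightIndex   index of the weight field, or -1 for equal weights
   */
  public static String formatRow(Object[] row, int[] outputIndexes, int weightIndex) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < outputIndexes.length; i++) {
      if (i > 0) {
        sb.append(',');
      }
      Object v = row[outputIndexes[i]];
      sb.append(v == null ? "?" : v.toString()); // "?" marks a missing value
    }
    // Assumption: a null weight is skipped, so the instance keeps weight 1.
    if (weightIndex >= 0 && row[weightIndex] != null) {
      // ARFF instance weight: appended in curly braces after the last value.
      sb.append(",{").append(row[weightIndex]).append('}');
    }
    return sb.toString();
  }
}
```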
-
setOutputSparseInstances
public void setOutputSparseInstances(boolean s)
Set whether to output instances in sparse format.
- Parameters:
s - true if instances are to be output in sparse format
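In sparse ARFF format an instance lists only its non-zero values, as `index value` pairs inside curly braces with 0-based attribute indexes, e.g. `{1 3.0,2 ?,4 1.5}`. Omitted entries are read as 0, not as missing, so a missing value has to be written explicitly. The curly-brace byte fields of this class presumably serve this format. A minimal sketch for numeric attributes only (nominal attributes are not handled), using a hypothetical `SparseSketch.toSparse` helper:

```java
/** Hypothetical sketch of sparse ARFF output for numeric attributes. */
class SparseSketch {

  /**
   * Writes only the non-zero entries as "index value" pairs inside curly
   * braces, using 0-based attribute indexes. A null (missing) value is written
   * explicitly as "?", because an omitted entry means 0, not missing.
   */
  public static String toSparse(Double[] values) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    for (int i = 0; i < values.length; i++) {
      Double v = values[i];
      if (v != null && v == 0.0) {
        continue; // zeros are implicit in the sparse format
      }
      if (!first) {
        sb.append(',');
      }
      first = false;
      sb.append(i).append(' ').append(v == null ? "?" : String.valueOf(v));
    }
    return sb.append('}').toString();
  }
}
```

An all-zero row becomes `{}`, which is where the space savings of the sparse format come from on wide, mostly-zero data.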
-
openFiles
public void openFiles(String filename) throws IOException
Open files ready to write to.
- Parameters:
filename - the name of the ARFF file to write to
- Throws:
IOException - if an error occurs
-
writeRow
public void writeRow(Object[] r, String encoding) throws IOException, org.pentaho.di.core.exception.KettleStepException
Convert and write a row of data to the ARFF file.
- Parameters:
r - the Kettle row
encoding - an (optional) character encoding to use
- Throws:
IOException - if an error occurs
org.pentaho.di.core.exception.KettleStepException - if an error occurs
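Converting a row also means writing missing values as `?` and quoting nominal or string values that contain spaces or ARFF delimiters. The exact rule `writeRow` applies is not documented on this page. The hypothetical `ArffQuoting.quote` helper below is a simplified, conservative version of ARFF quoting, not Weka's or the plugin's implementation.

```java
/** Hypothetical, simplified ARFF value quoting (not the plugin's rule). */
class ArffQuoting {

  /**
   * Returns "?" for a missing (null) value. Values that are empty, equal to
   * "?", or contain whitespace or ARFF-significant characters are wrapped in
   * single quotes, with backslashes and single quotes escaped.
   */
  public static String quote(String value) {
    if (value == null) {
      return "?"; // ARFF missing value
    }
    boolean needsQuotes = value.isEmpty() || value.equals("?");
    for (char c : value.toCharArray()) {
      if (Character.isWhitespace(c) || ",{}'\"%".indexOf(c) >= 0) {
        needsQuotes = true;
        break;
      }
    }
    if (!needsQuotes) {
      return value;
    }
    String escaped = value.replace("\\", "\\\\").replace("'", "\\'");
    return "'" + escaped + "'";
  }
}
```

Applying the same function to the nominal values collected for the header keeps the `@attribute` value lists consistent with the quoted values in the data section.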
-
finishOutput
public void finishOutput(String relationName, String encoding) throws org.pentaho.di.core.exception.KettleStepException
Writes the ARFF header and then appends the temporary file.
- Parameters:
relationName - the ARFF relation name
encoding - an (optional) character encoding
- Throws:
org.pentaho.di.core.exception.KettleStepException - if an error occurs
-
closeFiles
public void closeFiles() throws IOException
Flush and close all files.
- Throws:
IOException - if an error occurs
-