Package org.pentaho.di.arff
Class ArffOutputData
- java.lang.Object
-
- org.pentaho.di.trans.step.BaseStepData
-
- org.pentaho.di.arff.ArffOutputData
-
- All Implemented Interfaces:
org.pentaho.di.trans.step.StepDataInterface
public class ArffOutputData extends org.pentaho.di.trans.step.BaseStepData implements org.pentaho.di.trans.step.StepDataInterface
Holds temporary data and has routines for writing the ARFF file. This class writes rows to a temporary file while, at the same time, collecting values for nominal attributes in an array of Maps. Once the last row has been processed, the ARFF header is written and then the temporary file is appended.
- Version:
- 1.0
- Author:
- Mark Hall (mhall{[at]}pentaho.org)
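The two-pass approach described above (buffer the data rows, collect nominal values as they appear, then write the header followed by the buffered rows) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the class `TwoPassArffSketch` and its methods are hypothetical, an in-memory buffer stands in for the temporary file, a `LinkedHashSet` per attribute stands in for the array of Maps, and quoting of values is omitted.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the ArffOutputData implementation.
public class TwoPassArffSketch {
    private final String[] names;      // attribute names
    private final boolean[] nominal;   // true where the attribute is nominal
    // One set of distinct values per attribute (assumption: stands in for the array of Maps).
    private final List<Set<String>> nominalVals = new ArrayList<>();
    // Assumption: an in-memory buffer stands in for the temporary file.
    private final ByteArrayOutputStream dataBuffer = new ByteArrayOutputStream();

    public TwoPassArffSketch(String[] names, boolean[] nominal) {
        this.names = names;
        this.nominal = nominal;
        for (int i = 0; i < names.length; i++) {
            nominalVals.add(new LinkedHashSet<>());
        }
    }

    // Pass 1: buffer the data line and remember any nominal values seen.
    public void writeRow(String[] row) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            if (row[i] == null) {
                line.append('?');      // ARFF missing-value marker
                continue;
            }
            if (nominal[i]) {
                nominalVals.get(i).add(row[i]);
            }
            line.append(row[i]);
        }
        line.append('\n');
        byte[] bytes = line.toString().getBytes(StandardCharsets.UTF_8);
        dataBuffer.write(bytes, 0, bytes.length);
    }

    // Pass 2: the header can only be written now, because the full set of
    // nominal values is known only after the last row has been seen.
    public String finish(String relationName) {
        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relationName).append('\n');
        for (int i = 0; i < names.length; i++) {
            out.append("@attribute ").append(names[i]).append(' ');
            if (nominal[i]) {
                out.append('{').append(String.join(",", nominalVals.get(i))).append('}');
            } else {
                out.append("numeric");
            }
            out.append('\n');
        }
        out.append("@data\n");
        out.append(new String(dataBuffer.toByteArray(), StandardCharsets.UTF_8));
        return out.toString();
    }
}
```

The header has to be deferred because an ARFF nominal attribute declaration lists every value the attribute takes, which a streaming writer cannot know until the input is exhausted.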
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
protected org.pentaho.dm.commons.ArffMeta[] m_arffMeta
protected OutputStream m_dataOut
protected boolean m_hasEncoding
protected File m_headerFile
protected OutputStream m_headerOut
protected byte[] m_leftCurly
protected byte[] m_missing
protected byte[] m_newLine
protected Map<String,String>[] m_nominalVals
protected int[] m_outputFieldIndexes
protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
protected boolean m_outputSparseInstances - True if sparse data is to be output
protected byte[] m_rightCurly
protected byte[] m_separator
protected byte[] m_spaceLeftCurly
protected File m_tempFile
protected int m_weightFieldIndex - Index of the field used to set the weight for each instance (-1 means equal weights)
-
Constructor Summary
Constructors (Constructor, Description)
ArffOutputData()
-
Method Summary
All Methods (Modifier and Type, Method, Description)
void closeFiles() - Flush and close all files
void finishOutput(String relationName, String encoding) - Writes the ARFF header and appends the temporary file
boolean getHasEncoding() - Returns true if a specific character encoding is in use.
org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta() - Get the meta data for the output format
void openFiles(String filename) - Open files ready to write to
void setBinaryMissing(byte[] m) - Set the binary missing value to use
void setBinaryNewLine(byte[] nl) - Set the binary line terminator to use
void setBinarySeparator(byte[] s) - Set the binary separator to use
void setHasEncoding(boolean e) - Set whether an encoding is in use.
void setOutputFieldIndexes(int[] outputFieldIndexes, org.pentaho.dm.commons.ArffMeta[] arffMeta) - Set the indexes of the fields to output to the ARFF file
void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi) - Set the meta data for the output format
void setOutputSparseInstances(boolean s) - Set whether to output instances in sparse format
void setWeightFieldIndex(int index) - Set the index of the field whose values will be used to set the weight for each instance.
void writeRow(Object[] r, String encoding) - Convert and write a row of data to the ARFF file.
-
Methods inherited from class org.pentaho.di.trans.step.BaseStepData
getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, isStopped, setStatus
-
-
-
-
Field Detail
-
m_outputRowMeta
protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
-
m_outputFieldIndexes
protected int[] m_outputFieldIndexes
-
m_outputSparseInstances
protected boolean m_outputSparseInstances
True if sparse data is to be output
-
m_weightFieldIndex
protected int m_weightFieldIndex
Index of the field used to set the weight for each instance (-1 means equal weights)
-
m_arffMeta
protected org.pentaho.dm.commons.ArffMeta[] m_arffMeta
-
m_tempFile
protected File m_tempFile
-
m_headerFile
protected File m_headerFile
-
m_dataOut
protected OutputStream m_dataOut
-
m_headerOut
protected OutputStream m_headerOut
-
m_separator
protected byte[] m_separator
-
m_newLine
protected byte[] m_newLine
-
m_missing
protected byte[] m_missing
-
m_leftCurly
protected byte[] m_leftCurly
-
m_spaceLeftCurly
protected byte[] m_spaceLeftCurly
-
m_rightCurly
protected byte[] m_rightCurly
-
m_hasEncoding
protected boolean m_hasEncoding
-
-
Method Detail
-
getOutputRowMeta
public org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
Get the meta data for the output format
- Returns:
- a RowMetaInterface value
-
setOutputRowMeta
public void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
Set the meta data for the output format
- Parameters:
rmi - a RowMetaInterface value
-
setHasEncoding
public void setHasEncoding(boolean e)
Set whether an encoding is in use.
- Parameters:
e - true if an encoding is in use
-
getHasEncoding
public boolean getHasEncoding()
Returns true if a specific character encoding is in use.
- Returns:
- true if an encoding other than the default encoding is in use.
-
setBinaryNewLine
public void setBinaryNewLine(byte[] nl)
Set the binary line terminator to use
- Parameters:
nl - the line terminator
-
setBinarySeparator
public void setBinarySeparator(byte[] s)
Set the binary separator to use
- Parameters:
s - binary field separator
-
setBinaryMissing
public void setBinaryMissing(byte[] m)
Set the binary missing value to use
- Parameters:
m - binary missing value
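The three binary setters above take the separator, line terminator and missing-value marker as byte arrays, which suggests they are encoded once and then reused for every row. A minimal sketch of that idea follows, assuming a comma separator, a `\n` terminator and the ARFF `?` missing marker; the class `BinaryTokensSketch` and its methods are hypothetical and only illustrate how such pre-encoded tokens might be produced and used.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical illustration only; not part of the ArffOutputData API.
public class BinaryTokensSketch {
    private final Charset charset;
    private final byte[] separator;
    private final byte[] newLine;
    private final byte[] missing;

    public BinaryTokensSketch(Charset charset) {
        this.charset = charset;
        // Encode the fixed tokens once, rather than once per value written.
        this.separator = ",".getBytes(charset);
        this.newLine = "\n".getBytes(charset);
        this.missing = "?".getBytes(charset);
    }

    // Encodes one row, using the pre-encoded tokens between and after values.
    public byte[] encodeRow(String[] row) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                out.write(separator, 0, separator.length);
            }
            byte[] value = (row[i] == null) ? missing : row[i].getBytes(charset);
            out.write(value, 0, value.length);
        }
        out.write(newLine, 0, newLine.length);
        return out.toByteArray();
    }
}
```

In a multi-byte encoding such as UTF-16LE each token occupies two bytes, which is why the tokens need to be encoded with the same character set as the values.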
-
setOutputFieldIndexes
public void setOutputFieldIndexes(int[] outputFieldIndexes, org.pentaho.dm.commons.ArffMeta[] arffMeta)
Set the indexes of the fields to output to the ARFF file
- Parameters:
outputFieldIndexes - array of indexes
arffMeta - array of arff metas
-
setWeightFieldIndex
public void setWeightFieldIndex(int index)
Set the index of the field whose values will be used to set the weight for each instance.
- Parameters:
index - the index of the field to use to set instance-level weights.
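In the ARFF format as read by Weka, an instance weight is appended to the end of the data line in curly braces, for example `0, X, 0, Y, "class A", {5}`. The sketch below shows how a weight field index might translate into that syntax, with -1 meaning no weight is written. The class `InstanceWeightSketch` and its method are hypothetical, and how this class treats a null weight value is not stated here, so skipping it is an assumption.

```java
// Hypothetical illustration only; not the ArffOutputData implementation.
public class InstanceWeightSketch {
    // Appends ", {weight}" to an already formatted data line.
    // A negative index means all instances have equal weight, so nothing is appended.
    public static String withWeight(String dataLine, Object[] row, int weightFieldIndex) {
        if (weightFieldIndex < 0) {
            return dataLine;
        }
        Object weight = row[weightFieldIndex];
        if (weight == null) {
            return dataLine;   // assumption: a missing weight is simply skipped
        }
        return dataLine + ", {" + weight + "}";
    }
}
```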
-
setOutputSparseInstances
public void setOutputSparseInstances(boolean s)
Set whether to output instances in sparse format
- Parameters:
s - true if instances are to be output in sparse format
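For reference, a sparse ARFF instance lists only the non-zero values, each prefixed by its zero-based attribute index, inside curly braces; missing values are still written explicitly as `?`. The following is a minimal sketch of that conversion for numeric rows only. The class `SparseRowSketch` is hypothetical, uses `NaN` to represent a missing value, and does not reflect how this class encodes rows internally.

```java
// Hypothetical illustration only; numeric attributes only.
public class SparseRowSketch {
    // Converts a dense numeric row to sparse ARFF notation, e.g. "{1 2.5, 4 7.0}".
    public static String toSparse(double[] row) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (int i = 0; i < row.length; i++) {
            if (row[i] == 0.0) {
                continue;              // zeros are implied and therefore omitted
            }
            if (!first) {
                sb.append(", ");
            }
            sb.append(i).append(' ');  // zero-based attribute index
            if (Double.isNaN(row[i])) {
                sb.append('?');        // assumption: NaN represents a missing value
            } else {
                sb.append(row[i]);
            }
            first = false;
        }
        return sb.append('}').toString();
    }
}
```

For example, `toSparse(new double[] {0, 2.5, 0, 0, 7})` yields `{1 2.5, 4 7.0}`.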
-
openFiles
public void openFiles(String filename) throws IOException
Open files ready to write to
- Parameters:
filename - the name of the ARFF file to write to
- Throws:
IOException - if an error occurs
-
writeRow
public void writeRow(Object[] r, String encoding) throws IOException, org.pentaho.di.core.exception.KettleStepException
Convert and write a row of data to the ARFF file.
- Parameters:
r - the Kettle row
encoding - an (optional) character encoding to use
- Throws:
IOException - if an error occurs
org.pentaho.di.core.exception.KettleStepException - if an error occurs
-
finishOutput
public void finishOutput(String relationName, String encoding) throws org.pentaho.di.core.exception.KettleStepException
Writes the ARFF header and appends the temporary file
- Parameters:
relationName - the ARFF relation name
encoding - an (optional) character encoding
- Throws:
org.pentaho.di.core.exception.KettleStepException - if an error occurs
-
closeFiles
public void closeFiles() throws IOException
Flush and close all files
- Throws:
IOException - if an error occurs
-
-