Package org.pentaho.di.arff
Class ArffOutputData
- java.lang.Object
-
- org.pentaho.di.trans.step.BaseStepData
-
- org.pentaho.di.arff.ArffOutputData
-
- All Implemented Interfaces:
org.pentaho.di.trans.step.StepDataInterface
public class ArffOutputData extends org.pentaho.di.trans.step.BaseStepData implements org.pentaho.di.trans.step.StepDataInterface
Holds temporary data and provides routines for writing the ARFF file. This class writes rows to a temporary file while, at the same time, collecting values for nominal attributes in an array of Maps. Once the last row has been processed, the ARFF header is written and then the temporary file is appended.
- Version:
- 1.0
- Author:
- Mark Hall (mhall{[at]}pentaho.org)
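The two-phase approach described in the class summary can be illustrated with a minimal, self-contained sketch. It is not the ArffOutputData implementation: the class name TwoPhaseArffSketch and its methods are hypothetical, every attribute is treated as nominal, values are not quoted, and a list of sets stands in for the array of Maps mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; not the ArffOutputData implementation. */
public class TwoPhaseArffSketch {
    private final String[] names;
    // One set of observed values per attribute (assumption: all attributes are nominal).
    private final List<Set<String>> nominalVals = new ArrayList<>();
    private final Path tempFile;
    private final OutputStream dataOut;

    public TwoPhaseArffSketch(String... attributeNames) throws IOException {
        names = attributeNames;
        for (int i = 0; i < names.length; i++) {
            nominalVals.add(new LinkedHashSet<>());
        }
        tempFile = Files.createTempFile("arff-data", ".tmp");
        dataOut = Files.newOutputStream(tempFile);
    }

    /** Phase 1: append the row to the temporary file and record each value seen. */
    public void writeRow(String... row) throws IOException {
        for (int i = 0; i < row.length; i++) {
            nominalVals.get(i).add(row[i]);
        }
        dataOut.write((String.join(",", row) + "\n").getBytes(StandardCharsets.UTF_8));
    }

    /** Phase 2: write the header, which is only known now, then append the data rows. */
    public String finish(String relationName) throws IOException {
        dataOut.close();
        StringBuilder header = new StringBuilder("@relation " + relationName + "\n\n");
        for (int i = 0; i < names.length; i++) {
            header.append("@attribute ").append(names[i]).append(" {")
                  .append(String.join(",", nominalVals.get(i))).append("}\n");
        }
        header.append("\n@data\n");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(header.toString().getBytes(StandardCharsets.UTF_8));
        Files.copy(tempFile, out);   // append the buffered rows after the header
        Files.delete(tempFile);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The temporary file is needed because the set of legal values for a nominal attribute is only complete after the last row, while the ARFF header that lists those values must precede the data.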
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
protected org.pentaho.dm.commons.ArffMeta[] m_arffMeta
protected OutputStream m_dataOut
protected boolean m_hasEncoding
protected File m_headerFile
protected OutputStream m_headerOut
protected byte[] m_leftCurly
protected byte[] m_missing
protected byte[] m_newLine
protected Map<String,String>[] m_nominalVals
protected int[] m_outputFieldIndexes
protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
protected boolean m_outputSparseInstances
True if sparse data is to be output
protected byte[] m_rightCurly
protected byte[] m_separator
protected byte[] m_spaceLeftCurly
protected File m_tempFile
protected int m_weightFieldIndex
Index of the field used to set the weight for each instance (-1 means equal weights)
-
Constructor Summary
Constructors (Constructor, Description)
ArffOutputData()
-
Method Summary
Methods (Modifier and Type, Method, Description)
void closeFiles()
Flush and close all files
void finishOutput(String relationName, String encoding)
Writes the ARFF header and appends the temporary file
boolean getHasEncoding()
Returns true if a specific character encoding is in use.
org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
Get the meta data for the output format
void openFiles(String filename)
Open files ready to write to
void setBinaryMissing(byte[] m)
Set the binary missing value to use
void setBinaryNewLine(byte[] nl)
Set the binary line terminator to use
void setBinarySeparator(byte[] s)
Set the binary separator to use
void setHasEncoding(boolean e)
Set whether an encoding is in use.
void setOutputFieldIndexes(int[] outputFieldIndexes, org.pentaho.dm.commons.ArffMeta[] arffMeta)
Set the indexes of the fields to output to the ARFF file
void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
Set the meta data for the output format
void setOutputSparseInstances(boolean s)
Set whether to output instances in sparse format
void setWeightFieldIndex(int index)
Set the index of the field whose values will be used to set the weight for each instance.
void writeRow(Object[] r, String encoding)
Convert and write a row of data to the ARFF file.
-
Methods inherited from class org.pentaho.di.trans.step.BaseStepData
getStatus, isDisposed, isEmpty, isFinished, isIdle, isInitialising, isRunning, isStopped, setStatus
-
-
-
-
Field Detail
-
m_outputRowMeta
protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
-
m_outputFieldIndexes
protected int[] m_outputFieldIndexes
-
m_outputSparseInstances
protected boolean m_outputSparseInstances
True if sparse data is to be output
-
m_weightFieldIndex
protected int m_weightFieldIndex
Index of the field used to set the weight for each instance (-1 means equal weights)
-
m_arffMeta
protected org.pentaho.dm.commons.ArffMeta[] m_arffMeta
-
m_tempFile
protected File m_tempFile
-
m_headerFile
protected File m_headerFile
-
m_dataOut
protected OutputStream m_dataOut
-
m_headerOut
protected OutputStream m_headerOut
-
m_separator
protected byte[] m_separator
-
m_newLine
protected byte[] m_newLine
-
m_missing
protected byte[] m_missing
-
m_leftCurly
protected byte[] m_leftCurly
-
m_spaceLeftCurly
protected byte[] m_spaceLeftCurly
-
m_rightCurly
protected byte[] m_rightCurly
-
m_hasEncoding
protected boolean m_hasEncoding
-
-
Method Detail
-
getOutputRowMeta
public org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
Get the meta data for the output format.
- Returns:
- a RowMetaInterface value
-
setOutputRowMeta
public void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
Set the meta data for the output format.
- Parameters:
rmi - a RowMetaInterface value
-
setHasEncoding
public void setHasEncoding(boolean e)
Set whether an encoding is in use.
- Parameters:
e - true if an encoding is in use
-
getHasEncoding
public boolean getHasEncoding()
Returns true if a specific character encoding is in use.
- Returns:
- true if an encoding other than the default encoding is in use.
-
setBinaryNewLine
public void setBinaryNewLine(byte[] nl)
Set the binary line terminator to use.
- Parameters:
nl - the line terminator
-
setBinarySeparator
public void setBinarySeparator(byte[] s)
Set the binary separator to use.
- Parameters:
s - binary field separator
-
setBinaryMissing
public void setBinaryMissing(byte[] m)
Set the binary missing value to use.
- Parameters:
m - binary missing value
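The three binary setters above take pre-encoded byte arrays. A minimal sketch of how such arrays might be produced follows; BinaryTokensSketch and toBytes are hypothetical names, and the choice of "," and "\n" as separator and line terminator is an assumption. The "?" token is the standard ARFF marker for a missing value.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; shows one way to pre-encode the tokens as byte arrays. */
public class BinaryTokensSketch {
    /** Encode a token with the given charset, or with the platform default when none is set. */
    public static byte[] toBytes(String token, Charset encoding) {
        return encoding == null ? token.getBytes() : token.getBytes(encoding);
    }

    public static void main(String[] args) {
        Charset enc = StandardCharsets.UTF_8;
        byte[] separator = toBytes(",", enc);   // assumed field separator
        byte[] newLine = toBytes("\n", enc);    // assumed line terminator
        byte[] missing = toBytes("?", enc);     // "?" marks a missing value in ARFF
        // With the real class, these arrays would presumably be handed to
        // setBinarySeparator, setBinaryNewLine and setBinaryMissing.
        System.out.println(separator.length + " " + newLine.length + " " + missing.length);
    }
}
```

Encoding the tokens once up front avoids re-encoding the same strings for every row, and it keeps the tokens consistent with whatever encoding is used for the field values.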
-
setOutputFieldIndexes
public void setOutputFieldIndexes(int[] outputFieldIndexes, org.pentaho.dm.commons.ArffMeta[] arffMeta)
Set the indexes of the fields to output to the ARFF file.
- Parameters:
outputFieldIndexes - array of indexes
arffMeta - array of ARFF metas
-
setWeightFieldIndex
public void setWeightFieldIndex(int index)
Set the index of the field whose values will be used to set the weight for each instance.
- Parameters:
index - the index of the field to use to set instance-level weights.
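In the ARFF format, an instance weight is written at the end of the data line, enclosed in curly braces. The sketch below shows that convention together with the -1 "equal weights" case. InstanceWeightSketch and withWeight are hypothetical names, and leaving the weight field out of the regular attribute values is an assumption about how such a field would be handled, not something this page states.

```java
/** Hypothetical illustration of ARFF instance weights; not the ArffOutputData code. */
public class InstanceWeightSketch {
    /**
     * Join the row values with commas and, when weightFieldIndex is not -1,
     * append the value of that field as a trailing "{weight}".
     */
    public static String withWeight(String[] row, int weightFieldIndex) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (int i = 0; i < row.length; i++) {
            if (i == weightFieldIndex) {
                continue; // assumption: the weight field is not also written as an attribute
            }
            if (!first) {
                sb.append(",");
            }
            sb.append(row[i]);
            first = false;
        }
        if (weightFieldIndex >= 0) {
            sb.append(", {").append(row[weightFieldIndex]).append("}");
        }
        return sb.toString();
    }
}
```

For example, withWeight(new String[] {"sunny", "no", "2.5"}, 2) yields the line sunny,no, {2.5}, while an index of -1 writes all three values as ordinary attributes with no weight suffix.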
-
setOutputSparseInstances
public void setOutputSparseInstances(boolean s)
Set whether to output instances in sparse format.
- Parameters:
s - true if instances are to be output in sparse format
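For readers unfamiliar with sparse ARFF, the sketch below formats a numeric row in that style: only non-zero values are written, each as a zero-based attribute index followed by the value, all inside curly braces. SparseRowSketch and toSparse are hypothetical names, and how ArffOutputData treats nominal or string values in sparse mode is not covered here.

```java
/** Hypothetical illustration of the sparse ARFF row layout for numeric values. */
public class SparseRowSketch {
    /** Format values as "{index value,index value,...}", skipping zeros. */
    public static String toSparse(double[] values) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (int i = 0; i < values.length; i++) {
            if (values[i] == 0.0) {
                continue; // zero entries are implicit in the sparse format
            }
            if (!first) {
                sb.append(",");
            }
            sb.append(i).append(' ').append(values[i]);
            first = false;
        }
        return sb.append("}").toString();
    }
}
```

With this sketch, toSparse(new double[] {0, 2.5, 0, 1}) produces {1 2.5,3 1.0}, and a row of all zeros produces {}.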
-
openFiles
public void openFiles(String filename) throws IOException
Open files ready to write to.
- Parameters:
filename - the name of the ARFF file to write to
- Throws:
IOException - if an error occurs
-
writeRow
public void writeRow(Object[] r, String encoding) throws IOException, org.pentaho.di.core.exception.KettleStepException
Convert and write a row of data to the ARFF file.
- Parameters:
r - the Kettle row
encoding - an (optional) character encoding to use
- Throws:
IOException - if an error occurs
org.pentaho.di.core.exception.KettleStepException - if an error occurs
-
finishOutput
public void finishOutput(String relationName, String encoding) throws org.pentaho.di.core.exception.KettleStepException
Writes the ARFF header and appends the temporary file.
- Parameters:
relationName - the ARFF relation name
encoding - an (optional) character encoding
- Throws:
org.pentaho.di.core.exception.KettleStepException - if an error occurs
-
closeFiles
public void closeFiles() throws IOException
Flush and close all files.
- Throws:
IOException - if an error occurs
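The four file-handling methods (openFiles, writeRow, finishOutput, closeFiles) suggest a simple lifecycle. The sketch below shows one plausible call order against a hypothetical stand-in interface, ArffSink, that mirrors only the documented method names. The order of finishOutput relative to closeFiles is an assumption, and IOException stands in for KettleStepException so that the example compiles without the Pentaho libraries.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lifecycle sketch; ArffSink is a stand-in, not a Pentaho type. */
public class LifecycleSketch {
    /** Mirrors the four documented method names; exception types are simplified. */
    public interface ArffSink {
        void openFiles(String filename) throws IOException;
        void writeRow(Object[] r, String encoding) throws IOException;
        void finishOutput(String relationName, String encoding) throws IOException;
        void closeFiles() throws IOException;
    }

    /** Open, write every row, finish the output, and always close (assumed order). */
    public static void writeAll(ArffSink sink, String filename, String relationName,
                                String encoding, List<Object[]> rows) throws IOException {
        sink.openFiles(filename);
        try {
            for (Object[] row : rows) {
                sink.writeRow(row, encoding);
            }
            sink.finishOutput(relationName, encoding);
        } finally {
            sink.closeFiles(); // release file handles even if a row fails
        }
    }

    /** Test double that records the order of calls instead of writing files. */
    public static class RecordingSink implements ArffSink {
        public final List<String> calls = new ArrayList<>();
        public void openFiles(String filename) { calls.add("openFiles"); }
        public void writeRow(Object[] r, String encoding) { calls.add("writeRow"); }
        public void finishOutput(String relationName, String encoding) { calls.add("finishOutput"); }
        public void closeFiles() { calls.add("closeFiles"); }
    }
}
```

Placing closeFiles in a finally block is a design choice of this sketch: it keeps the temporary and header files from being left open when writeRow or finishOutput throws.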
-
-