Class ArffOutputData

  • All Implemented Interfaces:
    org.pentaho.di.trans.step.StepDataInterface

    public class ArffOutputData
    extends org.pentaho.di.trans.step.BaseStepData
    implements org.pentaho.di.trans.step.StepDataInterface
    Holds temporary data and provides routines for writing the ARFF file. This class writes rows to a temporary file while collecting the values of nominal attributes in an array of Maps. Once the last row has been processed, the ARFF header is written and the contents of the temporary file are appended to it.
    Version:
    1.0
    Author:
    Mark Hall (mhall{[at]}pentaho.org)
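
The two-phase approach described above (buffer rows in a temporary file, then write the header once all nominal values are known) can be sketched as follows. This is a minimal illustration, not the Pentaho implementation: the class `TwoPassArffSketch` and its methods are hypothetical, it uses one ordered set per attribute where the real class keeps an array of Maps, it treats every non-nominal attribute as numeric, and it does not quote values.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a two-phase ARFF writer; not part of the Pentaho API. */
final class TwoPassArffSketch {
    private final String[] names;
    private final boolean[] nominal;
    // One ordered set of observed values per attribute (assumption: the real
    // class stores these in an array of Maps).
    private final List<Set<String>> seen = new ArrayList<>();
    private final Path tempFile;
    private final BufferedWriter rows;

    TwoPassArffSketch(String[] names, boolean[] nominal) throws IOException {
        this.names = names;
        this.nominal = nominal;
        for (int i = 0; i < names.length; i++) {
            seen.add(new LinkedHashSet<>());
        }
        tempFile = Files.createTempFile("arff-rows", ".tmp");
        rows = Files.newBufferedWriter(tempFile, StandardCharsets.UTF_8);
    }

    /** Phase 1: append the row to the temporary file and record nominal values. */
    void writeRow(String[] row) throws IOException {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            if (row[i] == null) {
                line.append('?'); // ARFF marker for a missing value
                continue;
            }
            if (nominal[i]) {
                seen.get(i).add(row[i]);
            }
            line.append(row[i]);
        }
        rows.write(line.append('\n').toString());
    }

    /** Phase 2: write the header, then append the buffered rows. */
    void finish(String relation, Path target) throws IOException {
        rows.close();
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
             BufferedReader in = Files.newBufferedReader(tempFile, StandardCharsets.UTF_8)) {
            out.write("@relation " + relation + "\n\n");
            for (int i = 0; i < names.length; i++) {
                String type = nominal[i]
                        ? "{" + String.join(",", seen.get(i)) + "}"
                        : "numeric";
                out.write("@attribute " + names[i] + " " + type + "\n");
            }
            out.write("\n@data\n");
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                out.write(line);
                out.write('\n');
            }
        }
        Files.delete(tempFile);
    }
}
```

Buffering the rows on disk means the full list of nominal values does not have to be known before the first row arrives, at the cost of copying the data once when the output is finished.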
    • Field Detail

      • m_outputRowMeta

        protected org.pentaho.di.core.row.RowMetaInterface m_outputRowMeta
      • m_outputFieldIndexes

        protected int[] m_outputFieldIndexes
      • m_outputSparseInstances

        protected boolean m_outputSparseInstances
        True if instances are to be output in sparse format
      • m_weightFieldIndex

        protected int m_weightFieldIndex
        Index of the field used to set the weight for each instance (-1 means equal weights)
      • m_arffMeta

        protected org.pentaho.dm.commons.ArffMeta[] m_arffMeta
      • m_tempFile

        protected File m_tempFile
      • m_headerFile

        protected File m_headerFile
      • m_separator

        protected byte[] m_separator
      • m_newLine

        protected byte[] m_newLine
      • m_missing

        protected byte[] m_missing
      • m_leftCurly

        protected byte[] m_leftCurly
      • m_spaceLeftCurly

        protected byte[] m_spaceLeftCurly
      • m_rightCurly

        protected byte[] m_rightCurly
      • m_hasEncoding

        protected boolean m_hasEncoding
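
To illustrate what m_outputSparseInstances and m_weightFieldIndex control, the sketch below formats one numeric row in dense and in sparse ARFF notation and optionally appends an instance weight in curly braces. The class `ArffRowFormatSketch` is hypothetical. It assumes a comma separator, `?` for missing values (represented here as NaN) and no space before the weight, whereas the real class takes these tokens from its byte-array fields.

```java
import java.util.StringJoiner;

/** Hypothetical sketch of dense and sparse ARFF row formatting. */
final class ArffRowFormatSketch {

    /** Dense format: every value is written, in attribute order. */
    static String dense(double[] values, Double weight) {
        StringJoiner joiner = new StringJoiner(",");
        for (double v : values) {
            joiner.add(format(v));
        }
        return joiner.toString() + weightSuffix(weight);
    }

    /** Sparse format: only non-zero values, each prefixed by its 0-based index. */
    static String sparse(double[] values, Double weight) {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (int i = 0; i < values.length; i++) {
            if (values[i] == 0.0) {
                continue; // zeros are implicit in sparse format
            }
            joiner.add(i + " " + format(values[i]));
        }
        return joiner.toString() + weightSuffix(weight);
    }

    // NaN stands in for a missing value in this sketch.
    private static String format(double v) {
        return Double.isNaN(v) ? "?" : Double.toString(v);
    }

    // A null weight plays the role of a weight field index of -1: nothing is appended.
    private static String weightSuffix(Double weight) {
        return weight == null ? "" : ",{" + weight + "}";
    }
}
```

Missing values are never dropped in the sparse form, because an omitted entry is read back as zero rather than as missing.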
    • Constructor Detail

      • ArffOutputData

        public ArffOutputData()
    • Method Detail

      • getOutputRowMeta

        public org.pentaho.di.core.row.RowMetaInterface getOutputRowMeta()
        Get the metadata for the output format
        Returns:
        a RowMetaInterface value
      • setOutputRowMeta

        public void setOutputRowMeta(org.pentaho.di.core.row.RowMetaInterface rmi)
        Set the metadata for the output format
        Parameters:
        rmi - a RowMetaInterface value
      • setHasEncoding

        public void setHasEncoding(boolean e)
        Set whether an encoding is in use.
        Parameters:
        e - true if an encoding is in use
      • getHasEncoding

        public boolean getHasEncoding()
        Returns true if a specific character encoding is in use.
        Returns:
        true if an encoding other than the default encoding is in use.
      • setBinaryNewLine

        public void setBinaryNewLine(byte[] nl)
        Set the binary line terminator to use
        Parameters:
        nl - the line terminator
      • setBinarySeparator

        public void setBinarySeparator(byte[] s)
        Set the binary separator to use
        Parameters:
        s - binary field separator
      • setBinaryMissing

        public void setBinaryMissing(byte[] m)
        Set the binary missing value to use
        Parameters:
        m - binary missing value
      • setOutputFieldIndexes

        public void setOutputFieldIndexes(int[] outputFieldIndexes,
                                          org.pentaho.dm.commons.ArffMeta[] arffMeta)
        Set the indexes of the fields to output to the ARFF file
        Parameters:
        outputFieldIndexes - array of indexes
        arffMeta - array of arff metas
      • setWeightFieldIndex

        public void setWeightFieldIndex(int index)
        Set the index of the field whose values will be used to set the weight for each instance.
        Parameters:
        index - the index of the field to use to set instance-level weights.
      • setOutputSparseInstances

        public void setOutputSparseInstances(boolean s)
        Set whether to output instances in sparse format
        Parameters:
        s - true if instances are to be output in sparse format
      • openFiles

        public void openFiles(String filename)
                       throws IOException
        Open the files ready for writing
        Parameters:
        filename - the name of the ARFF file to write to
        Throws:
        IOException - if an error occurs
      • writeRow

        public void writeRow(Object[] r,
                             String encoding)
                      throws IOException,
                             org.pentaho.di.core.exception.KettleStepException
        Convert and write a row of data to the ARFF file.
        Parameters:
        r - the Kettle row
        encoding - an (optional) character encoding to use
        Throws:
        IOException - if an error occurs
        org.pentaho.di.core.exception.KettleStepException - if an error occurs
      • finishOutput

        public void finishOutput(String relationName,
                                 String encoding)
                          throws org.pentaho.di.core.exception.KettleStepException
        Writes the ARFF header and appends the temporary file
        Parameters:
        relationName - the ARFF relation name
        encoding - an (optional) character encoding
        Throws:
        org.pentaho.di.core.exception.KettleStepException - if an error occurs
      • closeFiles

        public void closeFiles()
                        throws IOException
        Flush and close all files
        Throws:
        IOException - if an error occurs
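
The setBinaryNewLine, setBinarySeparator and setBinaryMissing methods take byte arrays, and writeRow and finishOutput accept an optional encoding. A caller therefore has to turn its separator, line terminator and missing-value strings into bytes before passing them in. The helper below is a hypothetical sketch of one way to do that, assuming that a null or empty encoding name means the platform default charset; it is not taken from the step's source.

```java
import java.io.UnsupportedEncodingException;

/** Hypothetical helper for producing the byte arrays passed to the setBinary* methods. */
final class EncodingSketch {

    /**
     * Encodes the text with the named charset, or with the platform default
     * when no encoding is given (assumption made for this sketch).
     */
    static byte[] toBytes(String text, String encoding) throws UnsupportedEncodingException {
        if (encoding == null || encoding.isEmpty()) {
            return text.getBytes();
        }
        return text.getBytes(encoding);
    }
}
```

With such a helper, a call like `data.setBinarySeparator(EncodingSketch.toBytes(",", "UTF-8"))` would supply the separator bytes, where `data` is an ArffOutputData instance.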