Class BaseFileImportProcessor

java.lang.Object
org.pentaho.di.trans.steps.fileinput.text.BaseFileImportProcessor
Direct Known Subclasses:
CsvFileImportProcessor, TextFileCsvFileTypeImportProcessor

public abstract class BaseFileImportProcessor extends Object
BaseFileImportProcessor is an abstract class that provides a framework for processing CSV files to extract a file summary and field data. It defines methods for analyzing file data, converting lines to rows, and managing metadata. Subclasses must implement the abstract methods to supply the format-specific behavior.
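The split between one public analysis method and many protected abstract hooks suggests a template-method design. The following is a minimal sketch of that idea only, not the Pentaho implementation: `SketchImportProcessor`, `SketchCsvProcessor`, the `List<String>` standing in for the reader, and the summary wording are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the real class (assumed names and signatures).
abstract class SketchImportProcessor {
    protected final List<String> lines; // stands in for the file reader
    protected final int samples;        // maximum number of data lines to sample
    protected String message = "";

    protected SketchImportProcessor(List<String> lines, int samples) {
        this.lines = lines;
        this.samples = samples;
    }

    // Hooks that a concrete subclass must supply.
    protected abstract int getHeaderLines();
    protected abstract Object[] convertLineToRow(String line);

    // Shared algorithm: skip headers, convert up to `samples` lines, build a summary.
    public String analyzeFile() {
        List<Object[]> rows = new ArrayList<>();
        for (int i = getHeaderLines(); i < lines.size() && rows.size() < samples; i++) {
            rows.add(convertLineToRow(lines.get(i)));
        }
        int fieldCount = rows.isEmpty() ? 0 : rows.get(0).length;
        message = "Sampled " + rows.size() + " rows, " + fieldCount + " fields";
        return message;
    }

    public String getMessage() {
        return message;
    }
}

// A trivial comma-separated subclass with one header line.
class SketchCsvProcessor extends SketchImportProcessor {
    SketchCsvProcessor(List<String> lines, int samples) {
        super(lines, samples);
    }

    @Override
    protected int getHeaderLines() {
        return 1;
    }

    @Override
    protected Object[] convertLineToRow(String line) {
        return line.split(",", -1); // -1 keeps trailing empty fields
    }
}
```

A caller would construct the subclass, call `analyzeFile()`, and read the same text back later through `getMessage()`.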
  • Field Details

    • meta

      protected Object meta
    • inputFields

      protected Object[] inputFields
    • samples

      protected int samples
    • showSummary

      protected boolean showSummary
    • replaceMeta

      protected boolean replaceMeta
    • message

      protected String message
    • debug

      protected String debug
    • rowNumber

      protected long rowNumber
    • reader

      protected BufferedInputStreamReader reader
    • transMeta

      protected TransMeta transMeta
    • log

      protected org.pentaho.di.core.logging.LogChannelInterface log
    • encodingType

      protected EncodingType encodingType
  • Constructor Details

    • BaseFileImportProcessor

      protected BaseFileImportProcessor(Object meta, TransMeta transMeta, BufferedInputStreamReader reader, int samples, boolean showSummary)
      Initializes the processor with the metadata object, transformation metadata, file reader, and sampling options.
      Parameters:
      meta - the metadata object
      transMeta - the transformation metadata
      reader - the file reader
      samples - the number of samples to process
      showSummary - whether to show a summary of the analysis
  • Method Details

    • getFieldCount

      protected abstract int getFieldCount()
      Returns the number of fields in the file.
      Returns:
      the field count
    • convertLineToRow

      protected abstract Object[] convertLineToRow(TextFileLine textFileLine, Object strinfo, org.pentaho.di.core.row.RowMetaInterface outputRowMeta, org.pentaho.di.core.row.RowMetaInterface convertRowMeta, boolean failOnParseError) throws org.pentaho.di.core.exception.KettleException
      Converts a line from the file into a row of data.
      Parameters:
      textFileLine - the line of text to convert
      strinfo - metadata information
      outputRowMeta - the output row metadata
      convertRowMeta - the converted row metadata
      failOnParseError - whether to fail on parse errors
      Returns:
      the converted row as an object array
      Throws:
      org.pentaho.di.core.exception.KettleException - if an error occurs during conversion
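The effect of a flag like failOnParseError can be illustrated with a small sketch. This is not the library's conversion logic: `ParseErrorSketch` is a hypothetical class, every cell is assumed to be a whole number, `IllegalArgumentException` stands in for KettleException, and storing `null` in lenient mode is an assumption rather than documented behavior.

```java
class ParseErrorSketch {
    // Converts string cells to Long values. On a bad cell, either throws
    // (failOnParseError = true) or stores null and continues (assumed lenient behavior).
    static Object[] convert(String[] cells, boolean failOnParseError) {
        Object[] row = new Object[cells.length];
        for (int i = 0; i < cells.length; i++) {
            try {
                row[i] = Long.valueOf(cells[i].trim());
            } catch (NumberFormatException e) {
                if (failOnParseError) {
                    throw new IllegalArgumentException(
                        "Cannot parse field " + (i + 1) + ": '" + cells[i] + "'", e);
                }
                row[i] = null; // lenient mode: keep going with a null value
            }
        }
        return row;
    }
}
```

Strict mode is useful when the row must be trusted downstream; lenient mode suits sampling, where one malformed cell should not abort the whole analysis.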
    • initializeField

      protected abstract void initializeField(Object field, DecimalFormatSymbols dfs)
      Initializes a field with the given decimal format symbols.
      Parameters:
      field - the field to initialize
      dfs - the decimal format symbols
    • setFieldTypeInfo

      protected abstract void setFieldTypeInfo(Object field, org.pentaho.di.core.util.StringEvaluator evaluator, List<org.pentaho.di.core.util.StringEvaluationResult> evaluationResults, org.pentaho.di.core.util.StringEvaluationResult strEvaluationResult)
      Sets type information for a field based on evaluation results.
      Parameters:
      field - the field to set type information for
      evaluator - the string evaluator
      evaluationResults - the list of evaluation results
      strEvaluationResult - the advised evaluation result
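Setting type information from evaluation results amounts to picking the narrowest type that fits every sampled value. The sketch below shows that general idea with plain regular expressions. It does not use or reproduce org.pentaho.di.core.util.StringEvaluator; `TypeGuessSketch`, its rules, and the returned labels are illustrative assumptions.

```java
import java.util.List;

class TypeGuessSketch {
    // Returns the narrowest type label that every non-blank sample satisfies.
    static String guessType(List<String> samples) {
        boolean allInteger = true;
        boolean allNumber = true;
        boolean sawValue = false;
        for (String s : samples) {
            if (s == null || s.trim().isEmpty()) {
                continue; // blanks carry no type evidence
            }
            sawValue = true;
            String v = s.trim();
            if (!v.matches("[+-]?\\d+")) {
                allInteger = false;
            }
            if (!v.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
                allNumber = false;
            }
        }
        if (!sawValue) {
            return "String"; // nothing to go on, so fall back to the widest type
        }
        if (allInteger) {
            return "Integer";
        }
        if (allNumber) {
            return "Number";
        }
        return "String";
    }
}
```

A single non-numeric sample is enough to widen the guess to a string, which is why the number of samples affects the quality of the result.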
    • getFieldName

      protected abstract String getFieldName(Object field)
      Retrieves the name of a field.
      Parameters:
      field - the field to retrieve the name for
      Returns:
      the field name
    • getFieldTypeDesc

      protected abstract String getFieldTypeDesc(Object field)
      Retrieves the type description of a field.
      Parameters:
      field - the field to retrieve the type description for
      Returns:
      the field type description
    • getFieldType

      protected abstract int getFieldType(Object field)
      Retrieves the type of a field.
      Parameters:
      field - the field to retrieve the type for
      Returns:
      the field type
    • getFieldLength

      protected abstract int getFieldLength(Object field)
      Retrieves the length of a field.
      Parameters:
      field - the field to retrieve the length for
      Returns:
      the field length
    • getFieldPrecision

      protected abstract int getFieldPrecision(Object field)
      Retrieves the precision of a field.
      Parameters:
      field - the field to retrieve the precision for
      Returns:
      the field precision
    • getFieldFormat

      protected abstract String getFieldFormat(Object field)
      Retrieves the format of a field.
      Parameters:
      field - the field to retrieve the format for
      Returns:
      the field format
    • cloneMeta

      protected abstract Object cloneMeta()
      Clones the metadata object.
      Returns:
      the cloned metadata object
    • setAllFieldsToStringType

      protected abstract void setAllFieldsToStringType(Object meta)
      Sets all fields in the metadata to string type.
      Parameters:
      meta - the metadata object
    • getField

      protected abstract Object getField(Object meta, int index)
      Retrieves a field from the metadata at the specified index.
      Parameters:
      meta - the metadata object
      index - the index of the field
      Returns:
      the field object
    • hasHeader

      protected abstract boolean hasHeader()
      Checks if the file has a header.
      Returns:
      true if the file has a header, false otherwise
    • getHeaderLines

      protected abstract int getHeaderLines()
      Retrieves the number of header lines in the file.
      Returns:
      the number of header lines
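One plausible use of hasHeader and getHeaderLines together is to skip the header before sampling data lines. The sketch below assumes that the header count applies only when a header is present, which this page does not state. `HeaderSkipSketch` is hypothetical, and a standard `java.io.BufferedReader` stands in for BufferedInputStreamReader.

```java
import java.io.BufferedReader;
import java.util.List;
import java.util.stream.Collectors;

class HeaderSkipSketch {
    // Returns the lines that follow the header, or all lines if there is no header.
    static List<String> readDataLines(BufferedReader reader, boolean hasHeader, int headerLines) {
        long toSkip = hasHeader ? Math.max(0, headerLines) : 0;
        // BufferedReader.lines() streams the remaining lines lazily.
        return reader.lines().skip(toSkip).collect(Collectors.toList());
    }
}
```

If the file is shorter than the header, `skip` simply yields an empty list instead of failing.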
    • getEnclosure

      protected abstract String getEnclosure()
      Retrieves the enclosure character used in the file.
      Returns:
      the enclosure character
    • getEscapeCharacter

      protected abstract String getEscapeCharacter()
      Retrieves the escape character used in the file.
      Returns:
      the escape character
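How an enclosure and an escape character usually interact when a delimited line is split can be sketched as follows. The semantics here are assumptions, not the documented behavior of this class: a delimiter inside an enclosure is literal, the escape character makes the next character literal, and enclosure characters are dropped from the output. `EnclosureSplitSketch` is a hypothetical name, and single characters are used even though the accessors above return String.

```java
import java.util.ArrayList;
import java.util.List;

class EnclosureSplitSketch {
    static List<String> split(String line, char delimiter, char enclosure, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean enclosed = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == escape && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // take the next character literally
            } else if (c == enclosure) {
                enclosed = !enclosed;             // enclosure characters are not copied
            } else if (c == delimiter && !enclosed) {
                fields.add(current.toString());   // field boundary outside any enclosure
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());           // last field has no trailing delimiter
        return fields;
    }
}
```

With a comma delimiter, a double-quote enclosure, and a backslash escape, the comma inside a quoted field stays part of that field instead of starting a new one.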
    • getFileFormatTypeNr

      protected abstract int getFileFormatTypeNr()
      Retrieves the file format type number.
      Returns:
      the file format type number
    • getFields

      protected abstract void getFields(org.pentaho.di.core.row.RowMetaInterface rowMeta) throws org.pentaho.di.core.exception.KettleStepException
      Populates the row metadata with field information.
      Parameters:
      rowMeta - the row metadata to populate
      Throws:
      org.pentaho.di.core.exception.KettleStepException - if an error occurs while populating the metadata
    • convertFieldToDto

      protected abstract TextFileInputFieldDTO convertFieldToDto(Object field)
      Converts a field to a Data Transfer Object (DTO).
      Parameters:
      field - the field to convert
      Returns:
      the field DTO
    • analyzeFile

      public String analyzeFile(boolean failOnParseError) throws org.pentaho.di.core.exception.KettleException
      Analyzes the file and generates a summary message.
      Parameters:
      failOnParseError - whether to fail on parse errors
      Returns:
      the summary message
      Throws:
      org.pentaho.di.core.exception.KettleException - if an error occurs during analysis
    • getMessage

      public String getMessage()
      Retrieves the generated summary message.
      Returns:
      the summary message
    • getInputFields

      public Object[] getInputFields()
      Retrieves the input fields as an array of objects.
      Returns:
      the input fields
    • getInputFieldsDto

      public TextFileInputFieldDTO[] getInputFieldsDto()
      Retrieves the input fields as an array of Data Transfer Objects (DTOs).
      Returns:
      the input field DTOs
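Because fields are typed as Object in this class, a subclass presumably casts each one to its own concrete field type before reading its properties. The sketch below shows that pattern for a DTO conversion. `FieldDtoSketch`, `CsvField`, and `Dto` are hypothetical, and the real TextFileInputFieldDTO may expose different properties.

```java
class FieldDtoSketch {
    // Hypothetical concrete field type that a subclass might work with.
    static final class CsvField {
        final String name;
        final String typeDesc;
        final int length;

        CsvField(String name, String typeDesc, int length) {
            this.name = name;
            this.typeDesc = typeDesc;
            this.length = length;
        }
    }

    // Hypothetical DTO carrying only a few of the properties the accessors expose.
    static final class Dto {
        final String name;
        final String type;
        final int length;

        Dto(String name, String type, int length) {
            this.name = name;
            this.type = type;
            this.length = length;
        }
    }

    // Casts each opaque field to the concrete type and copies its properties into a DTO.
    static Dto[] toDtos(Object[] inputFields) {
        Dto[] out = new Dto[inputFields.length];
        for (int i = 0; i < inputFields.length; i++) {
            CsvField field = (CsvField) inputFields[i];
            out[i] = new Dto(field.name, field.typeDesc, field.length);
        }
        return out;
    }
}
```

Keeping the base class on Object lets each subclass use its own field type, at the cost of a cast that fails at runtime if the wrong type is passed in.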