Class BaseFileImportProcessor
java.lang.Object
org.pentaho.di.trans.steps.fileinput.text.BaseFileImportProcessor
Direct Known Subclasses:
CsvFileImportProcessor, TextFileCsvFileTypeImportProcessor
BaseFileImportProcessor is an abstract class that provides a framework for processing CSV files to extract a file summary and field data.
It defines methods for analyzing file data, converting lines to rows, and managing metadata.
Subclasses must implement the abstract methods to supply the format-specific behavior.
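As a rough illustration of this division of labor, the following Java sketch pairs an abstract base class that owns the shared analysis loop with a subclass that supplies the format-specific hooks. `SketchImportProcessor`, `SimpleCsvProcessor`, and `analyze` are hypothetical names, not Pentaho APIs, and the hooks are deliberately simplified (a plain `String` line instead of `TextFileLine`, no row metadata).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified illustration of the template-method pattern.
// None of these names are part of the Pentaho API.
abstract class SketchImportProcessor {
    protected final int samples; // maximum number of data lines to convert

    protected SketchImportProcessor(int samples) {
        this.samples = samples;
    }

    // Hooks a concrete subclass must supply (analogous to the abstract methods).
    protected abstract boolean hasHeader();

    protected abstract Object[] convertLineToRow(String line);

    // Shared algorithm: skip an optional header, then convert up to `samples` lines.
    public List<Object[]> analyze(List<String> lines) {
        List<Object[]> rows = new ArrayList<>();
        int start = hasHeader() ? 1 : 0;
        for (int i = start; i < lines.size() && rows.size() < samples; i++) {
            rows.add(convertLineToRow(lines.get(i)));
        }
        return rows;
    }
}

// A comma-separated implementation that ignores quoting, for brevity.
class SimpleCsvProcessor extends SketchImportProcessor {
    SimpleCsvProcessor(int samples) {
        super(samples);
    }

    @Override
    protected boolean hasHeader() {
        return true;
    }

    @Override
    protected Object[] convertLineToRow(String line) {
        return line.split(",", -1); // String[] is assignable to Object[]
    }
}
```

The base class never needs to know how a line is parsed; swapping in a different subclass changes only the hooks, which mirrors how CsvFileImportProcessor and TextFileCsvFileTypeImportProcessor specialize this class.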
-
Field Summary
Modifier and Type | Field
protected String | debug
protected EncodingType | encodingType
protected Object[] | inputFields
protected org.pentaho.di.core.logging.LogChannelInterface | log
protected String | message
protected Object | meta
protected BufferedInputStreamReader | reader
protected boolean | replaceMeta
protected long | rowNumber
protected int | samples
protected boolean | showSummary
protected TransMeta | transMeta
-
Constructor Summary
Modifier | Constructor | Description
protected | BaseFileImportProcessor(Object meta, TransMeta transMeta, BufferedInputStreamReader reader, int samples, boolean showSummary) | Constructor for BaseFileImportProcessor.
-
Method Summary
Modifier and Type | Method | Description
String | analyzeFile(boolean failOnParseError) | Analyzes the file and generates a summary message.
protected abstract Object | cloneMeta() | Clones the metadata object.
protected abstract TextFileInputFieldDTO | convertFieldToDto(Object field) | Converts a field to a Data Transfer Object (DTO).
protected abstract Object[] | convertLineToRow(TextFileLine textFileLine, Object strinfo, org.pentaho.di.core.row.RowMetaInterface outputRowMeta, org.pentaho.di.core.row.RowMetaInterface convertRowMeta, boolean failOnParseError) | Converts a line from the file into a row of data.
protected abstract String | getEnclosure() | Retrieves the enclosure character used in the file.
protected abstract String | getEscapeCharacter() | Retrieves the escape character used in the file.
protected abstract Object | getField | Retrieves a field from the metadata at the specified index.
protected abstract int | getFieldCount() | Returns the number of fields in the file.
protected abstract String | getFieldFormat(Object field) | Retrieves the format of a field.
protected abstract int | getFieldLength(Object field) | Retrieves the length of a field.
protected abstract String | getFieldName(Object field) | Retrieves the name of a field.
protected abstract int | getFieldPrecision(Object field) | Retrieves the precision of a field.
protected abstract void | getFields(org.pentaho.di.core.row.RowMetaInterface rowMeta) | Populates the row metadata with field information.
protected abstract int | getFieldType(Object field) | Retrieves the type of a field.
protected abstract String | getFieldTypeDesc(Object field) | Retrieves the type description of a field.
protected abstract int | getFileFormatTypeNr() | Retrieves the file format type number.
protected abstract int | getHeaderLines() | Retrieves the number of header lines in the file.
Object[] | getInputFields() | Retrieves the input fields as an array of objects.
TextFileInputFieldDTO[] | getInputFieldsDto() | Retrieves the input fields as an array of Data Transfer Objects (DTOs).
String | getMessage() | Retrieves the generated summary message.
protected abstract boolean | hasHeader() | Checks whether the file has a header.
protected abstract void | initializeField(Object field, DecimalFormatSymbols dfs) | Initializes a field with the given decimal format symbols.
protected abstract void | setAllFieldsToStringType | Sets all fields in the metadata to string type.
protected abstract void | setFieldTypeInfo(Object field, org.pentaho.di.core.util.StringEvaluator evaluator, List<org.pentaho.di.core.util.StringEvaluationResult> evaluationResults, org.pentaho.di.core.util.StringEvaluationResult strEvaluationResult) | Sets type information for a field based on evaluation results.
-
Field Details
-
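meta
protected Object meta
-
inputFields
protected Object[] inputFields
-
samples
protected int samples
-
showSummary
protected boolean showSummary
-
replaceMeta
protected boolean replaceMeta
-
message
protected String message
-
debug
protected String debug
-
rowNumber
protected long rowNumber
-
reader
protected BufferedInputStreamReader reader
-
transMeta
protected TransMeta transMeta
-
log
protected org.pentaho.di.core.logging.LogChannelInterface log
-
encodingType
protected EncodingType encodingType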
-
-
Constructor Details
-
BaseFileImportProcessor
protected BaseFileImportProcessor(Object meta, TransMeta transMeta, BufferedInputStreamReader reader, int samples, boolean showSummary)
Constructor for BaseFileImportProcessor.
Parameters:
meta - the metadata object
transMeta - the transformation metadata
reader - the file reader
samples - the number of samples to process
showSummary - whether to show a summary of the analysis
-
-
Method Details
-
getFieldCount
protected abstract int getFieldCount()
Returns the number of fields in the file.
Returns:
the field count
-
convertLineToRow
protected abstract Object[] convertLineToRow(TextFileLine textFileLine, Object strinfo, org.pentaho.di.core.row.RowMetaInterface outputRowMeta, org.pentaho.di.core.row.RowMetaInterface convertRowMeta, boolean failOnParseError) throws org.pentaho.di.core.exception.KettleException
Converts a line from the file into a row of data.
Parameters:
textFileLine - the line of text to convert
strinfo - metadata information
outputRowMeta - the output row metadata
convertRowMeta - the converted row metadata
failOnParseError - whether to fail on parse errors
Returns:
the converted row as an object array
Throws:
org.pentaho.di.core.exception.KettleException - if an error occurs during conversion
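To make the role of the enclosure and escape characters concrete, here is a minimal sketch of splitting one delimited line into an `Object[]` of string values, roughly the first stage a `convertLineToRow` implementation must perform before any type conversion. `DelimitedLineSplitter` and `split` are hypothetical names; the quoting rules (a doubled enclosure inside an enclosed section yields one literal enclosure, and the escape character makes the next character literal) are assumptions for illustration, not a description of Pentaho's parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the Pentaho API.
final class DelimitedLineSplitter {
    private DelimitedLineSplitter() {}

    // Splits one line into raw string values, honoring enclosure and escape characters.
    static Object[] split(String line, char delimiter, char enclosure, char escape) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean enclosed = false; // true while inside an enclosed section
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == escape && escape != enclosure && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // take the next character literally
            } else if (c == enclosure) {
                if (enclosed && i + 1 < line.length() && line.charAt(i + 1) == enclosure) {
                    current.append(enclosure); // doubled enclosure becomes one literal
                    i++;
                } else {
                    enclosed = !enclosed; // enter or leave an enclosed section
                }
            } else if (c == delimiter && !enclosed) {
                values.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        values.add(current.toString()); // last field (possibly empty)
        return values.toArray();
    }
}
```

For example, `split("a,\"b,c\",d", ',', '"', '\\')` keeps the enclosed comma as data and yields three values. When the escape and enclosure characters are the same, the sketch falls back to the doubled-enclosure rule.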
-
initializeField
protected abstract void initializeField(Object field, DecimalFormatSymbols dfs)
Initializes a field with the given decimal format symbols.
Parameters:
field - the field to initialize
dfs - the decimal format symbols
-
setFieldTypeInfo
protected abstract void setFieldTypeInfo(Object field, org.pentaho.di.core.util.StringEvaluator evaluator, List<org.pentaho.di.core.util.StringEvaluationResult> evaluationResults, org.pentaho.di.core.util.StringEvaluationResult strEvaluationResult)
Sets type information for a field based on evaluation results.
Parameters:
field - the field to set type information for
evaluator - the string evaluator
evaluationResults - the list of evaluation results
strEvaluationResult - the advised evaluation result
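The type information applied here comes from evaluating sample values. As a much-simplified stand-in for that idea, the sketch below guesses a type name from a list of sample strings: if every non-blank sample is an integer it reports `"Integer"`, if every one is a plain decimal number it reports `"Number"`, and otherwise `"String"`. `SampleTypeGuesser` is a hypothetical class; it does not reproduce `org.pentaho.di.core.util.StringEvaluator`, and it assumes `.` as the decimal separator and ignores grouping symbols, dates, and booleans.

```java
import java.util.List;

// Hypothetical, much-simplified stand-in for sample-based type detection.
// It is NOT the logic of org.pentaho.di.core.util.StringEvaluator.
final class SampleTypeGuesser {
    private SampleTypeGuesser() {}

    // Returns "Integer", "Number", or "String" for the given sample values.
    static String guess(List<String> samples) {
        boolean allIntegers = true;
        boolean allNumbers = true;
        int seen = 0; // number of non-blank samples inspected
        for (String raw : samples) {
            String s = raw == null ? "" : raw.trim();
            if (s.isEmpty()) {
                continue; // blanks carry no type evidence
            }
            seen++;
            if (!s.matches("[+-]?\\d+")) {
                allIntegers = false;
            }
            // Assumes '.' as the decimal separator.
            if (!s.matches("[+-]?(\\d+(\\.\\d*)?|\\.\\d+)")) {
                allNumbers = false;
            }
        }
        if (seen == 0) {
            return "String"; // nothing to go on: fall back to the safest type
        }
        if (allIntegers) {
            return "Integer";
        }
        return allNumbers ? "Number" : "String";
    }
}
```

A real implementation would also record length, precision, and a format mask per field, which is what the getFieldLength, getFieldPrecision, and getFieldFormat accessors expose.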
-
getFieldName
protected abstract String getFieldName(Object field)
Retrieves the name of a field.
Parameters:
field - the field to retrieve the name for
Returns:
the field name
-
getFieldTypeDesc
protected abstract String getFieldTypeDesc(Object field)
Retrieves the type description of a field.
Parameters:
field - the field to retrieve the type description for
Returns:
the field type description
-
getFieldType
protected abstract int getFieldType(Object field)
Retrieves the type of a field.
Parameters:
field - the field to retrieve the type for
Returns:
the field type
-
getFieldLength
protected abstract int getFieldLength(Object field)
Retrieves the length of a field.
Parameters:
field - the field to retrieve the length for
Returns:
the field length
-
getFieldPrecision
protected abstract int getFieldPrecision(Object field)
Retrieves the precision of a field.
Parameters:
field - the field to retrieve the precision for
Returns:
the field precision
-
getFieldFormat
protected abstract String getFieldFormat(Object field)
Retrieves the format of a field.
Parameters:
field - the field to retrieve the format for
Returns:
the field format
-
cloneMeta
protected abstract Object cloneMeta()
Clones the metadata object.
Returns:
the cloned metadata object
-
setAllFieldsToStringType
Sets all fields in the metadata to string type.
Parameters:
meta - the metadata object
-
getField
Retrieves a field from the metadata at the specified index.
Parameters:
meta - the metadata object
index - the index of the field
Returns:
the field object
-
hasHeader
protected abstract boolean hasHeader()
Checks whether the file has a header.
Returns:
true if the file has a header, false otherwise
-
getHeaderLines
protected abstract int getHeaderLines()
Retrieves the number of header lines in the file.
Returns:
the number of header lines
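One plausible way the two header accessors combine is to decide how many leading lines to skip before data begins. The sketch below assumes that header lines count only when a header is present and clamps the skip so a short file does not cause an index error. `HeaderSkipper` and `dataLines` are hypothetical names, not Pentaho APIs.

```java
import java.util.List;

// Hypothetical helper showing how hasHeader() and getHeaderLines() might be
// combined to locate the first data line; not Pentaho code.
final class HeaderSkipper {
    private HeaderSkipper() {}

    // Returns a view of the lines that remain after skipping any header lines.
    static List<String> dataLines(List<String> lines, boolean hasHeader, int headerLines) {
        int skip = hasHeader ? Math.max(0, headerLines) : 0;
        skip = Math.min(skip, lines.size()); // a short file must not go out of bounds
        return lines.subList(skip, lines.size());
    }
}
```

Because `subList` returns a view, no lines are copied; callers that modify the original list afterward should take a copy first.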
-
getEnclosure
protected abstract String getEnclosure()
Retrieves the enclosure character used in the file.
Returns:
the enclosure character
-
getEscapeCharacter
protected abstract String getEscapeCharacter()
Retrieves the escape character used in the file.
Returns:
the escape character
-
getFileFormatTypeNr
protected abstract int getFileFormatTypeNr()
Retrieves the file format type number.
Returns:
the file format type number
-
getFields
protected abstract void getFields(org.pentaho.di.core.row.RowMetaInterface rowMeta) throws org.pentaho.di.core.exception.KettleStepException
Populates the row metadata with field information.
Parameters:
rowMeta - the row metadata to populate
Throws:
org.pentaho.di.core.exception.KettleStepException - if an error occurs while populating the metadata
-
convertFieldToDto
protected abstract TextFileInputFieldDTO convertFieldToDto(Object field)
Converts a field to a Data Transfer Object (DTO).
Parameters:
field - the field to convert
Returns:
the field DTO
-
analyzeFile
public String analyzeFile(boolean failOnParseError) throws org.pentaho.di.core.exception.KettleException
Analyzes the file and generates a summary message.
Parameters:
failOnParseError - whether to fail on parse errors
Returns:
the summary message
Throws:
org.pentaho.di.core.exception.KettleException - if an error occurs during analysis
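The interplay between the `samples` limit and `failOnParseError` can be sketched as a single pass over the data lines: either abort on the first malformed line or count it and keep going, then report a summary. `AnalysisSketch` is hypothetical; the "wrong field count" check, the treatment of `samples <= 0` as "inspect every line", and the summary wording are all assumptions for illustration, and the naive comma split ignores enclosures.

```java
import java.util.List;

// Hypothetical sketch of an analysis pass; the names and summary wording are
// illustrative only and do not reproduce Pentaho's output.
final class AnalysisSketch {
    private AnalysisSketch() {}

    static String analyze(List<String> dataLines, int expectedFields, int samples,
                          boolean failOnParseError) {
        int inspected = 0;
        int badLines = 0;
        for (String line : dataLines) {
            if (samples > 0 && inspected >= samples) {
                break; // assumption: samples <= 0 means "inspect every line"
            }
            inspected++;
            int fields = line.split(",", -1).length; // naive split, ignores enclosures
            if (fields != expectedFields) {
                if (failOnParseError) {
                    throw new IllegalStateException("Line " + inspected + " has " + fields
                            + " fields, expected " + expectedFields);
                }
                badLines++; // tolerate the error but record it for the summary
            }
        }
        return "Lines inspected: " + inspected + ", lines with wrong field count: " + badLines;
    }
}
```

With `failOnParseError` set, the first bad line ends the pass; without it, the summary reports how many lines were affected, which is the kind of information a summary message can surface to the user.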
-
getMessage
public String getMessage()
Retrieves the generated summary message.
Returns:
the summary message
-
getInputFields
public Object[] getInputFields()
Retrieves the input fields as an array of objects.
Returns:
the input fields
-
getInputFieldsDto
public TextFileInputFieldDTO[] getInputFieldsDto()
Retrieves the input fields as an array of Data Transfer Objects (DTOs).
Returns:
the input field DTOs
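Producing the DTO array amounts to applying a per-field conversion (the `convertFieldToDto` hook) to every input field. The sketch below shows that mapping with hypothetical stand-in classes: `FieldSketch` plays the role of a meta-specific field and `FieldDtoSketch` the role of `TextFileInputFieldDTO`. Neither is a Pentaho class, and rendering a non-positive length as an empty string is an assumption made only for illustration.

```java
import java.util.List;

// Hypothetical stand-in for a meta-specific input field; not a Pentaho class.
final class FieldSketch {
    final String name;
    final String type;
    final int length;

    FieldSketch(String name, String type, int length) {
        this.name = name;
        this.type = type;
        this.length = length;
    }
}

// Hypothetical stand-in for a field DTO carrying display-ready strings.
final class FieldDtoSketch {
    final String name;
    final String type;
    final String lengthText;

    FieldDtoSketch(String name, String type, String lengthText) {
        this.name = name;
        this.type = type;
        this.lengthText = lengthText;
    }
}

final class DtoConverter {
    private DtoConverter() {}

    // Per-field conversion, analogous to the convertFieldToDto hook.
    static FieldDtoSketch convertFieldToDto(FieldSketch f) {
        // Assumption: a length <= 0 means "undefined" and is shown as an empty string.
        String len = f.length > 0 ? Integer.toString(f.length) : "";
        return new FieldDtoSketch(f.name, f.type, len);
    }

    // Applies the per-field conversion to every input field.
    static FieldDtoSketch[] toDtos(List<FieldSketch> fields) {
        return fields.stream()
                .map(DtoConverter::convertFieldToDto)
                .toArray(FieldDtoSketch[]::new);
    }
}
```

Keeping the per-field conversion abstract lets the base class build the array once while each subclass decides how its own field type maps onto the DTO.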
-