Class MultiMergeJoin

java.lang.Object
org.pentaho.di.trans.step.BaseStep
org.pentaho.di.trans.steps.multimerge.MultiMergeJoin
All Implemented Interfaces:
org.pentaho.di.core.ExtensionDataInterface, HasLogChannelInterface, org.pentaho.di.core.logging.LoggingObjectInterface, org.pentaho.di.core.logging.LoggingObjectLifecycleInterface, org.pentaho.di.core.variables.VariableSpace, StepInterface

public class MultiMergeJoin extends BaseStep implements StepInterface
Merges rows from two sorted streams and outputs joined rows with matching key fields. Use this instead of a hash join if both of your input streams are too big to fit in memory. Note that both inputs must be sorted on the join key. This is a first prototype implementation that handles only two streams and an inner join. It also always outputs all values from both streams. Ideally, we should:
1) Support any number of incoming streams
2) Allow the user to choose the join type (inner, outer) for each stream
3) Allow the user to choose which fields to push to the next step
4) Have multiple output ports as follows:
   a) Containing matched records
   b) Unmatched records for each input port
5) Support incoming rows sorted in either ascending or descending order. The current implementation only supports ascending order.
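The two-stream inner merge join described above can be illustrated with a small standalone sketch. It is not the Kettle implementation: the `MergeJoinSketch` class, its `Row` type, and the integer join key are hypothetical names chosen for illustration, and the inputs are plain in-memory lists rather than step row sets. It assumes both inputs are sorted ascending on the key and, like the description, outputs all values from both sides.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone illustration of an inner sort-merge join; not the Kettle code. */
class MergeJoinSketch {

    /** Hypothetical row: an integer join key plus one payload value. */
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Inner-joins two lists that are already sorted ascending on key.
     * Each output entry is "key,leftValue,rightValue".
     */
    static List<String> innerJoin(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int lk = left.get(i).key;
            int rk = right.get(j).key;
            if (lk < rk) {
                i++; // left key is smaller: no match possible, advance left
            } else if (lk > rk) {
                j++; // right key is smaller: no match possible, advance right
            } else {
                // Find the end of the run of equal keys on each side.
                int iEnd = i;
                while (iEnd < left.size() && left.get(iEnd).key == lk) {
                    iEnd++;
                }
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd).key == lk) {
                    jEnd++;
                }
                // Emit every pairing of the two runs (handles duplicate keys).
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(lk + "," + left.get(a).value + "," + right.get(b).value);
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> left = List.of(new Row(1, "a"), new Row(2, "b"), new Row(2, "c"), new Row(4, "d"));
        List<Row> right = List.of(new Row(2, "x"), new Row(3, "y"), new Row(4, "z"));
        for (String joined : innerJoin(left, right)) {
            System.out.println(joined);
        }
    }
}
```

Because each side is read strictly forward, only the current run of equal keys needs to be held at once, which is why a merge join suits inputs too large for an in-memory hash table.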
Since:
24-nov-2006
Author:
Biswapesh
  • Constructor Details

  • Method Details

    • processRow

      public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws org.pentaho.di.core.exception.KettleException
      Description copied from interface: StepInterface
      Perform the equivalent of processing one row. Typically this means reading a row from input (getRow()) and passing a row to output (putRow()).
      Specified by:
      processRow in interface StepInterface
      Overrides:
      processRow in class BaseStep
      Parameters:
      smi - The step's metadata to work with
      sdi - The step's temporary working data to work with (database connections, result sets, caches, temporary variables, etc.)
      Returns:
      false if no more rows can be processed or an error occurred.
      Throws:
      org.pentaho.di.core.exception.KettleException
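The return-value contract of processRow (true while rows remain, false once no more can be processed) can be sketched without any Kettle classes. Everything in the example below is a hypothetical stand-in: `MiniStep` drops the `smi` and `sdi` arguments, and a queue and a list take the place of the step's input and output row sets.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Standalone illustration of the processRow return contract; not Kettle code. */
class ProcessRowContractSketch {

    /** Hypothetical, simplified step: no metadata or data arguments. */
    interface MiniStep {
        boolean processRow();
    }

    /** Copies rows from an input queue to an output list, one row per call. */
    static final class PassThroughStep implements MiniStep {
        private final Queue<String> input;
        private final List<String> output = new ArrayList<>();

        PassThroughStep(List<String> rows) {
            this.input = new ArrayDeque<>(rows);
        }

        @Override
        public boolean processRow() {
            String row = input.poll(); // analogous to reading one input row
            if (row == null) {
                return false;          // input exhausted: caller should stop
            }
            output.add(row);           // analogous to passing the row downstream
            return true;               // more rows may follow
        }

        List<String> output() {
            return output;
        }
    }

    /** Calls processRow until it returns false; returns the number of rows handled. */
    static int runToCompletion(MiniStep step) {
        int handled = 0;
        while (step.processRow()) {
            handled++;
        }
        return handled;
    }
}
```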
    • init

      public boolean init(StepMetaInterface smi, StepDataInterface sdi)
      Description copied from interface: StepInterface
      Initialize and do work where other steps need to wait for...
      Specified by:
      init in interface StepInterface
      Overrides:
      init in class BaseStep
      Parameters:
      smi - The metadata to work with
      sdi - The data to initialize
    • isInputLayoutValid

      protected boolean isInputLayoutValid(org.pentaho.di.core.row.RowMetaInterface[] rows)
      Checks whether incoming rows are join-compatible. This essentially means that the keys being compared should be of the same data type and all rows should have the same number of keys specified.
      Parameters:
      rows - Row metadata of the incoming streams to check against each other
      Returns:
      true when the row layouts are compatible.
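The compatibility rule stated for isInputLayoutValid (same number of keys, same data type at each key position) can be sketched as follows. This is only an illustration under assumptions: `KeyType` is a hypothetical enum standing in for Kettle's value metadata, each stream is reduced to the ordered list of its key types, and the first stream is treated as the reference.

```java
import java.util.List;

/** Standalone illustration of a join-key layout check; not the Kettle code. */
class JoinLayoutCheckSketch {

    /** Hypothetical stand-in for a key field's data type. */
    enum KeyType { STRING, INTEGER, NUMBER, DATE }

    /**
     * Returns true when every stream declares the same number of keys and
     * the key at each position has the same type as in the first stream.
     */
    static boolean isLayoutValid(List<List<KeyType>> keyTypesPerStream) {
        if (keyTypesPerStream.isEmpty()) {
            return true; // assumption: nothing to compare counts as valid
        }
        List<KeyType> reference = keyTypesPerStream.get(0);
        for (List<KeyType> candidate : keyTypesPerStream) {
            if (candidate.size() != reference.size()) {
                return false; // different number of keys
            }
            for (int k = 0; k < reference.size(); k++) {
                if (candidate.get(k) != reference.get(k)) {
                    return false; // key k has a different data type
                }
            }
        }
        return true;
    }
}
```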