Class ConnectionFileNameParser

java.lang.Object
org.apache.commons.vfs2.provider.AbstractFileNameParser
org.pentaho.di.connections.vfs.provider.ConnectionFileNameParser
All Implemented Interfaces:
org.apache.commons.vfs2.provider.FileNameParser

public class ConnectionFileNameParser extends org.apache.commons.vfs2.provider.AbstractFileNameParser
This class parses a PVFS URI. Parsed values are returned as ConnectionFileName instances.

The syntax of a PVFS URI is: pvfs://[(connection)/[(path)]].
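As a rough illustration of this syntax, the following sketch splits a PVFS URI string into its connection name and path parts. It is not the parser's actual implementation: the class and method names are hypothetical, only the canonical "/" separator is handled, and no validation of invalid characters is performed.

```java
/** Hypothetical helper, for illustration only; not part of the Pentaho API. */
final class PvfsUriSplitSketch {
    static final String PREFIX = "pvfs://";

    /** Returns a two-element array: { connectionName, path }. Either may be empty. */
    static String[] split(String uri) {
        if (!uri.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a PVFS URI: " + uri);
        }
        String rest = uri.substring(PREFIX.length());
        int slash = rest.indexOf('/');
        if (slash < 0) {
            // No separator after the connection name: assume an empty path.
            return new String[] { rest, "" };
        }
        return new String[] { rest.substring(0, slash), rest.substring(slash + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("pvfs://my connection/folder/file.txt");
        System.out.println(parts[0]); // prints "my connection"
        System.out.println(parts[1]); // prints "folder/file.txt"
    }
}
```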

Unlike standard URLs and URIs, PVFS URIs can contain special characters in the connection name (authority) section without encoding. However, certain characters are still invalid in the connection name; see CONNECTION_NAME_INVALID_CHARACTERS.

The path section consists of multiple segments separated by FileName.SEPARATOR: segment_1[/segment_2...[/segment_N[/]]]. It may optionally end with a FileName.SEPARATOR character to indicate that it represents a folder. Alternatively, path segments may be separated by the UriParser.TRANS_SEPARATOR character. Path segments cannot contain the characters in INVALID_CHARACTERS. Path segments can contain the percent character, "%", but it must always be percent-encoded.

Parsing a URI validates its syntax and transforms it into canonical form:

  • validates the scheme is "pvfs" and is followed by "://"
  • UriParser.TRANS_SEPARATOR file path separators in the path sections are transformed into the canonical file path separator, FileName.SEPARATOR
  • empty path segments, e.g. "//", are removed
  • "." path segments are removed
  • ".." path segments are validated and resolved
  • a path with a trailing slash is recognized as a folder
  • any percent-encoded characters in path segments which are not reserved (see encodeCharacter(char)) are decoded
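
The path-related steps above can be sketched roughly as follows. This is a simplified stand-in and not the parser's code: the class name is hypothetical, UriParser.TRANS_SEPARATOR is assumed to be the backslash (as in Apache Commons VFS), a ".." segment that would climb above the root is assumed to be rejected, and percent-decoding is omitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper, for illustration only; not part of the Pentaho API. */
final class PvfsPathCanonicalizerSketch {

    static String canonicalize(String path) {
        // Transform the alternative separator into the canonical one.
        String normalized = path.replace('\\', '/');
        // A trailing separator marks the path as a folder.
        boolean folder = normalized.endsWith("/");

        Deque<String> segments = new ArrayDeque<>();
        for (String segment : normalized.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue; // drop empty and "." segments
            }
            if (segment.equals("..")) {
                if (segments.isEmpty()) {
                    // Assumption: climbing above the root is invalid.
                    throw new IllegalArgumentException("Path escapes the root: " + path);
                }
                segments.removeLast(); // resolve ".." against the previous segment
            } else {
                segments.addLast(segment);
            }
        }

        String joined = String.join("/", segments);
        return folder && !joined.isEmpty() ? joined + "/" : joined;
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("a\\b//c/./d/../e/")); // prints "a/b/c/e/"
    }
}
```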

See SPECIAL_CHARACTERS for examples of special characters allowed (without encoding) in connection names and in path segments. Spaces and international characters are also supported.

See AbstractFileName#RESERVED_URI_CHARS for some more context on Apache VFS file names and reserved characters.

  • Field Details

    • INVALID_CHARACTERS

      public static final String INVALID_CHARACTERS
      The characters which are invalid for both connection names and path segments.
    • CONNECTION_NAME_INVALID_CHARACTERS

      public static final String CONNECTION_NAME_INVALID_CHARACTERS
      The characters which are invalid for connection names.
    • CONNECTION_NAME_INVALID_CHARACTERS_PATTERN

      public static final Pattern CONNECTION_NAME_INVALID_CHARACTERS_PATTERN
      The pattern of invalid connection name characters.
    • SPECIAL_CHARACTERS

      public static final String SPECIAL_CHARACTERS
      The full set of special characters that a connection name can contain, excluding only FileName.SEPARATOR, UriParser.TRANS_SEPARATOR and the characters which are reserved.

      Special characters in this context are characters that generally have to be encoded; otherwise, using URI throws URISyntaxException during object instantiation or subsequent method calls.
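
To make the contrast with java.net.URI concrete, the sketch below checks whether a string is accepted by the URI constructor. The class and method names are hypothetical; the only behavior relied on is that new URI(String) throws URISyntaxException for characters such as an unencoded space.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper, for illustration only; not part of the Pentaho API. */
final class UriSpecialCharacterSketch {

    /** Returns true if java.net.URI can parse the text without encoding it first. */
    static boolean isAcceptedByJavaUri(String text) {
        try {
            new URI(text);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A connection name without special characters parses as a regular URI.
        System.out.println(isAcceptedByJavaUri("pvfs://my-connection/file.txt")); // prints "true"
        // An unencoded space in the authority is rejected by java.net.URI.
        System.out.println(isAcceptedByJavaUri("pvfs://my connection/file.txt")); // prints "false"
    }
}
```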

  • Constructor Details

    • ConnectionFileNameParser

      public ConnectionFileNameParser()
    • ConnectionFileNameParser

      public ConnectionFileNameParser(@NonNull ConnectionFileNameUtils connectionFileNameUtils)
  • Method Details

    • encodeCharacter

      public final boolean encodeCharacter(char ch)
      Indicates if a character is reserved with respect to percent-encoding, and is thus always encoded in canonical encoding form, as implemented by UriParser.canonicalizePath(StringBuilder, int, int, FileNameParser).

      Currently, only the "%" character — the URL-encoding escape character — is reserved.

      Specified by:
      encodeCharacter in interface org.apache.commons.vfs2.provider.FileNameParser
      Overrides:
      encodeCharacter in class org.apache.commons.vfs2.provider.AbstractFileNameParser
      Parameters:
      ch - The character to test.
      Returns:
      true if the character is reserved; false, otherwise.
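
A minimal sketch of what "reserved" means here, assuming only "%" is reserved as stated above. The encodeReserved helper is hypothetical and stands in for the canonical encoding step; it is not the Commons VFS implementation.

```java
/** Hypothetical helper, for illustration only; not part of the Pentaho API. */
final class ReservedCharacterSketch {

    /** Mirrors the documented rule: only the percent character is reserved. */
    static boolean encodeCharacter(char ch) {
        return ch == '%';
    }

    /** Percent-encodes every reserved character of a decoded path segment. */
    static String encodeReserved(String segment) {
        StringBuilder sb = new StringBuilder();
        for (char ch : segment.toCharArray()) {
            if (encodeCharacter(ch)) {
                // Two-digit uppercase hex code, e.g. '%' becomes "%25".
                sb.append('%').append(String.format("%02X", (int) ch));
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeReserved("100% done")); // prints "100%25 done"
    }
}
```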
    • getInstance

      @NonNull public static ConnectionFileNameParser getInstance()
    • getConnectionFileNameUtils

      @NonNull protected ConnectionFileNameUtils getConnectionFileNameUtils()
    • validateConnectionName

      public void validateConnectionName(@Nullable String connectionName) throws org.apache.commons.vfs2.FileSystemException
      Throws:
      org.apache.commons.vfs2.FileSystemException
    • isValidConnectionNameCharacter

      public boolean isValidConnectionNameCharacter(char c)
      Determines if a given character is valid in a connection name.
      Parameters:
      c - The character to test.
      Returns:
      true if the character is valid; false, otherwise.
    • sanitizeConnectionName

      public String sanitizeConnectionName(String connectionName)
      Removes any invalid characters from a potential connection name.
      Parameters:
      connectionName - The potential connection name.
      Returns:
      A corresponding sanitized connection name.
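
The relationship between character validation and sanitizing can be sketched as below. The set of invalid characters used here is a placeholder chosen for the example, not the actual value of CONNECTION_NAME_INVALID_CHARACTERS, and the class is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, for illustration only; not part of the Pentaho API. */
final class ConnectionNameSanitizerSketch {
    // Placeholder set of invalid characters: '/', '\' and '%'.
    // The real constant may contain different characters.
    static final String INVALID = "/\\%";
    // Regex character class matching the same three characters.
    static final Pattern INVALID_PATTERN = Pattern.compile("[/\\\\%]");

    static boolean isValidCharacter(char c) {
        return INVALID.indexOf(c) < 0;
    }

    /** Removes every character matched by the invalid-character pattern. */
    static String sanitize(String connectionName) {
        return INVALID_PATTERN.matcher(connectionName).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("my/conn\\name%1")); // prints "myconnname1"
    }
}
```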
    • parseUri

      public org.apache.commons.vfs2.FileName parseUri(org.apache.commons.vfs2.provider.VfsComponentContext vfsComponentContext, org.apache.commons.vfs2.FileName baseFileName, String pvfsUri) throws org.apache.commons.vfs2.FileSystemException
      Parses a PVFS URI under an Apache VFS context.
      Parameters:
      vfsComponentContext - The VFS component context.
      baseFileName - The base file name.
      pvfsUri - The PVFS URI to parse.
      Returns:
      The connection file name, never null.
      Throws:
      org.apache.commons.vfs2.FileSystemException - When the given PVFS URI is not valid.
    • parseUri

      @NonNull public ConnectionFileName parseUri(@NonNull String pvfsUri) throws org.apache.commons.vfs2.FileSystemException
      Parses a PVFS URI.
      Parameters:
      pvfsUri - The PVFS URI to parse.
      Returns:
      The corresponding connection file name.
      Throws:
      org.apache.commons.vfs2.FileSystemException - When the given PVFS URI is not valid.