API and Code

Below is the API documentation for the Stetl Python code.

Main Entry Points

There are several entry points through which Stetl can be called. The most common is the command-line script bin/stetl, which should be available after installation.

In some contexts, such as integrations, you may want to call Stetl from Python. The entry points are then:

class stetl.etl.ETL(options_dict, args_dict=None)[source]

The main class: builds ETL Chains with connected Components from a config and lets them run.

Usually this class is called via main but it may be called directly for direct integration.

env_expand_args_dict(args_dict, args_names)[source]

Expand values in dict with equivalent values from the OS Env. NB vars in OS Env should be prefixed with STETL_ or stetl_ so as not to get overrides by accident.

Returns:

expanded args_dict or None
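
The prefix rule can be illustrated with a minimal sketch. This is not Stetl's implementation: the helper name expand_from_env, the exact lookup key (prefix plus argument name) and the choice to let an environment value replace an existing one are all assumptions made for illustration.

```python
import os


def expand_from_env(args_dict, args_names, environ=None):
    """Hypothetical helper: fill in values from STETL_<name> or stetl_<name> variables."""
    environ = os.environ if environ is None else environ
    expanded = dict(args_dict or {})
    for name in args_names:
        for prefix in ("STETL_", "stetl_"):
            key = prefix + name
            if key in environ:
                # Assumption: a prefixed environment variable wins over the given value.
                expanded[name] = environ[key]
                break
    # Mirror the documented return value: the expanded dict, or None if empty.
    return expanded or None


# An explicit mapping stands in for the real environment to keep the example deterministic.
fake_env = {"STETL_host": "db.example.org", "PATH": "/usr/bin"}
result = expand_from_env({"host": "localhost", "port": "5432"}, ["host", "port"], fake_env)
print(result)
```

Unprefixed variables such as PATH are never consulted, which is the point of the prefix.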

Core Framework

The core framework is directly under the directory src/stetl. Below are the main seven classes. Their interrelation is as follows:

One or more stetl.chain.Chain objects are built from a Stetl ETL configuration via the stetl.factory.Factory class. A stetl.chain.Chain consists of a set of connected stetl.component.Component objects. A stetl.component.Component is either a stetl.input.Input, a stetl.output.Output or a stetl.filter.Filter. Data and status flow as stetl.packet.Packet objects from a stetl.input.Input via zero or more stetl.filter.Filter objects to a final stetl.output.Output.

As a trivial example: a stetl.input.Input could be an XML file, a stetl.filter.Filter could represent an XSLT file and a stetl.output.Output a PostGIS database. This is effected by specialized classes in the subpackages inputs, filters, and outputs. New in 1.1.0: stetl.splitter.Splitter to split data to multiple Outputs and stetl.merger.Merger to combine multiple Inputs.
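
The flow of a Packet from an Input through a Filter to an Output can be pictured with the toy classes below. They are stand-ins written for this illustration only; none of them are Stetl classes, and the real framework adds configuration, format checking and status handling.

```python
class Packet:
    """Toy stand-in for a packet: carries data between components."""

    def __init__(self, data=None):
        self.data = data


class ListInput:
    """Toy Input: puts the next item of a list into the packet."""

    def __init__(self, items):
        self.items = list(items)

    def invoke(self, packet):
        packet.data = self.items.pop(0)
        return packet


class UpperCaseFilter:
    """Toy Filter: transforms the packet data."""

    def invoke(self, packet):
        packet.data = packet.data.upper()
        return packet


class CollectingOutput:
    """Toy Output: stores whatever it receives."""

    def __init__(self):
        self.written = []

    def invoke(self, packet):
        self.written.append(packet.data)
        return packet


source = ListInput(["amsterdam", "utrecht"])
components = [source, UpperCaseFilter(), CollectingOutput()]

# Each round sends one fresh packet through all components in order.
while source.items:
    packet = Packet()
    for component in components:
        packet = component.invoke(packet)

print(components[-1].written)
```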

class stetl.factory.Factory[source]

Object and class Factory (Pattern). Based on: http://stackoverflow.com/questions/2226330/instantiate-a-python-class-from-a-name

class_forname(class_string)[source]

Returns class instance specified by a string.

Args:

class_string: The string representing a class.

Raises:

ValueError if module part of the class is not specified.
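
Resolving a class from a dotted string can be sketched with the standard importlib module. The function below is an illustration under the assumption that the string has the form module.ClassName; it is not the Factory source, and the name class_for_name is hypothetical.

```python
import importlib


def class_for_name(class_string):
    """Resolve 'package.module.ClassName' to the class object."""
    module_name, _, class_name = class_string.rpartition(".")
    if not module_name:
        # No dot found: the module part is missing.
        raise ValueError("module part missing in class string: %r" % class_string)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


cls = class_for_name("collections.OrderedDict")
instance = cls(a=1)
print(cls.__name__, dict(instance))
```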

new_instance(class_obj, configdict, section)[source]

Returns object instance from class instance.

Args:

class_obj: object representing a class instance.

args: standard args.

kwargs: standard args.

class stetl.component.Component(configdict, section, consumes='none', produces='none')[source]

Abstract Base class for all Input, Filter and Output Components.

after_chain_invoke(packet)[source]

Called right after entire Component Chain invoke.

after_invoke(packet)[source]

Called right after Component invoke.

before_invoke(packet)[source]

Called just before Component invoke.

exit()[source]

Allows derived Components to perform a one-time exit/cleanup.

init()[source]

Allows derived Components to perform a one-time init.

input_format()[source]
CONFIG

The specific input format if the consumes parameter is a list or the format to be converted to the output_format.

  • type: str

  • required: False

  • default: None

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

output_format()[source]
CONFIG

The specific output format if the produces parameter is a list or the format to which the input format is converted.

  • type: str

  • required: False

  • default: None

timer_stop(start_time)[source]

Collect and calculate per-Component performance timing stats.

Parameters:

start_time

class stetl.component.Config(ptype=<class 'str'>, default=None, required=False)[source]

Decorator class to tie config values from the .ini file to object instance property values. Somewhat like the Python standard @property but with the possibility to define default values, typing and making properties required.

Each property is defined by @Config(type, default, required). Basic idea comes from: https://wiki.python.org/moin/PythonDecoratorLibrary#Cached_Properties
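
How a decorator class can tie configuration values to typed properties is sketched below. This is a simplified illustration, not the stetl.component.Config source: it assumes the component keeps its settings in a plain dict attribute named cfg, and the bool parsing rule is likewise an assumption.

```python
class Config:
    """Toy decorator: exposes a value from instance.cfg as a typed, read-only attribute."""

    def __init__(self, ptype=str, default=None, required=False):
        self.ptype = ptype
        self.default = default
        self.required = required
        self.name = None

    def __call__(self, fget):
        # The decorated method only supplies the property name and docstring.
        self.name = fget.__name__
        return self

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if self.name not in instance.cfg:
            if self.required:
                raise ValueError("missing required config value: " + self.name)
            return self.default
        value = instance.cfg[self.name]
        if self.ptype is bool:
            # Assumption: booleans arrive as strings from the config file.
            return str(value).lower() in ("true", "1", "yes")
        return self.ptype(value)


class DemoInput:
    def __init__(self, cfg):
        self.cfg = cfg

    @Config(ptype=str, default="localhost", required=False)
    def host(self):
        """Host name."""

    @Config(ptype=int, default=5432)
    def port(self):
        """Port number."""

    @Config(ptype=str, required=True)
    def database_name(self):
        """Database name."""


demo = DemoInput({"port": "5433", "database_name": "rivers"})
print(demo.host, demo.port, demo.database_name)
```

Compared with a plain @property, the default, type conversion and required check live in one place instead of being repeated in every getter.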

class stetl.chain.Chain(chain_str, config_dict)[source]

Holder for a single invokable pipeline of Components. A Chain is basically a singly linked list of Components. Each Component executes a part of the total ETL. Data along the Chain is passed within a Packet object. The compatibility of input and output for linked Components is checked when adding a Component to the Chain.
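
The linked-list structure and the compatibility check on add can be sketched as follows. The classes Step and ToyChain are hypothetical, and treating the format 'any' as a wildcard on either side is an assumption; the actual matching rules in Stetl may differ.

```python
class Step:
    """Toy component: only records its name and declared formats."""

    def __init__(self, name, consumes, produces):
        self.name = name
        self.consumes = consumes
        self.produces = produces
        self.next = None


class ToyChain:
    """Singly linked list of steps with a format check in add()."""

    def __init__(self):
        self.first = None
        self.last = None

    def add(self, step):
        if self.last is None:
            self.first = step
        else:
            produced, consumed = self.last.produces, step.consumes
            # Assumption: 'any' on either side is treated as a wildcard.
            if "any" not in (produced, consumed) and produced != consumed:
                raise ValueError(
                    "%s produces %s but %s consumes %s"
                    % (self.last.name, produced, step.name, consumed)
                )
            self.last.next = step
        self.last = step

    def names(self):
        step, result = self.first, []
        while step is not None:
            result.append(step.name)
            step = step.next
        return result


chain = ToyChain()
chain.add(Step("xml_input", "none", "etree_doc"))
chain.add(Step("xslt_filter", "etree_doc", "etree_doc"))
chain.add(Step("file_output", "any", "none"))
print(chain.names())
```

Checking at add time means a mismatched pipeline fails while it is being assembled rather than halfway through a run.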

add(etl_comp)[source]

Add Component to end of Chain.

Parameters:

etl_comp

assemble()[source]

Builder method: build a Chain of linked Components.

get_by_class(clazz)[source]

Get Component instance from Chain by class, mainly for testing.

Parameters:

clazz

Returns:

Component

get_by_id(id)[source]

Get Component instance from Chain by id, mainly for testing.

Parameters:

id

Returns:

Component

get_by_index(index)[source]

Get Component instance from Chain by position/index in Chain, mainly for testing.

Parameters:

index

Returns:

Component

run()[source]

Run the ETL Chain.

class stetl.packet.FORMAT[source]

Format of Packet (enumeration).

Current possible values:

  • ‘none’

  • ‘xml_line_stream’

  • ‘line_stream’

  • ‘etree_doc’

  • ‘etree_element’

  • ‘etree_feature_array’

  • ‘xml_doc_as_string’

  • ‘string’

  • ‘record’

  • ‘record_array’

  • ‘struct’

  • ‘geojson_feature’

  • ‘geojson_collection’

  • ‘gdal_vsi_path’

  • ‘ogr_feature’

  • ‘ogr_feature_array’

  • ‘any’

class stetl.packet.Packet(data=None)[source]

Represents units of (any) data and status passed along Chain of Components.

class stetl.input.Input(configdict, section, produces)[source]

Bases: Component

Abstract Base class for all Input Components.

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.output.Output(configdict, section, consumes)[source]

Bases: Component

Abstract Base class for all Output Components.

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.filter.Filter(configdict, section, consumes, produces)[source]

Bases: Component

Maps input to output. Abstract base class for specific Filters.

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.splitter.Splitter(config_dict, child_list)[source]

Bases: Component

Component that splits a single input to multiple output Components. Use this for example to produce multiple output file formats (GML, GeoJSON etc) or to publish to multiple remote services (SOS, SensorThings API) or for simple debugging: target Output and StandardOutput.

after_chain_invoke(packet)[source]

Called right after entire Component Chain invoke.

after_invoke(packet)[source]

Called right after Component invoke.

before_invoke(packet)[source]

Called just before Component invoke.

exit()[source]

Allows derived Components to perform a one-time exit/cleanup.

init()[source]

Allows derived Components to perform a one-time init.

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.merger.Merger(config_dict, child_list)[source]

Bases: Component

Component that merges multiple Input Components into a single Component. Use this for example to combine multiple input streams like API endpoints. The Merger will embed Child Components to which actions are delegated. A Child Component may be a sub-Chain e.g. (Input|Filter|Filter..) sequence. Hence the “next” should be coupled to the last Component in that sub-Chain with the degenerate case where the sub-Chain is a single (Input) Component. NB this Component can only be used for Inputs.

first(child)[source]

Get first Component in Child sub-Chain.

Parameters:

child

Returns:

first Component

last(child)[source]

Get last Component in Child sub-Chain.

Parameters:

child

Returns:

last Component

Components: Inputs

class stetl.inputs.dbinput.DbInput(configdict, section, produces)[source]

Bases: Input

Input from any database (abstract base class).

class stetl.inputs.dbinput.PostgresDbInput(configdict, section)[source]

Bases: SqlDbInput

Input by querying records from a Postgres database. Input is a query, like SELECT * from mytable. Output is zero or more records as record array (array of dict) or single record (dict).

produces=FORMAT.record_array (default) or FORMAT.record

exit()[source]

Allows derived Components to perform a one-time exit/cleanup.

host()[source]
CONFIG

host name or host IP-address, defaults to ‘localhost’

  • type: str

  • required: False

  • default: localhost

init()[source]

Allows derived Components to perform a one-time init.

password()[source]
CONFIG

User password, defaults to ‘postgres’

  • type: str

  • required: False

  • default: postgres

port()[source]
CONFIG

port for host, defaults to ‘5432’

  • type: str

  • required: False

  • default: 5432

raw_query(query_str)[source]

Performs DB-specific query and returns raw records iterator.

schema()[source]
CONFIG

The postgres schema name, defaults to ‘public’

  • type: str

  • required: False

  • default: public

user()[source]
CONFIG

User name, defaults to ‘postgres’

  • type: str

  • required: False

  • default: postgres

class stetl.inputs.dbinput.SqlDbInput(configdict, section)[source]

Bases: DbInput

Input using a query from any SQL-based RDBMS (abstract base class).

column_names()[source]
CONFIG

Column names to populate records with. If empty taken from table metadata.

  • type: str

  • required: False

  • default: None

database_name()[source]
CONFIG

Database name

  • type: str

  • required: True

  • default: None

do_query(query_str)[source]

DB-neutral query returning Python record list.

query()[source]
CONFIG

The query (string) to fire.

  • type: str

  • required: False

  • default: None

raw_query(query_str)[source]

Performs DB-specific query and returns raw records iterator.

read_once()[source]
CONFIG

Read once? i.e. only do query once and stop

  • type: bool

  • required: False

  • default: False

result_to_output(db_tuples)[source]

Convert DB-specific record tuples to single Python record (dict) or record array (list of dict).

table()[source]
CONFIG

Table name

  • type: str

  • required: False

  • default: None

tuples_to_records(db_tuples, columns=None)[source]

Convert tuple array (list of tuple) to list of records (list of dict’s) using list of column names.

class stetl.inputs.dbinput.SqliteDbInput(configdict, section)[source]

Bases: SqlDbInput

Input by querying records from a SQLite database. Input is a query, like SELECT * from mytable. Output is zero or more records as record array (array of dict) or single record (dict).

produces=FORMAT.record_array (default) or FORMAT.record
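
The shape of a record array can be shown with the standard sqlite3 module and an in-memory database. This sketch does not use SqliteDbInput; it assumes only that a record is a dict keyed by column name, and takes the column names from the cursor metadata.

```python
import sqlite3

# In-memory database so the example needs no files.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
connection.executemany(
    "INSERT INTO mytable VALUES (?, ?)", [(1, "Rhine"), (2, "Meuse")]
)

cursor = connection.execute("SELECT * from mytable ORDER BY id")
# Column names come from the query metadata, not from a hard-coded list.
column_names = [description[0] for description in cursor.description]
# Each raw tuple becomes one record (dict); the list is the record array.
record_array = [dict(zip(column_names, row)) for row in cursor.fetchall()]
connection.close()

print(record_array)
```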

init()[source]

Allows derived Components to perform a one-time init.

raw_query(query_str)[source]

Performs DB-specific query and returns raw records iterator.

class stetl.inputs.httpinput.ApacheDirInput(configdict, section, produces='record')[source]

Bases: HttpInput

Read file data from an Apache directory “index” HTML page. Based on: http://stackoverflow.com/questions/686147/url-tree-walker-in-python

produces=FORMAT.record

Each record contains file_name and file_data (other metadata such as date and time is too fragile across different Apache servers).

file_ext()[source]
CONFIG

The file extension for target files in Apache dir.

  • type: str

  • required: False

  • default: xml
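
Selecting files by extension from a directory index page can be sketched with the standard html.parser module. The class IndexLinkParser and the sample HTML are made up for illustration; how ApacheDirInput itself parses the page is not shown here.

```python
from html.parser import HTMLParser


class IndexLinkParser(HTMLParser):
    """Collect href values of links that end with the wanted file extension."""

    def __init__(self, file_ext):
        super().__init__()
        self.suffix = "." + file_ext
        self.file_names = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.endswith(self.suffix):
            self.file_names.append(href)


# Hypothetical index page, similar in spirit to an Apache directory listing.
index_html = """
<html><body><h1>Index of /data</h1>
<a href="../">Parent Directory</a>
<a href="cities.xml">cities.xml</a>
<a href="readme.txt">readme.txt</a>
<a href="rivers.xml">rivers.xml</a>
</body></html>
"""

parser = IndexLinkParser("xml")
parser.feed(index_html)
print(parser.file_names)
```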

filter_file(file_name)[source]

Filter the file_name, e.g. to suppress reading, default: return file_name.

Parameters:

file_name

Return string or None:

init()[source]

Read the list of files from the Apache index URL.

next_file()[source]

Return a tuple (name, date, size) with next file info.

Return tuple:

no_more_files()[source]

More files left?

Return Boolean:

read(packet)[source]

Read the data from the URL.

Parameters:

packet

Returns:

class stetl.inputs.httpinput.HttpInput(configdict, section, produces='any')[source]

Bases: Input

Fetch data from remote services like WFS via HTTP protocol. Base class: subclasses will do datatype-specific formatting of the returned data.

produces=FORMAT.any

add_authorization(request)[source]

Add authorization from config data. Authorization scheme-specific. May be extended or overloaded for additional schemes.

Parameters:

request – the HTTP Request

Returns:

auth()[source]
CONFIG

Authentication data: flat JSON-like struct dependent on auth type/schema. Only the type field is required; other fields depend on the auth schema. Supported values:

type: basic|token

If the type is basic (HTTP Basic Authentication), two additional fields, user and password, are required. If the type is token (HTTP Token), two additional fields, keyword and token, are required.

Any required Base64 encoding is provided by HttpInput.

Examples:

# Basic Auth
url = https://some.rest.api.com
auth = {
    type: basic,
    user: myname,
    password: mypassword
}

# Token Auth
url = https://some.rest.api.com
auth = {
    type: token,
    keyword: Bearer,
    token: mytoken
}

  • type: dict

  • required: False

  • default: None
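
What the two auth types amount to on the wire can be sketched as follows. The function authorization_header is hypothetical and only follows the field names described for this setting; the header layout for the token type (keyword, space, token) is an assumption, not taken from the HttpInput source.

```python
import base64


def authorization_header(auth):
    """Build an HTTP Authorization header value from a flat auth dict."""
    auth_type = auth["type"]
    if auth_type == "basic":
        # HTTP Basic Authentication: Base64 of 'user:password'.
        credentials = "{}:{}".format(auth["user"], auth["password"])
        encoded = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
        return "Basic " + encoded
    if auth_type == "token":
        # Assumption: keyword and token are joined by a single space.
        return "{} {}".format(auth["keyword"], auth["token"])
    raise ValueError("unsupported auth type: " + auth_type)


basic_header = authorization_header(
    {"type": "basic", "user": "myname", "password": "mypassword"}
)
token_header = authorization_header(
    {"type": "token", "keyword": "Bearer", "token": "mytoken"}
)
print(basic_header)
print(token_header)
```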

format_data(data)[source]

Format response data; override in subclasses. Defaults to returning the original data.

Parameters:

data

parameters()[source]
CONFIG

Flat JSON-like struct of the parameters to be appended to the url.

Example (parameters require quotes):

url = http://geodata.nationaalgeoregister.nl/natura2000/wfs
parameters = {
    service : WFS,
    version : 1.1.0,
    request : GetFeature,
    srsName : EPSG:28992,
    outputFormat : text/xml; subtype=gml/2.1.2,
    typename : natura2000
}

  • type: dict

  • required: False

  • default: None
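
Appending a parameters struct to a URL can be sketched with the standard urllib.parse module. The helper build_url is hypothetical; it shows one reasonable encoding of such a struct and is not the code HttpInput uses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(url, parameters=None):
    """Append URL-encoded query parameters to a base URL."""
    if not parameters:
        return url
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode(parameters)


full_url = build_url(
    "http://geodata.nationaalgeoregister.nl/natura2000/wfs",
    {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "srsName": "EPSG:28992",
        "outputFormat": "text/xml; subtype=gml/2.1.2",
        "typename": "natura2000",
    },
)
print(full_url)

# Decoding the query string again recovers the original values.
decoded = parse_qs(urlsplit(full_url).query)
```

Characters such as ':' and ';' in the values are percent-encoded, so they cannot be mistaken for URL syntax.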

read(packet)[source]

Read the data from the URL.

Parameters:

packet

Returns:

read_from_url(url, parameters=None)[source]

Read the data from the URL.

Parameters:
  • url – the url to fetch

  • parameters – optional dict of query parameters

Returns:

url()[source]
CONFIG

The HTTP URL string.

  • type: str

  • required: True

  • default: None

class stetl.inputs.ogrinput.OgrInput(configdict, section)[source]

Bases: Input

Direct GDAL OGR input via Python OGR wrapper. Via the Python API http://gdal.org/python an OGR data source is accessed and from each layer the Features are read. Each Layer corresponds to a “doc”, so for multi-layer sources the ‘end-of-doc’ flag is set after a Layer has been read.

This input can read almost any geospatial data format. One can use the features directly in a Stetl Filter or use a converter to e.g. convert to GeoJSON structures.

produces=FORMAT.ogr_feature or FORMAT.ogr_feature_array (all features)

data_source()[source]
CONFIG

String denoting the OGR datasource. Usually a path to a file like “path/rivers.shp” or connection string to PostgreSQL like “PG: host=localhost dbname=’rivers’ user=’postgres’”.

  • type: str

  • required: True

  • default: None

init()[source]

Allows derived Components to perform a one-time init.

source_format()[source]
CONFIG

Instructs GDAL to use driver by that name to open datasource. Not required for many standard formats that are self-describing like ESRI Shapefile.

Examples: ‘PostgreSQL’, ‘GeoJSON’ etc

  • type: str

  • required: False

  • default: None

source_options()[source]
CONFIG

Custom datasource-specific options. Used in gdal.SetConfigOption().

  • type: dict

  • required: False

  • default: None

sql()[source]
CONFIG

String with SQL query. Mandatory for PostgreSQL OGR source.

  • type: str

  • required: False

  • default: None

class stetl.inputs.ogrinput.OgrPostgisInput(configdict, section)[source]

Bases: Input

Input from PostGIS via ogr2ogr command. For now hardcoded to produce an ogr GML line stream.

Alternatives: either stetl.inputs.dbinput.PostgresDbInput or stetl.inputs.ogrinput.OgrInput, which may be a better choice.

produces=FORMAT.xml_line_stream

in_pg_db()[source]
CONFIG

Database name input DB.

  • type: str

  • required: True

  • default: None

in_pg_host()[source]
CONFIG

Host of input DB.

  • type: str

  • required: False

  • default: localhost

in_pg_password()[source]
CONFIG

Password input DB.

  • type: str

  • required: False

  • default: postgres

in_pg_port()[source]
CONFIG

Port of input DB.

  • type: str

  • required: False

  • default: 5432

in_pg_schema()[source]
CONFIG

DB Schema name input DB.

  • type: str

  • required: False

  • default: None

in_pg_sql()[source]
CONFIG

The input query (string) to fire.

  • type: str

  • required: False

  • default: None

in_pg_user()[source]
CONFIG

User input DB.

  • type: str

  • required: False

  • default: postgres

in_srs()[source]
CONFIG

SRS (projection) (ogr2ogr -s_srs) input DB e.g. ‘EPSG:28992’.

  • type: str

  • required: False

  • default: None

init()[source]

Allows derived Components to perform a one-time init.

out_dimension()[source]
CONFIG

Dimension (OGR: DIM=N) of features in output stream.

  • type: str

  • required: False

  • default: 2

out_geotype()[source]
CONFIG

OGR Geometry type new layer in output stream, e.g. POINT.

  • type: str

  • required: False

  • default: None

out_gml_format()[source]
CONFIG

GML format OGR name in output stream, e.g. ‘GML3’.

  • type: str

  • required: False

  • default: None

out_layer_name()[source]
CONFIG

New Layer name (ogr2ogr -nln) output stream, e.g. ‘address’.

  • type: str

  • required: False

  • default: None

out_srs()[source]
CONFIG

Target SRS (ogr2ogr -t_srs) code output stream.

  • type: str

  • required: False

  • default: None

Components: Filters

class stetl.filters.xsltfilter.XsltFilter(configdict, section)[source]

Bases: Filter

Invokes XSLT processor (via lxml) for given XSLT script on an etree doc.

consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

script()[source]
CONFIG

Path to XSLT script file.

  • type: str

  • required: True

  • default: None

class stetl.filters.xmlassembler.XmlAssembler(configdict, section)[source]

Bases: Filter

Split a stream of etree DOM XML elements (usually Features) into etree DOM docs. Consumes and buffers elements until max_elements is reached, then produces an etree doc.

consumes=FORMAT.etree_element, produces=FORMAT.etree_doc
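
The buffering behaviour can be sketched with the standard xml.etree.ElementTree module in place of lxml. The function names, the root tag FeatureCollection and the decision to flush a final partial buffer are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def make_doc(elements, root_tag):
    """Wrap buffered elements in a new root and return the document."""
    root = ET.Element(root_tag)
    root.extend(elements)
    return ET.ElementTree(root)


def assemble_docs(elements, max_elements, root_tag="FeatureCollection"):
    """Yield one document per max_elements buffered elements."""
    buffered = []
    for element in elements:
        buffered.append(element)
        if len(buffered) >= max_elements:
            yield make_doc(buffered, root_tag)
            buffered = []
    if buffered:
        # Assumption: leftover elements form a final, smaller document.
        yield make_doc(buffered, root_tag)


features = [ET.Element("feature", {"id": str(i)}) for i in range(5)]
docs = list(assemble_docs(features, max_elements=2))
sizes = [len(doc.getroot()) for doc in docs]
print(sizes)
```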

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.filters.xmlelementreader.XmlElementReader(configdict, section)[source]

Bases: Filter

Extracts XML elements from a file, outputs each feature element in Packet. Parsing is streaming (no internal DOM buildup) so any file size can be handled. Use this class for your big GML files!

consumes=FORMAT.string, produces=FORMAT.etree_element

element_tags()[source]
CONFIG

Comma-separated string of XML (feature) element tag names of the elements that should be extracted and added to the output element stream.

  • type: list

  • required: True

  • default: None
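
Streaming extraction by tag name can be sketched with iterparse from the standard xml.etree.ElementTree module, used here instead of the parser Stetl relies on. The function iter_elements and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_elements(source, element_tags):
    """Yield each element whose tag is listed, as soon as it is fully parsed."""
    wanted = set(element_tags)
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag in wanted:
            yield element
            # Runs when the consumer asks for the next element:
            # drop the content of the one that was just handled.
            element.clear()


document = io.BytesIO(
    b"<cities><city name='Amsterdam'/><town name='Edam'/><city name='Utrecht'/></cities>"
)
names = [element.get("name") for element in iter_elements(document, ["city"])]
print(names)
```

Because each element is cleared after use, the consumer must read what it needs before requesting the next element.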

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

strip_namespaces()[source]
CONFIG

Should namespaces be removed from the input document and thus not be present in the output element stream?

  • type: bool

  • required: False

  • default: False

class stetl.filters.xmlvalidator.XmlSchemaValidator(configdict, section)[source]

Bases: Filter

Validates an etree doc and prints result to log.

consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.filters.sieve.AttrValueRecordSieve(configdict, section)[source]

Bases: Sieve

Sieves by attr/value(s) in Record Packets.

attr_name()[source]
CONFIG

Name of attribute whose value(s) are to be sieved.

  • type: str

  • required: True

  • default: None

attr_values()[source]
CONFIG

Value(s) for attribute to be sieved. If empty, any value is passed through (existence of attr_name is the criterion).

  • type: list

  • required: False

  • default: []
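
The sieve rule for attr_name and attr_values can be sketched as a plain function over dict records. The function sieve_record is hypothetical, and returning None for a rejected record is an assumption about how rejection is signalled.

```python
def sieve_record(record, attr_name, attr_values=()):
    """Return the record if it passes the sieve, otherwise None."""
    if attr_name not in record:
        return None
    # An empty attr_values list means: existence of attr_name is enough.
    if attr_values and record[attr_name] not in attr_values:
        return None
    return record


records = [
    {"name": "Rhine", "type": "river"},
    {"name": "Edam", "type": "town"},
    {"name": "unknown"},
]

rivers = [r for r in records if sieve_record(r, "type", ["river"])]
typed = [r for r in records if sieve_record(r, "type")]
print(rivers)
print(typed)
```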

sieve(packet)[source]

Filter out Packets that do not match the designated attr value(s).

Parameters:

packet

class stetl.filters.sieve.Sieve(configdict, section, consumes, produces)[source]

Bases: Filter

ABC for specific Sieves that pass-through, “sieve”, Packets based on criteria in their data.

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

sieve(packet)[source]

To be implemented in subclasses.

Parameters:

packet

class stetl.filters.stringfilter.StringConcatFilter(configdict, section)[source]

Bases: StringFilter

Concatenates a specified string with the input string (packet.data) either/or as prefix (prepend) or postfix (append) and outputs that concatenation.

consumes=FORMAT.string, produces=FORMAT.string

append_string()[source]
CONFIG

String to be appended.

Example: append_string = /002PND150904.xml

  • type: str

  • required: False

  • default: None

prepend_string()[source]
CONFIG

String to be prepended.

Example: prepend_string = /vsizip/

  • type: str

  • required: False

  • default: None

class stetl.filters.stringfilter.StringFilter(configdict, section, consumes, produces)[source]

Bases: Filter

Base class for any string filtering.

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.filters.stringfilter.StringSubstitutionFilter(configdict, section)[source]

Bases: StringFilter

String filtering using Python advanced String formatting. The string should have substitutable values like {schema} {foo}. format_args should be of the form format_args = schema:test foo:bar …

consumes=FORMAT.string, produces=FORMAT.string

format_args()[source]
CONFIG

Provides a list of format arguments used by the string substitution filter. Formatting of content according to Python String.format(). String should have substitutable values like {schema} {foo}.

Example: format_args = schema:test foo:bar

  • type: str

  • required: True

  • default: None

separator()[source]
CONFIG

Provides the separator to split the format argument names from their values.

  • type: str

  • required: False

  • default: :
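
The combination of format_args and separator can be sketched with str.format. The helper parse_format_args is hypothetical, and splitting the pairs on whitespace is an assumption based on the example value above.

```python
def parse_format_args(format_args, separator=":"):
    """Turn 'schema:test foo:bar' into {'schema': 'test', 'foo': 'bar'}."""
    # Assumption: name/value pairs are separated from each other by whitespace.
    pairs = (item.split(separator, 1) for item in format_args.split())
    return {name: value for name, value in pairs}


template = "SELECT * FROM {schema}.{foo}"
arguments = parse_format_args("schema:test foo:bar")
result = template.format(**arguments)
print(result)
```

Splitting each pair only once on the separator keeps values that themselves contain the separator intact.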

class stetl.filters.templatingfilter.Jinja2TemplatingFilter(configdict, section)[source]

Bases: TemplatingFilter

Implements Templating using Jinja2. Jinja2 (http://jinja.pocoo.org) is a modern and designer-friendly templating language for Python modelled after Django’s templates. A ‘struct’ format as input provides a tree-like structure that could originate from a JSON file or REST service. This input struct provides all the variables to be inserted into the template. The template itself can be configured in this component as a Jinja2 string or file. An optional ‘template_search_paths’ provides a list of directories from which templates can be fetched. Default is the current working directory. Via the optional ‘globals_path’ a JSON structure can be inserted into the Template environment. The variables in this globals structure are typically “boilerplate” constants like id-prefixes, points of contact etc.

consumes=FORMAT.struct, produces=FORMAT.string

add_env_filters(jinja2_env)[source]

Register additional Filters on the template environment by updating the filters dict: Somehow min and max of list are not present so add them as well.

create_template()[source]

To be overridden in subclasses.

static geojson2gml_filter(value, source_crs=4326, target_crs=None, gml_id=None, gml_format='GML2', gml_longsrs='NO')[source]

Jinja2 custom Filter: generates any GML geometry from a GeoJSON geometry. By specifying a target_crs we can even reproject from the source CRS. The gml_format=GML2|GML3 determines the general GML form: e.g. pos/posList or coordinates. gml_longsrs=YES|NO determines the srsName format like EPSG:4326 or urn:ogc:def:crs:EPSG::4326 (long).

template_globals_path()[source]
CONFIG

One or more JSON files or URLs with global variables that can be used anywhere in the template. Multiple files will be merged into one globals dictionary.

  • type: str

  • required: False

  • default: None

template_search_paths()[source]
CONFIG

List of directories where to search for templates, default is current working directory only.

  • type: str

  • required: False

  • default: None

class stetl.filters.templatingfilter.StringTemplatingFilter(configdict, section)[source]

Bases: TemplatingFilter

Implements Templating using Python’s internal string.Template. A template string or file should be configured. The input record contains the actual values to be substituted in the template string as a record (key/value pairs). Output is a regular string.

consumes=FORMAT.record or FORMAT.record_array, produces=FORMAT.string

create_template()[source]

To be overridden in subclasses.

safe_substitution()[source]
CONFIG

Apply safe substitution? With this method, string.Template.safe_substitute will be invoked, instead of string.Template.substitute. If placeholders are missing from mapping and keywords, instead of raising an exception, the original placeholder will appear in the resulting string intact.

  • type: bool

  • required: False

  • default: False
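
The difference between the two substitution methods of the standard string.Template class can be seen in a short example. The template text and record are made up for illustration.

```python
from string import Template

template = Template("Hello $name, welcome to $city")
record = {"name": "Ada"}  # no value for $city

# safe_substitute leaves unknown placeholders in place.
safe_result = template.safe_substitute(record)

# substitute raises KeyError for a missing placeholder.
try:
    template.substitute(record)
    missing_key = None
except KeyError as error:
    missing_key = error.args[0]

print(safe_result)
print(missing_key)
```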

class stetl.filters.templatingfilter.TemplatingFilter(configdict, section, consumes='any', produces='string')[source]

Bases: Filter

Abstract base class for specific template-based filters. See https://wiki.python.org/moin/Templating for an overview. Subclasses implement a specific template language like Python string.Template, Mako, Genshi or Jinja2.

consumes=FORMAT.any, produces=FORMAT.string

create_template()[source]

To be overridden in subclasses.

exit()[source]

Allows derived Components to perform a one-time exit/cleanup.

init()[source]

Allows derived Components to perform a one-time init.

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

template_file()[source]
CONFIG

Path to template file. One of template_file or template_string needs to be configured.

  • type: str

  • required: False

  • default: None

template_string()[source]
CONFIG

Template string. One of template_file or template_string needs to be configured.

  • type: str

  • required: False

  • default: None

class stetl.filters.gmlfeatureextractor.GmlFeatureExtractor(configdict, section='gml_feature_extractor')[source]

Bases: Filter

Extract arrays of GML features etree elements from etree docs.

consumes=FORMAT.etree_doc, produces=FORMAT.etree_feature_array

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.filters.formatconverter.FormatConverter(configdict, section)[source]

Bases: Filter

Converts (almost) any packet format (if converter available).

consumes=FORMAT.any, produces=FORMAT.any but actual formats are changed at initialization based on the input to output format to be converted via the input_format and output_format config parameters.

converter_args()[source]
CONFIG

Custom converter-specific arguments.

  • type: dict

  • required: False

  • default: None

static etree_doc2geojson_collection(packet, converter_args=None)[source]

Use converter_args to determine XML tag names for features and GeoJSON feature id. For example:

converter_args = {
    'root_tag': 'FeatureCollection',
    'feature_tag': 'featureMember',
    'feature_id_attr': 'fid'
}

Parameters:
  • packet

  • converter_args

Returns:

static etree_doc2struct(packet, strip_space=True, strip_ns=True, sub=False, attr_prefix='', gml2ogr=True, ogr2json=True)[source]
Parameters:
  • packet

  • strip_space

  • strip_ns

  • sub

  • attr_prefix

  • gml2ogr

  • ogr2json

Returns:

static etree_elem2geojson_feature(packet, converter_args=None)[source]

static etree_elem2struct(packet, strip_space=True, strip_ns=True, sub=False, attr_prefix='', gml2ogr=True, ogr2json=True)[source]
Parameters:
  • packet

  • strip_space

  • strip_ns

  • sub

  • attr_prefix

  • gml2ogr

  • ogr2json

Returns:

init()[source]

Allows derived Components to perform a one-time init.

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.filters.execfilter.CommandExecFilter(configdict, section)[source]

Bases: ExecFilter

Executes an arbitrary command and captures the output.

consumes=FORMAT.string, produces=FORMAT.string
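
Running a command and capturing its output as a string can be sketched with the standard subprocess module. This is a generic illustration, not the CommandExecFilter source; the command shown simply starts the current Python interpreter so that the example runs on any platform.

```python
import subprocess
import sys

# The command is given as an argument list, which avoids shell quoting issues.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from a subprocess')"],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode the output to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
output = completed.stdout.strip()
print(output)
```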

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.filters.execfilter.ExecFilter(configdict, section, consumes, produces)[source]

Bases: Filter

Executes any command (abstract base class).

env_args()[source]
CONFIG

Provides a list of environment variables which will be used when executing the given command.

Example: env_args = pgpassword=postgres othersetting=value~with~spaces

  • type: str

  • required: False

  • default:

env_separator()[source]
CONFIG

Provides the separator to split the environment variable names from their values.

  • type: str

  • required: False

  • default: =

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.filters.nullfilter.NullFilter(configdict, section, consumes='any', produces='any')[source]

Bases: Filter

Pass-through Filter, does nothing. Mainly used in Test Cases.

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.filters.packetbuffer.PacketBuffer(configdict, section)[source]

Bases: Filter

Buffers all incoming Packets, main use is unit-testing to inspect Packets after ETL is done.

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.filters.packetwriter.PacketWriter(configdict, section)[source]

Bases: Filter

Writes the payload of a packet as a string to a file.

consumes=FORMAT.any, produces=FORMAT.string

file_path()[source]
CONFIG

File path to write content to.

  • type: str

  • required: True

  • default: None

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.filters.regexfilter.RegexFilter(configdict, section, consumes='string', produces='record')[source]

Bases: Filter

Extracts data from a string using a regular expression and returns the named groups as a record.

consumes=FORMAT.string, produces=FORMAT.record

exit()[source]

Allows derived Components to perform a one-time exit/cleanup.

init()[source]

Allows derived Components to perform a one-time init.

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

pattern_string()[source]
CONFIG

Regex pattern string. Should contain named groups.

  • type: str

  • required: True

  • default: None
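
How named groups become a record can be shown with the standard re module. The pattern and input string are made up for illustration, and using search rather than match is an assumption.

```python
import re

# Each (?P<name>...) group becomes one key in the resulting record.
pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = pattern.search("file created on 2021-02-15")
record = match.groupdict() if match else None
print(record)
```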

class stetl.filters.fileextractor.FileExtractor(configdict, section, consumes='any', produces='string')[source]

Bases: Filter

Abstract Base Class. Extracts a file from an archive and saves it as the configured file name.

consumes=FORMAT.any, produces=FORMAT.string

after_chain_invoke(packet)[source]

Called right after entire Component Chain invoke.

buffer_size()[source]
CONFIG

Buffer size for read buffer during extraction.

  • type: int

  • required: False

  • default: 1073741824

delete_file()[source]
CONFIG

Delete the file when the chain has been completed?

  • type: bool

  • required: False

  • default: True

file_path()[source]
CONFIG

File name to write the extracted file to.

  • type: str

  • required: True

  • default: None

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.filters.fileextractor.VsiFileExtractor(configdict, section)[source]

Bases: FileExtractor

Extracts a file from a GDAL /vsi path spec, and saves it as the given file name.

Example paths:

  • /vsizip/{/project/nlextract/data/BAG-2.0/BAGNLDL-08112020.zip}/9999STA08112020.zip

  • /vsizip/{/vsizip/{BAGGEM0221L-15022021.zip}/GEM-WPL-RELATIE-15022021.zip}/GEM-WPL-RELATIE-15022021-000001.xml

See also stetl.inputs.fileinput.VsiZipFileInput that generates these paths.

Author: Just van den Broecke

consumes=FORMAT.gdal_vsi_path, produces=FORMAT.string
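
The nesting in the second example path can be reproduced with a few lines of string formatting. This is a minimal sketch only: vsizip_path is a hypothetical helper, not a Stetl or GDAL function, and it merely shows how each archive level is wrapped in another /vsizip/{...} layer.

```python
def vsizip_path(archive, *members):
    """Wrap archive in one /vsizip/{...}/member level per member (hypothetical helper)."""
    path = archive
    for member in members:
        # Each level treats the path built so far as the archive to look into.
        path = "/vsizip/{%s}/%s" % (path, member)
    return path


nested = vsizip_path(
    "BAGGEM0221L-15022021.zip",
    "GEM-WPL-RELATIE-15022021.zip",
    "GEM-WPL-RELATIE-15022021-000001.xml",
)
```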

class stetl.filters.fileextractor.ZipFileExtractor(configdict, section)[source]

Bases: FileExtractor

Extracts a file from a ZIP file, and saves it as the given file name.

Author: Frank Steggink

consumes=FORMAT.record, produces=FORMAT.string
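
The underlying technique, copying one archive member to a target file with a bounded read buffer, can be sketched with the standard library alone. The function extract_member and the file names are hypothetical; how ZipFileExtractor reads the archive and member names from its input record is not shown here.

```python
import os
import shutil
import tempfile
import zipfile


def extract_member(zip_path, member, file_path, buffer_size=1024 * 1024):
    """Copy one archive member to file_path in chunks of buffer_size bytes."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as source, open(file_path, "wb") as target:
            shutil.copyfileobj(source, target, buffer_size)
    return file_path


# Build a small throwaway archive so the sketch runs on its own.
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, "demo.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("cities.gml", "<cities/>")

extracted = extract_member(zip_path, "cities.gml", os.path.join(work_dir, "out.gml"))
```

Streaming through a buffer keeps memory use bounded for large members, which is the role the buffer_size option plays in FileExtractor.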

class stetl.filters.vsifilter.VsiFilter(configdict, section, vsiname)[source]

Bases: Filter

Abstract base class for applying a GDAL/OGR virtual file system (VSI) filter.

invoke(packet)[source]

Components override for Component-specific behaviour, typically read, filter or write actions.

class stetl.filters.vsifilter.VsiZipFilter(configdict, section)[source]

Bases: VsiFilter

Applies a VSIZIP filter to the input record.

consumes=FORMAT.record, produces=FORMAT.string

NB: stetl.filters.zipfileextractor is deprecated. ZipFileExtractor is now part of module stetl.filters.fileextractor.

Components: Outputs

class stetl.outputs.fileoutput.FileOutput(configdict, section)[source]

Bases: Output

Pretty print input to file. Input may be an etree doc or any other stringify-able input.

consumes=FORMAT.any

file_path()[source]
CONFIG

Path to file. For MultiFileOutput it can be of a form like gmlcities-%03d.gml.

  • type: str

  • required: True

  • default: None

class stetl.outputs.fileoutput.MultiFileOutput(configdict, section)[source]

Bases: FileOutput

Prints subsequent packets, such as strings or etree docs, to multiple files. file_path must be of a form like gmlcities-%03d.gml.

consumes=FORMAT.any
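
The %03d in such a file_path is an ordinary Python format placeholder for a zero-padded counter. A minimal sketch, assuming the counter starts at 1 (the actual start value used by MultiFileOutput is not stated here):

```python
file_path = "gmlcities-%03d.gml"  # pattern taken from the description above

# One file name per packet; starting the counter at 1 is an assumption.
names = [file_path % number for number in range(1, 4)]
```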

class stetl.outputs.standardoutput.StandardOutput(configdict, section)[source]

Bases: Output

Print any input to standard output.

consumes=FORMAT.any

class stetl.outputs.standardoutput.StandardXmlOutput(configdict, section)[source]

Bases: Output

Pretty print XML from etree doc to standard output. OBSOLETE, can be done with StandardOutput

consumes=FORMAT.etree_doc

class stetl.outputs.ogroutput.Ogr2OgrOutput(configdict, section)[source]

Bases: Output

Output from GML etree doc to any OGR2OGR output using the GDAL/OGR ogr2ogr command

consumes=FORMAT.etree_doc

class stetl.outputs.ogroutput.OgrOutput(configdict, section)[source]

Bases: Output

Direct GDAL/OGR output via the Python OGR wrapper. OGR Features are written via the Python API (http://gdal.org/python).

This output can write almost any geospatial, OGR-defined data format.

consumes=FORMAT.ogr_feature or FORMAT.ogr_feature_array

always_apply_lco()[source]
CONFIG

Flag to indicate whether the layer creation options should be applied to all runs.

  • type: bool

  • required: False

  • default: False

append()[source]
CONFIG

Add to destination if it exists (ogr2ogr -append option).

  • type: bool

  • required: False

  • default: False

dest_create_options()[source]
CONFIG

Creation options.

Examples: ..

  • type: list

  • required: False

  • default: []

dest_data_source()[source]
CONFIG

String denoting the OGR data destination. Usually a path to a file like "path/rivers.shp" or connection string to PostgreSQL like "PG: host=localhost dbname='rivers' user='postgres'".

  • type: str

  • required: True

  • default: None
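
A PostgreSQL connection string of the kind shown above can be assembled from separate parameters. The helper below is hypothetical and only illustrates the string layout; it does no escaping, so values containing quotes or backslashes would need extra care.

```python
def pg_data_source(**params):
    """Format keyword parameters as an OGR PostgreSQL connection string (hypothetical helper)."""
    # Keyword order is preserved, so the output follows the call order.
    parts = ["%s='%s'" % (key, value) for key, value in params.items()]
    return "PG: " + " ".join(parts)


dest_data_source = pg_data_source(host="localhost", dbname="rivers", user="postgres")
```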

dest_format()[source]
CONFIG

Instructs GDAL to use driver by that name to open data destination. Not required for many standard formats that are self-describing like ESRI Shapefile.

Examples: ‘PostgreSQL’, ‘GeoJSON’ etc

  • type: str

  • required: False

  • default: None

dest_options()[source]
CONFIG

Custom data destination-specific options. Used in gdal.SetConfigOption().

  • type: dict

  • required: False

  • default: None

init()[source]

Allows derived Components to perform a one-time init.

layer_create_options()[source]
CONFIG

Options for newly created layer (-lco).

  • type: list

  • required: False

  • default: []

new_layer_name()[source]
CONFIG

Layer name for layer created in the destination source.

  • type: str

  • required: True

  • default: None

overwrite()[source]
CONFIG

Overwrite destination if it exists (ogr2ogr -overwrite option).

  • type: bool

  • required: False

  • default: False

sql()[source]
CONFIG

String with SQL query. Mandatory for PostgreSQL OGR dest.

  • type: str

  • required: False

  • default: None

target_srs()[source]
CONFIG

SRS (projection) for the target.

  • type: str

  • required: False

  • default: None

class stetl.outputs.execoutput.CommandExecOutput(configdict, section)[source]

Bases: ExecOutput

Executes an arbitrary command.

consumes=FORMAT.string

class stetl.outputs.execoutput.ExecOutput(configdict, section, consumes)[source]

Bases: Output

Executes any command (abstract base class).

env_args()[source]
CONFIG

Provides a list of environment variables which will be used when executing the given command.

Example: env_args = pgpassword=postgres othersetting=value~with~spaces

  • type: str

  • required: False

  • default:

env_separator()[source]
CONFIG

Provides the separator to split the environment variable names from their values.

  • type: str

  • required: False

  • default: =
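
How env_args and env_separator could work together is sketched below. This is an interpretation of the example value, not the ExecOutput code: it assumes entries are separated by whitespace and that "~" stands in for a space inside a value. The function name parse_env_args is hypothetical.

```python
def parse_env_args(env_args, env_separator="="):
    """Split 'name=value name2=value2' into a dict of environment variables."""
    env = {}
    for item in env_args.split():
        name, _, value = item.partition(env_separator)
        # Assumption: '~' stands in for a space, since items are split on whitespace.
        env[name] = value.replace("~", " ")
    return env


env = parse_env_args("pgpassword=postgres othersetting=value~with~spaces")
```

A dict like this could then be merged into a copy of os.environ and passed to a subprocess call.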

class stetl.outputs.execoutput.Ogr2OgrExecOutput(configdict, section)[source]

Bases: ExecOutput

Executes an Ogr2Ogr command. Input is a file name to be processed. Output by calling Ogr2Ogr command.

consumes=FORMAT.string

always_apply_lco()[source]
CONFIG

Flag to indicate whether the layer creation options should be applied to all runs.

  • type: bool

  • required: False

  • default: False

cleanup_input()[source]
CONFIG

Flag to indicate whether the input file to ogr2ogr should be cleaned up.

  • type: bool

  • required: False

  • default: False

dest_data_source()[source]
CONFIG

String denoting the OGR data destination. Usually a path to a file like "path/rivers.shp" or connection string to PostgreSQL like "PG: host=localhost dbname='rivers' user='postgres'".

  • type: str

  • required: True

  • default: None

dest_format()[source]
CONFIG

Instructs GDAL to use driver by that name to open data destination. Not required for many standard formats that are self-describing like ESRI Shapefile.

Examples: ‘PostgreSQL’, ‘GeoJSON’ etc

  • type: str

  • required: False

  • default: None

gfs_template()[source]
CONFIG

Name of GFS template file to use during loading. Passed to ogr2ogr as --config GML_GFS_TEMPLATE <name>

  • type: str

  • required: False

  • default: None

lco()[source]
CONFIG

Options for newly created layer (-lco).

  • type: str

  • required: False

  • default: None

options()[source]
CONFIG

Miscellaneous options to pass to ogr2ogr.

  • type: str

  • required: False

  • default: None

spatial_extent()[source]
CONFIG

Spatial extent (-spat), to pass as xmin ymin xmax ymax

  • type: str

  • required: False

  • default: None
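
To see how these options relate to an ogr2ogr invocation, the sketch below assembles an argument list without running anything. build_ogr2ogr_args is a hypothetical helper, and it assumes that the lco and options strings already contain complete command-line fragments (for example "-lco LAUNDER=YES"); the way Ogr2OgrExecOutput actually composes its command may differ.

```python
import shlex


def build_ogr2ogr_args(input_file, dest_data_source, dest_format=None,
                       lco=None, spatial_extent=None, gfs_template=None,
                       options=None):
    """Assemble an ogr2ogr argument list from option values (hypothetical helper)."""
    args = ["ogr2ogr"]
    if gfs_template:
        args += ["--config", "GML_GFS_TEMPLATE", gfs_template]
    if dest_format:
        args += ["-f", dest_format]
    if lco:
        # Assumption: the string already carries its own -lco flags.
        args += shlex.split(lco)
    if spatial_extent:
        # -spat takes four separate values: xmin ymin xmax ymax.
        args += ["-spat"] + spatial_extent.split()
    if options:
        args += shlex.split(options)
    # ogr2ogr expects the destination before the source.
    args += [dest_data_source, input_file]
    return args


args = build_ogr2ogr_args(
    "cities.gml",
    "PG:dbname=rivers",
    dest_format="PostgreSQL",
    lco="-lco LAUNDER=YES",
    spatial_extent="0 300000 280000 625000",
)
```

Building a list rather than a single string avoids shell quoting problems when the result is handed to subprocess.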

class stetl.outputs.dboutput.DbOutput(configdict, section, consumes)[source]

Bases: Output

Output to any database (abstract base class).

class stetl.outputs.dboutput.PostgresDbOutput(configdict, section)[source]

Bases: DbOutput

Output to PostgreSQL database. Input is an SQL string. Output by executing input SQL string.

consumes=FORMAT.string

database()[source]
CONFIG

Database name.

  • type: str

  • required: True

  • default: None

host()[source]
CONFIG

Hostname for DB.

  • type: str

  • required: False

  • default: None

password()[source]
CONFIG

DB Password for user.

  • type: str

  • required: False

  • default: None

schema()[source]
CONFIG

Postgres schema name for DB.

  • type: str

  • required: False

  • default: public

user()[source]
CONFIG

DB User name.

  • type: str

  • required: False

  • default: None
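
The options above would typically be supplied in a section of a Stetl configuration file. The sketch below parses such a section with Python's configparser to show required and defaulted values. The section name, the values and the exact form of the class entry are assumptions made for illustration.

```python
import configparser

# Hypothetical section name and values; option names follow the list above.
CONFIG_TEXT = """
[output_postgres]
class = outputs.dboutput.PostgresDbOutput
database = rivers
host = localhost
user = postgres
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
section = parser["output_postgres"]

# 'schema' is optional; fall back to the documented default.
schema = section.get("schema", "public")
```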

class stetl.outputs.dboutput.PostgresInsertOutput(configdict, section, consumes='record')[source]

Bases: PostgresDbOutput

Output by inserting a single record in a Postgres database table. Input is a Stetl record (Python dict structure) or a list of records. Creates an INSERT for Postgres to insert each single record. When the "replace" parameter is True, an UPDATE of any existing record keyed by "key" is attempted first.

NB a constraint is that the first and each subsequent record must contain all values, because the INSERT and UPDATE query templates are built once from the columns in the first record.

consumes=[FORMAT.record_array, FORMAT.record]

exit()[source]

Allows derived Components to perform a one-time exit/cleanup.

init()[source]

Allows derived Components to perform a one-time init.

key()[source]
CONFIG

The key column name of the table, required when replacing records.

  • type: str

  • required: False

  • default: None

replace()[source]
CONFIG

Replace record if exists?

  • type: bool

  • required: False

  • default: False

table()[source]
CONFIG

Table for inserts.

  • type: str

  • required: False

  • default: public
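
The constraint that every record must carry the same columns follows from building the query template only once. A minimal sketch of that idea, with hypothetical helper names and table data; it does not quote identifiers and is not the PostgresInsertOutput code:

```python
def build_insert_template(schema, table, first_record):
    """Build a parameterised INSERT statement from the keys of the first record."""
    columns = list(first_record.keys())
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO %s.%s (%s) VALUES (%s)" % (
        schema, table, ", ".join(columns), placeholders)


def record_values(first_record, record):
    """Order a record's values like the template columns; a missing key raises KeyError."""
    return [record[column] for column in first_record]


first_record = {"id": 1, "name": "Amsterdam"}
template = build_insert_template("public", "cities", first_record)
values = record_values(first_record, {"name": "Utrecht", "id": 2})
```

A later record that lacks one of the columns cannot fill every placeholder, which is why all records need the full set of values.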

class stetl.outputs.deegreeoutput.DeegreeBlobstoreOutput(configdict, section)[source]

Bases: Output

Insert features into deegree Blobstore from an etree doc.

consumes=FORMAT.etree_doc

init()[source]

Allows derived Components to perform a one-time init.

class stetl.outputs.deegreeoutput.DeegreeFSLoaderOutput(configdict, section)[source]

Bases: Output

Insert features via deegree using deegree’s FSLoader tool from an etree doc.

consumes=FORMAT.etree_doc