wuttasync.importing.base

Data Importer base class

class wuttasync.importing.base.FromFile(config, **kwargs)[source]

Base class for importer/exporter using input file as data source.

Depending on the subclass, it may be able to “guess” (at least partially) the path to the input file. If not, and/or to avoid ambiguity, the caller must specify the file path.

In most cases the caller may specify any of these attributes via kwarg to the class constructor, or e.g. to process_data(): input_file_path, input_file_dir, input_file_name.

The subclass itself can also specify these via override of: get_input_file_path(), get_input_file_dir(), get_input_file_name().

And of course the subclass must also override: open_input_file() and close_input_file().

input_file_path

Path to the input file.

input_file_dir

Path to folder containing input file(s).

input_file_name

Name of the input file, sans folder path.

input_file

Handle to the open input file, if applicable. May be set by open_input_file() for later reference within close_input_file().

close_input_file()[source]

Close the input file for source data.

Subclass must override to specify how this happens; default logic blindly calls the close() method on whatever input_file happens to point to.

See also open_input_file().

get_input_file_dir()[source]

This must return the folder with input file(s). It tries to guess it based on various attributes, namely input_file_dir.

Returns:

Path to folder with input file(s).

get_input_file_name()[source]

This must return the input filename, sans folder path. It tries to guess it based on various attributes, namely input_file_name.

Returns:

Input filename, sans folder path.

get_input_file_path()[source]

This must return the full path to the input file. It tries to guess it based on various attributes, namely input_file_path; failing that it may combine the results of get_input_file_dir() and get_input_file_name().

Returns:

Path to input file.

open_input_file()[source]

Open the input file for reading source data.

Subclass must override to specify how this happens; default logic is not implemented. Remember to set input_file if applicable for reference when closing.

See also get_input_file_path() and close_input_file().
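To illustrate the open/close contract described above, here is a minimal standalone sketch (not part of wuttasync) of a CSV-flavored subclass; the class name and CSV handling are invented for illustration:

```python
import csv

class CsvFromFileSketch:
    """Sketch of FromFile's open_input_file()/close_input_file() contract."""

    def __init__(self, input_file_path):
        self.input_file_path = input_file_path
        self.input_file = None

    def open_input_file(self):
        # remember the handle via input_file, so close_input_file() can use it
        self.input_file = open(self.input_file_path, 'rt', encoding='utf-8')
        self.reader = csv.DictReader(self.input_file)

    def close_input_file(self):
        # mirrors the default logic: just call close() on input_file
        self.input_file.close()
        self.input_file = None
```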

setup()[source]

Open the input file. See also open_input_file().

teardown()[source]

Close the input file. See also close_input_file().

exception wuttasync.importing.base.ImportLimitReached[source]

Exception raised when an import/export job reaches the max number of changes allowed.

class wuttasync.importing.base.Importer(config, **kwargs)[source]

Base class for all data importers / exporters.

As with ImportHandler, despite the name Importer, this class can be used for export as well. Occasionally it’s helpful to know “which mode” is in effect, mostly for display to the user. See also orientation and actioning.

The role of the “importer/exporter” (instance of this class) is to process the import/export of data for one “model” - which generally speaking, means one table. Whereas the “import/export handler” (ImportHandler instance) orchestrates the overall DB connections, transactions and invokes the importer(s)/exporter(s). So multiple importers/exporters may run in the context of a single handler job.

handler

Reference to the parent ImportHandler instance.

model_class

Reference to the data model class representing the target side, if applicable.

This normally would be a SQLAlchemy mapped class, e.g. Person for importing to the Wutta People table.

It is primarily (only?) used when the target side of the import/export uses SQLAlchemy ORM.

fields

This is the official list of “effective” fields to be processed for the current import/export job.

Code theoretically should not access this directly but instead call get_fields(). However it is often convenient to overwrite this attribute directly, for dynamic fields. If so then get_fields() will return the new value. And really, it’s probably just as safe to read this attribute directly too.

excluded_fields

This attribute will often not exist, but is mentioned here for reference.

It may be specified via constructor param, in which case each field listed therein will be removed from fields.

property actioning

Convenience property which returns the value of wuttasync.importing.handlers.ImportHandler.actioning from the parent import/export handler.

allow_create = True

Flag indicating whether this importer/exporter should ever allow records to be created on the target side.

This flag is typically defined in code for each handler.

See also create.

allow_delete = True

Flag indicating whether this importer/exporter should ever allow records to be deleted on the target side.

This flag is typically defined in code for each handler.

See also delete.

allow_update = True

Flag indicating whether this importer/exporter should ever allow records to be updated on the target side.

This flag is typically defined in code for each handler.

See also update.

cached_target = None

This is None unless caches_target is true, in which case it may (at times) hold the result from get_target_cache().

caches_target = False

Flag indicating the importer/exporter should pre-fetch the existing target data. This is usually what we want, so both source and target data sets are held in memory and lookups may be done between them without additional fetching.

When this flag is false, the importer/exporter must query the target for every record it gets from the source data, when looking for a match.

can_delete_object(obj, data=None)[source]

Should return true or false indicating whether the given object “can” be deleted. Default is to return true in all cases.

If you return false then the importer will know not to call delete_target_object() even if the data sets imply that it should.

Parameters:
  • obj – Raw object on the target side.

  • data – Normalized data dict for the target record, if known.

Returns:

True if object can be deleted, else False.

create = None

Flag indicating the current import/export job should create records on the target side, when applicable.

This flag is typically set by the caller, e.g. via command line args.

See also allow_create.

create_target_object(key, source_data)[source]

Create and return a new target object for the given key, fully populated from the given source data. This may return None if no object is created.

This method will typically call make_empty_object() and then update_target_object().

Returns:

New object for the target side, or None.

data_diffs(source_data, target_data, fields=None)[source]

Find all (relevant) fields with differing values between the two data records, source and target.

This is a simple wrapper around wuttasync.util.data_diffs(), but unless the caller specifies a fields list, the following default is used: it calls get_fields() to get the effective field list, and from that removes the fields indicated by get_keys().

The thinking here is that the goal of this function is to find true diffs; any “key” fields will already match (or not) based on the overall processing logic, and needn’t be checked further.
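The default comparison field list can be sketched as follows (the function name is hypothetical, for illustration only):

```python
def default_diff_fields(effective_fields, key_fields):
    # data_diffs() default: compare the effective fields, minus the key
    # fields, since keys already match by the time records are compared
    return [f for f in effective_fields if f not in key_fields]
```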

delete = None

Flag indicating the current import/export job should delete records on the target side, when applicable.

This flag is typically set by the caller, e.g. via command line args.

See also allow_delete.

delete_target_object(obj)[source]

Delete the given raw object from the target side, and return true if successful.

This is called from do_delete().

Default logic for this method just returns false; subclass should override if needed.

Returns:

Should return True if deletion succeeds, or False if deletion failed or was skipped.

do_create_update(all_source_data, progress=None)[source]

Import/export the given normalized source data; create and/or update target records as needed.

Parameters:
  • all_source_data – Sequence of all normalized source data, e.g. as obtained from normalize_source_data().

  • progress – Optional progress indicator factory.

Returns:

A 2-tuple of (created, updated) as follows:

  • created - list of records created on the target

  • updated - list of records updated on the target

This loops through all source data records, and for each will try to find a matching target record. If a match is found it also checks if any field values differ between them. So, calls to create_target_object() and update_target_object() may also happen from here.

do_delete(source_keys, changes=None, progress=None)[source]

Delete records from the target side as needed, per the given source data.

This will call get_deletable_keys() to discover which keys existing on the target side could theoretically be deleted.

From that set it will remove all the given source keys - since such keys still exist on the source, they should not be deleted from target.

If any “deletable” keys remain, their corresponding objects are removed from target via delete_target_object().

Parameters:
  • source_keys – A set of keys for all source records. Essentially this is just the list of keys for which target records should not be deleted - since they still exist in the data source.

  • changes – Number of changes which have already been made on the target side. Used to enforce max allowed changes, if applicable.

  • progress – Optional progress indicator factory.

Returns:

List of target records which were deleted.
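The key selection described above is plain set arithmetic; a minimal sketch (function name hypothetical):

```python
def keys_to_delete(deletable_keys, source_keys):
    # anything deletable which no longer exists on the source gets removed;
    # keys still present on the source must not be deleted from the target
    return deletable_keys - source_keys
```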

property dry_run

Convenience property which returns the value of wuttasync.importing.handlers.ImportHandler.dry_run from the parent import/export handler.

get_deletable_keys(progress=None)[source]

Return a set of record keys from the target side, which are potentially eligible for deletion.

Inclusion in this set does not imply a given record/key should be deleted, only that app logic (e.g. business rules) does not prevent it.

Default logic here will look in cached_target and then call can_delete_object() for each record in the cache. If that call returns true for a given key, it is included in the result.

Returns:

The set of target record keys eligible for deletion.

get_fields()[source]

This should return the “effective” list of fields which are to be used for the import/export.

See also fields which is normally what this returns.

All fields in this list should also be found in the output for get_supported_fields().

See also get_keys() and get_simple_fields().

Returns:

List of “effective” field names.

get_keys()[source]

Must return the key field(s) for use with import/export.

All fields in this list should also be found in the output for get_fields().

Returns:

List of “key” field names.

get_model_title()[source]

Returns the display title for the target data model.

get_record_key(data)[source]

Returns the canonical key value for the given normalized data record.

Parameters:

data – Normalized data record (dict).

Returns:

A tuple of field values, corresponding to the import/export key fields.

Note that this calls get_keys() to determine the import/export key fields.

So if an importer has key = 'id' then get_keys() would return ('id',) and this method would return just the id value e.g. (42,) for the given data record.

The return value is always a tuple for consistency and to allow for composite key fields.
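A minimal sketch of the behavior described above, assuming the key fields are passed in explicitly (in the real class they come from get_keys()):

```python
def record_key(data, keys):
    # keys is e.g. ('id',); the result is always a tuple, even for a
    # single key field, to allow for composite keys
    return tuple(data[field] for field in keys)
```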

get_simple_fields()[source]

This should return a (possibly empty) list of “simple” fields for the import/export. A “simple” field is one where the value is a simple scalar, so e.g. can use getattr(obj, field) to read and setattr(obj, field, value) to write.

See also get_supported_fields() and get_fields().

Returns:

Possibly empty list of “simple” field names.

get_source_objects()[source]

This method (if applicable) should return a sequence of “raw” data objects (i.e. non-normalized records) from the source.

This method is typically called from normalize_source_data() which then also handles the normalization.

get_supported_fields()[source]

This should return the full list of fields which are available for the import/export.

Note that this field list applies first and foremost to the target side, i.e. if the target (table etc.) has no “foo” field defined then it should not be listed here.

But it also applies to the source side, e.g. if target does define a “foo” field but source does not, then it again should not be listed here.

See also get_simple_fields() and get_fields().

Returns:

List of all “supported” field names.

get_target_cache(source_data=None, progress=None)[source]

Fetch all (existing) raw objects and normalized data from the target side, and return a cache object with all of that.

This method will call get_target_objects() first, and pass along the source_data param if specified. From there it will call normalize_target_object() and get_record_key() for each.

Parameters:
  • source_data – Sequence of normalized source data for the import/export job, if known.

  • progress – Optional progress indicator factory.

Returns:

Dict whose keys are record keys (one entry per normalized target record) and whose values are nested dicts containing the raw object and normalized record.

A minimal but complete example of what this return value looks like:

{
    (1,): {
        'object': <some_object_1>,
        'data': {'id': 1, 'description': 'foo'},
    },
    (2,): {
        'object': <some_object_2>,
        'data': {'id': 2, 'description': 'bar'},
    },
}

get_target_object(key)[source]

Should return the object from (existing) target data set which corresponds to the given record key, if found.

Note that the default logic is able to find/return the object from cached_target if applicable. But it is not able to do a one-off lookup e.g. in the target DB. If you need the latter then you should override this method.

Returns:

Raw target data object, or None.

get_target_objects(source_data=None, progress=None)[source]

Fetch all existing raw objects from the data target. Or at least, enough of them to satisfy matching on the given source data (if applicable).

Parameters:
  • source_data – Sequence of normalized source data for the import/export job, if known.

  • progress – Optional progress indicator factory.

Returns:

Corresponding sequence of raw objects from the target side.

Note that the source data is provided only for cases where that might be useful; it often is not.

But for instance if the source data contains say an ID field and the min/max values present in the data set are 1 thru 100, but the target side has millions of records, you might only fetch ID <= 100 from target side as an optimization.

get_unique_data(source_data)[source]

Return a copy of the given source data, with any duplicate records removed.

This looks for duplicates based on the effective key fields, cf. get_keys(). The first record found with a given key is kept; subsequent records with that key are discarded.

This is called from process_data() and is done largely for sanity’s sake, to avoid indeterminate behavior when source data contains duplicates. For instance:

Problem #1: if the source contains 2 records with key ‘X’ it makes no sense to create both records on the target side.

Problem #2: if the 2 source records have different data (apart from their key) then which should the target reflect?

So the main point of this method is to discard duplicates, to avoid problem #1 - but to do so in a deterministic way, so that the “choice” of which record is kept will not vary across runs; this at least “pseudo-resolves” problem #2.

Parameters:

source_data – Sequence of normalized source data.

Returns:

A 2-tuple of (source_data, unique_keys) where:

  • source_data is the final list of source data

  • unique_keys is a set of the source record keys
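The first-record-wins deduplication described above can be sketched as a standalone function (names hypothetical), relying on dict insertion order for determinism:

```python
def unique_data(source_data, keys):
    # keep the first record seen for each key; later duplicates are
    # discarded, deterministically, since dicts preserve insertion order
    unique = {}
    for record in source_data:
        key = tuple(record[field] for field in keys)
        unique.setdefault(key, record)
    return list(unique.values()), set(unique)
```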

make_empty_object(key)[source]

Return a new empty target object for the given key.

This method is called from create_target_object(). It should only populate the object’s key, and leave the rest of the fields to update_target_object().

Default logic will call make_object() to get the bare instance, then populate just the fields from get_keys().

make_object()[source]

Make a bare target object instance.

This method need not populate the object in any way. See also make_empty_object().

Default logic will make a new instance of model_class.

normalize_source_data(source_objects=None, progress=None)[source]

This method must return the full list of normalized data records from the source.

Default logic here will call get_source_objects() and then for each object normalize_source_object_all() is called.

Parameters:
  • source_objects – Optional sequence of raw objects from the data source. If not specified, it is obtained from get_source_objects().

  • progress – Optional progress indicator factory.

Returns:

List of normalized source data records.

normalize_source_object(obj)[source]

This should return a single “normalized” data record for the given source object.

Subclass will usually need to override this, to “convert” source data into the shared format required for import/export. The default logic merely returns the object as-is!

Note that if this method returns None then the object is effectively skipped, treated like it does not exist on the source side.

Parameters:

obj – Raw object from data source.

Returns:

Dict of normalized data fields, or None.
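A hedged sketch of such an override, standalone for illustration; the field names and the “discontinued” skip rule are invented, not part of wuttasync:

```python
class ProductNormalizerSketch:
    """Sketch of a normalize_source_object() override."""

    def normalize_source_object(self, obj):
        if obj.discontinued:
            return None  # skipped: treated as absent from the source
        return {'id': obj.id, 'description': obj.description}
```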

normalize_source_object_all(obj)[source]

This method should “iterate” over the given object and return a list of corresponding normalized data records.

In most cases, the object is “singular” and it doesn’t really make sense to return more than one data record for it. But this method is here for subclass to override in those rare cases where you do need to “expand” the object into multiple source data records.

Default logic for this method simply calls normalize_source_object() for the given object, and returns a list with just that one record.

Parameters:

obj – Raw object from data source.

Returns:

List of normalized data records corresponding to the source object.

normalize_target_object(obj)[source]

This should return a “normalized” data record for the given raw object from the target side.

Subclass will often need to override this, to “convert” target object into the shared format required for import/export. The default logic is only able to handle “simple” fields; cf. get_simple_fields().

It’s possible to optimize this somewhat by checking get_fields(), and skipping normalization for any fields which aren’t “effective” for the current job.

Note that if this method returns None then the object is ignored, treated like it does not exist on the target side.

Parameters:

obj – Raw object from data target.

Returns:

Dict of normalized data fields, or None.
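The “simple fields only” default logic amounts to reading each field via getattr(); a minimal sketch (function name hypothetical):

```python
def normalize_simple_target(obj, simple_fields):
    # sketch of the default logic: each "simple" field is a plain scalar
    # attribute, so getattr() suffices to read it
    return {field: getattr(obj, field) for field in simple_fields}
```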

property orientation

Convenience property which returns the value of wuttasync.importing.handlers.ImportHandler.orientation from the parent import/export handler.

process_data(source_data=None, progress=None)[source]

Perform the data import/export operations on the target.

This is the core feature logic and may create, update and/or delete records on the target side, depending on (subclass) implementation. It is invoked directly by the parent handler.

Note that subclass generally should not override this method, but instead some of the others.

This first calls setup() to prepare things as needed.

If no source data is specified, it calls normalize_source_data() to get that. Regardless, it also calls get_unique_data() to discard any duplicates.

If caches_target is set, it calls get_target_cache() and assigns result to cached_target.

Then depending on values for create, update and delete, it may call do_create_update() and/or do_delete().

And finally it calls teardown() for cleanup.

Parameters:
  • source_data – Sequence of normalized source data, if known.

  • progress – Optional progress indicator factory.

Returns:

A 3-tuple of (created, updated, deleted) as follows:

  • created - list of records created on the target

  • updated - list of records updated on the target

  • deleted - list of records deleted on the target

setup()[source]

Perform any setup needed before starting the import/export job.

This is called from within process_data(). Default logic does nothing.

teardown()[source]

Perform any teardown needed after ending the import/export job.

This is called from within process_data(). Default logic does nothing.

update = None

Flag indicating the current import/export job should update records on the target side, when applicable.

This flag is typically set by the caller, e.g. via command line args.

See also allow_update.

update_target_object(obj, source_data, target_data=None)[source]

Update the target object with the given source data, and return the updated object.

This method may be called from do_create_update() for a normal update, or create_target_object() when creating a new record.

It should update the object for any of get_fields() which appear to differ. However it need not bother for the get_keys() fields, since those will already be accurate.

Parameters:
  • obj – Raw target object.

  • source_data – Dict of normalized data for source record.

  • target_data – Dict of normalized data for existing target record, if a typical update. Will be missing for a new object.

Returns:

The final updated object. In most/all cases this will be the same instance as the original obj provided by the caller.
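The update rule described above - touch non-key fields only, and only where values differ - can be sketched for “simple” attribute-style fields (names hypothetical):

```python
def update_fields(obj, source_data, fields, keys):
    # skip key fields (already accurate); write only changed values
    for field in fields:
        if field in keys:
            continue
        if getattr(obj, field) != source_data[field]:
            setattr(obj, field, source_data[field])
    return obj
```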

class wuttasync.importing.base.ToSqlalchemy(config, **kwargs)[source]

Base class for importer/exporter which uses SQLAlchemy ORM on the target side.

caches_target = True

get_target_object(key)[source]

Tries to fetch the object from target DB using ORM query.

get_target_objects(source_data=None, progress=None)[source]

Fetches target objects via the ORM query from get_target_query().

get_target_query(source_data=None)[source]

Returns an ORM query suitable to fetch existing objects from the target side. This is called from get_target_objects().
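A hedged sketch of this target-side pattern, standalone rather than subclassing ToSqlalchemy; the Widget model, session handling, and plain query-everything default are assumptions for illustration:

```python
import sqlalchemy as sa
from sqlalchemy import orm

Base = orm.declarative_base()

class Widget(Base):
    __tablename__ = 'widget'
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String)

class WidgetTargetSketch:
    """Sketch of the get_target_query()/get_target_objects() pattern."""

    model_class = Widget

    def __init__(self, session):
        self.session = session

    def get_target_query(self, source_data=None):
        # plausible default: query every row of the mapped model class
        return self.session.query(self.model_class)

    def get_target_objects(self, source_data=None, progress=None):
        return self.get_target_query(source_data=source_data).all()
```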