rattail.importing.importers

Data Importers

class rattail.importing.importers.Importer(config=None, key=None, direction='import', fields=None, exclude_fields=None, fuzzy_fields=None, fuzz_factor=None, **kwargs)[source]

Base class for all data importers.

direction

Should be a string, either 'import' or 'export'. This value is used to improve verbiage for logging and other output, for a better overall user experience. It may also be used by importer logic, where the direction would otherwise be ambiguous.

Note that the handler is responsible for assigning this value; the importer should not define it. See also rattail.importing.handlers.ImportHandler.direction.

collect_changes_for_processing

If true (the default), any changes made by the import are collected for processing by the handler once the import has completed. (For example, the handler might send out a warning email listing the changes.) If the changes are not “important” per se and they involve large data sets, you may want to turn off this flag to avoid the overhead of collecting them. In practice this is usually done when memory consumption is too great and you do not actually need to track the changes. Note that the flag can usually also be turned off via a command line option (--no-collect-changes).

cache_local_data(host_data=None)[source]

Cache all raw objects and normalized data from the local system.

cache_local_message()[source]

Must return a message to be used for progress when fetching “local” data.

cache_model(model, **kwargs)[source]

Convenience method which invokes rattail.db.cache.cache_model() with the given model and keyword arguments. It will provide the session and progress parameters by default, setting them to the importer’s attributes of the same names.

can_delete_object(obj, data)[source]

Should return a boolean indicating whether or not the given object “can” be deleted. Default is to return True in all cases.

If you return False then the importer will not perform any delete action on the object.

create_object(key, host_data)[source]

Create and return a new local object for the given key, fully populated from the given host data. This may return None if no object is created.

data_diffs(local_data, host_data)[source]

Find all (relevant) fields which differ between the host and local data values for a given record.
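The comparison described above can be sketched as a standalone function. This is an illustration only, not the library's implementation: the name `diff_fields` and the explicit `fields` parameter are assumptions, and the real method may also apply fuzzy comparison for fields listed in `fuzzy_fields`.

```python
def diff_fields(local_data, host_data, fields):
    """Return the names of fields whose values differ between the two dicts."""
    return [field for field in fields
            if local_data.get(field) != host_data.get(field)]


# Hypothetical normalized records for the same key
local = {"upc": "0001", "description": "Milk", "size": "1 gal"}
host = {"upc": "0001", "description": "Whole Milk", "size": "1 gal"}

changed = diff_fields(local, host, ["description", "size"])
print(changed)
```

An empty result means the record needs no update.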

datasync_setup()[source]

Perform any setup necessary, in the context of a datasync job.

delete_object(obj)[source]

Delete the given object from the local system (or not), and return a boolean indicating whether deletion was successful. What exactly this entails may vary; default implementation does nothing at all.

exclude_fields(*args)[source]

Remove the given fields from the supported field list for the importer. May be used at runtime to customize behavior.

fields_active(fields)[source]

Convenience method to check if any of the given fields are currently “active” for the importer. Returns True or False.
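A minimal sketch of how a field list, exclude_fields() and fields_active() could interact. The class `FieldListSketch` is a hypothetical stand-in and does not import or reproduce the real Importer; it only assumes that the importer keeps its supported fields in a list named `fields`.

```python
class FieldListSketch:
    """Hypothetical stand-in that tracks a list of supported fields."""

    def __init__(self, fields):
        self.fields = list(fields)

    def exclude_fields(self, *args):
        # Drop each named field from the supported list, if present.
        for field in args:
            if field in self.fields:
                self.fields.remove(field)

    def fields_active(self, fields):
        # True if at least one of the given fields is still supported.
        return any(field in self.fields for field in fields)


sketch = FieldListSketch(["upc", "description", "price"])
sketch.exclude_fields("price")
print(sketch.fields_active(["price"]))
print(sketch.fields_active(["price", "upc"]))
```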

flush_create_update()[source]

Perform any steps necessary to “flush” the create/update changes which have occurred thus far in the import.

flush_create_update_final()[source]

Perform any final steps to “flush” the created/updated data here.

flush_delete()[source]

Perform any steps necessary to “flush” the delete changes which have occurred thus far in the import.

get_cache_key(obj, normal)[source]

Get the primary cache key for a given object and normalized data.

Note that this method’s signature is designed for use with the rattail.db.cache.cache_model() function, and as such the normal parameter is understood to be a dict with a 'data' key, whose value is the normalized data dict for the raw object.
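To make the shape of the normal parameter concrete, here is a small standalone sketch. It assumes the cache key is a tuple built from the importer's key fields, passed here as a hypothetical `key_fields` argument; the actual method reads its key from the importer itself.

```python
def get_cache_key(key_fields, obj, normal):
    """Build a tuple key from the normalized data found under normal['data']."""
    return tuple(normal["data"].get(field) for field in key_fields)


raw = object()  # placeholder for a raw local object
normal = {
    "object": raw,
    "data": {"upc": "0001", "description": "Milk"},
}

cache_key = get_cache_key(("upc",), raw, normal)
print(cache_key)
```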

get_deletion_keys()[source]

Return a set of keys from the local data set, which are eligible for deletion. By default this will be all keys from the local cached data set, or an empty set if local data isn’t cached.
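The default behavior described above might look roughly like this. The function takes the cache as an explicit argument, which is an assumption for the sake of a self-contained example. The final subtraction of host keys is likewise only an illustration of how a caller could narrow the set, not a documented step.

```python
def get_deletion_keys(cached_local_data):
    """Return all cached local keys, or an empty set when nothing is cached."""
    if cached_local_data is None:
        return set()
    return set(cached_local_data)


# Hypothetical cache: key tuple -> {'object': ..., 'data': ...}
cache = {
    ("0001",): {"object": None, "data": {"upc": "0001"}},
    ("0002",): {"object": None, "data": {"upc": "0002"}},
}
host_keys = {("0001",)}

# Illustrative only: keys present locally but absent from the host
stale = get_deletion_keys(cache) - host_keys
print(stale)
```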

get_host_objects()[source]

Return the “raw” (as-is, not normalized) host objects which are to be imported. This may return any sequence-like object which has a len() value and responds to iteration etc. The objects contained within it may be of any type; no assumptions are made there. (Interpreting them is the job of the normalize_host_data() method.)

get_key(data)[source]

Return the key value for the given data dict.

get_local_object(key)[source]

Must return the local object corresponding to the given key, or None. Default behavior here will be to check the cache if one is in effect, otherwise return the value from get_single_local_object().
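The cache-then-fallback lookup can be sketched with a hypothetical stand-in class. The attribute names `cache` and `local_store`, and the choice to return None on a cache miss instead of falling through to a single lookup, are assumptions made for this example.

```python
class LocalLookupSketch:
    """Hypothetical stand-in showing cache-first lookup of local objects."""

    def __init__(self, local_store, cache=None):
        self.local_store = local_store  # stand-in for the local system
        self.cache = cache              # None means no cache is in effect
        self.single_lookups = 0

    def get_single_local_object(self, key):
        # Direct lookup which never consults the cache.
        self.single_lookups += 1
        return self.local_store.get(key)

    def get_local_object(self, key):
        if self.cache is not None:
            cached = self.cache.get(key)
            return cached["object"] if cached else None
        return self.get_single_local_object(key)


store = {("0001",): "milk-object"}
cached = LocalLookupSketch(store, cache={("0001",): {"object": "milk-object", "data": {}}})
uncached = LocalLookupSketch(store)

print(cached.get_local_object(("0001",)), cached.single_lookups)
print(uncached.get_local_object(("0001",)), uncached.single_lookups)
```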

get_local_objects(host_data=None)[source]

Fetch all raw objects from the local system.

get_local_system_title()[source]

Retrieve the system title for the local/target side.

get_single_host_object(key)[source]

Must return the host object corresponding to the given key, or None. This method should not consult the cache; it is meant to be called within datasync or other “one-off” scenarios.

get_single_local_object(key)[source]

Must return the local object corresponding to the given key, or None. This method should not consult the cache; that is handled within the get_local_object() method.

import_data(host_data=None, now=None, **kwargs)[source]

Import some data! This is the core body of logic for that, regardless of where data is coming from or where it’s headed. Note that this method handles deletions as well as adds/updates.

import_single_object(host_object, **kwargs)[source]

Import a single object from host. This is meant primarily for use with scripts etc. and is not part of a “normal” (full) import run.

include_fields(*args)[source]

Add the given fields to the supported field list for the importer. May be used at runtime to customize behavior.

make_object()[source]

Make a new/empty local object from scratch.

new_object(key)[source]

Return a new local object to correspond to the given key. Note that this method should only populate the object’s key, and leave the rest of the fields to update_object().
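The division of labor between new_object(), update_object() and create_object() can be illustrated as follows. `CreateSketch` is a hypothetical stand-in which uses plain dicts as "objects"; the way create_object() chains the other two methods is an assumption based on the descriptions in this reference, not a copy of the library code.

```python
class CreateSketch:
    """Hypothetical stand-in; plain dicts play the role of local objects."""

    key = ("upc",)
    fields = ["upc", "description"]

    def new_object(self, key):
        # Populate only the key fields.
        return dict(zip(self.key, key))

    def update_object(self, obj, host_data):
        # Fill in every non-key field from the host data.
        for field in self.fields:
            if field not in self.key:
                obj[field] = host_data[field]
        return obj

    def create_object(self, key, host_data):
        obj = self.new_object(key)
        if obj is None:
            return None
        return self.update_object(obj, host_data)


sketch = CreateSketch()
created = sketch.create_object(("0001",), {"upc": "0001", "description": "Milk"})
print(created)
```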

normalize_cache_object(obj, data=None)[source]

Normalizer for cached local data. This returns a simple dict with 'object' and 'data' keys; values are the raw object and its normalized data dict, respectively.

normalize_host_data(host_objects=None)[source]

Return a normalized version of the full set of host data. Note that this calls get_host_objects() to obtain the initial raw objects, and then normalizes each object. The normalization process may filter out some records from the set, in which case the return value will be smaller than the original data set.

normalize_host_object(obj)[source]

Normalize a raw host object into a data dict, or return None if the object should be ignored for the importer’s purposes.
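A short sketch of how normalization can shrink the data set. Both functions are standalone illustrations; the raw column names (`UPC`, `DESC`) and the rule for ignoring records are hypothetical.

```python
def normalize_host_object(obj):
    """Return a data dict, or None if the raw object should be ignored."""
    if not obj.get("UPC"):
        return None  # hypothetical rule: skip records without a UPC
    return {"upc": obj["UPC"].strip(), "description": obj["DESC"].strip()}


def normalize_host_data(host_objects):
    """Normalize each raw object, dropping those which normalize to None."""
    normalized = []
    for obj in host_objects:
        data = normalize_host_object(obj)
        if data is not None:
            normalized.append(data)
    return normalized


raw_objects = [
    {"UPC": " 0001 ", "DESC": "Milk "},
    {"UPC": "", "DESC": "Unknown"},
]
host_data = normalize_host_data(raw_objects)
print(len(raw_objects), len(host_data))
```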

normalize_local_object(obj)[source]

Normalize a local (raw) object into a data dict.

prioritize_2(data, field, field2=None)[source]

Prioritize the data values for the pair of fields implied by the given fieldname. I.e., if only one non-empty value is present, make sure it’s in the first slot.
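As an illustration of the "first slot" rule, the sketch below assumes that the second field defaults to the first field's name with a `_2` suffix, and that the vacated second slot is set to None. Both details are assumptions and may differ from the real method.

```python
def prioritize_2(data, field, field2=None):
    """Move a lone non-empty value from the second slot into the first."""
    if field2 is None:
        field2 = field + "_2"  # assumed naming convention
    if field in data and field2 in data:
        if data[field2] and not data[field]:
            data[field], data[field2] = data[field2], None


record = {"phone": "", "phone_2": "555-1234"}
prioritize_2(record, "phone")
print(record)
```

When both slots already hold values, or only the first one does, the record is left unchanged.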

setup()[source]

Perform any setup necessary, e.g. cache lookups for existing data.

teardown()[source]

Perform any cleanup after import, if necessary.

update_object(obj, host_data, local_data=None, all_fields=False)[source]

Update the local data object with the given host data, and return the object.

class rattail.importing.importers.FromQuery(config=None, key=None, direction='import', fields=None, exclude_fields=None, fuzzy_fields=None, fuzz_factor=None, **kwargs)[source]

Generic base class for importers whose raw external data source is a SQLAlchemy (or Django, or possibly other?) query.

get_host_objects()[source]

Returns (raw) query results as a sequence.

query()[source]

Subclasses must override this, and return the primary query which will define the data set.
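The relationship between query() and get_host_objects() can be sketched without a database. `FromQuerySketch` is a hypothetical stand-in, not the real class, and a generator stands in for the query object. It only demonstrates that the results are materialized into a sequence which supports len().

```python
class FromQuerySketch:
    """Hypothetical stand-in for a query-backed importer base class."""

    def query(self):
        raise NotImplementedError("subclasses must define the query")

    def get_host_objects(self):
        # Materialize the query results into a sequence.
        return list(self.query())


class ProductRows(FromQuerySketch):
    def query(self):
        # A generator stands in for a SQLAlchemy (or similar) query here.
        rows = [("0001", "Milk"), ("0002", "Eggs")]
        return (row for row in rows)


host_objects = ProductRows().get_host_objects()
print(len(host_objects))
```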

class rattail.importing.importers.BulkImporter(config=None, key=None, direction='import', fields=None, exclude_fields=None, fuzzy_fields=None, fuzz_factor=None, **kwargs)[source]

Base class for bulk data importers.

import_data(host_data=None, now=None, **kwargs)[source]

Import some data! This is the core body of logic for that, regardless of where data is coming from or where it’s headed. Note that this method handles deletions as well as adds/updates.