File Monitor¶
The Rattail File Monitor provides a generic way to watch one or more specific folders on a file system for incoming files, and perform one or more actions on new files as they appear within the watched folder(s). It is implemented as a daemon on Linux and a service on Windows.
Why?¶
While there are probably many similar applications and libraries in existence within the Python ecosystem (not to mention the computing world at large), the main features which presumably distinguish the Rattail File Monitor are:
It is more of an application than a library, but blurs this line slightly.
It is written in Python, and (typically) expects your code to be also.
It can be configured to watch for disappearance of “lock” files as opposed to appearance of files in general.
It contains a configurable retry mechanism for file actions which do not at first succeed.
It can be configured to stop processing all new files if a single file action fails.
Most generic file-watching applications may be declaratively told which folder(s) to watch, and which action(s) to take when new files arrive. Rattail is not fundamentally different in this respect. However most applications require you to define the action(s) in terms of shell executables, with optional command line parameters etc.
Most generic file-watching libraries require the developer to imperatively define which folders to watch, and the library in turn will inform the developer’s code (via some sort of “event”) when new files appear; the developer’s code must then respond to the event by invoking some explicit action(s) on the new file.
The Rattail File Monitor blends these two approaches somewhat by allowing declarative definition of both the watch and action aspects, but allowing (currently, requiring) the action(s) to be defined in terms of Python callables. In spirit it is more of an application than a library, although of course there is nothing preventing a developer from consuming its logic as a library. The goal, however, is to provide an application which frees developers from needing to write any watch/action “glue” code and yet allows them to write simple Python code for any custom action(s) needed.
This goal in particular, as well as the others listed above, is achieved by way of a flexible configuration syntax.
Note
The ultimate goal is to alleviate the need for a developer to write custom action logic at all (i.e. configuring the File Monitor to invoke pre-existing action logic instead), but that problem will likely never be truly solved, given the diversity of needs encountered in the retail world. However, “common” action logic is provided wherever possible.
Configuration¶
Configuration of the File Monitor must be defined within one or more INI-style configuration files. There are certain tricks which may be employed in order to leverage multiple config files (namely config file inheritance); however in almost all cases the configuration specific to the File Monitor itself is contained within a single file, since the File Monitor generally runs in the context of a single application. The remainder of this document will “ignore” the config file inheritance idea and assume a single config file.
There are essentially two levels to the File Monitor configuration syntax. The first level requires a simple list of “profile” names. The second level consists of the profile definitions. The term “profile” here refers to a conceptual pairing of which folder(s) to watch, and which action(s) to take when new files appear within the folder(s), as well as semantics defining how the watch/action logic should behave in general. This will all be explained along the way as we explore the syntax.
The “rattail.filemon” Section¶
First of all, and just to clarify in case it isn’t obvious, all configuration options must be defined within the [rattail.filemon] section of the config file, i.e.:
[rattail.filemon]
# options go here
The “monitor” Option¶
The only option which is always required is the monitor option; however other options will “become” required based on its value. This option defines which profiles are actually in effect. The basic idea is similar to how the keys option of the [loggers] section works within the standard library logging module’s Configuration file format. (This is mentioned for the benefit of those who are familiar, but knowledge of the logging config syntax is not necessary here.)
So, an example:
[rattail.filemon]
monitor = foo, bar
foo.dirs = /some/path
foo.actions = process
foo.action.process.func = mypackage.mymodule:myprocessingfunc
bar.dirs = /another/path
bar.actions = special
bar.action.special.func = myothermodule:myspecialfunc
baz.dirs = /some/other/path
baz.actions = wtf
baz.action.wtf.func = os:remove
Here we have two profile names listed in the monitor option: “foo” and “bar”. However we actually have three profiles defined: “foo”, “bar” and “baz”. What this means is that when the File Monitor initializes, it will only look for (and indeed, require) profile definitions for “foo” and “bar” but will not even look at the “baz” profile.
Implied also is that all options which define the “foo” profile must be named with a foo. prefix (and the “bar” profile options must be named with a bar. prefix). Any other options present which are not prefixed with the name of a profile given in the monitor option will be ignored (as is the case with the baz.* options above).
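The selection rule described above can be sketched in a few lines of Python. This is only an illustration built on the standard library configparser module, not the File Monitor's actual loader; the function name monitored_options and the SAMPLE text are hypothetical.

```python
import configparser
import re

# Hypothetical sample config, mirroring the example above.
SAMPLE = """
[rattail.filemon]
monitor = foo, bar
foo.dirs = /some/path
foo.actions = process
bar.dirs = /another/path
bar.actions = special
baz.dirs = /some/other/path
baz.actions = wtf
"""


def monitored_options(text):
    """Group options by profile, keeping only profiles named in 'monitor'."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["rattail.filemon"]

    # Profile names may be separated by whitespace and/or commas.
    names = [name for name in re.split(r"[\s,]+", section["monitor"]) if name]
    profiles = {name: {} for name in names}

    for option, value in section.items():
        prefix, _, rest = option.partition(".")
        # Options for unlisted profiles (e.g. "baz.*") are simply skipped.
        if prefix in profiles and rest:
            profiles[prefix][rest] = value
    return profiles


profiles = monitored_options(SAMPLE)
print(sorted(profiles))  # ['bar', 'foo']
```

The “baz” options never reach the result because “baz” is not listed in the monitor option.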
One final point on the syntax of the monitor option: The value must be one or more profile “names” (or “keys” or “prefixes” or however you like to think of them), but in the case of multiple names, arbitrary whitespace and/or commas are both valid separators. In other words each of the following examples is valid, and all would yield the same result:
monitor = foo, bar, baz
monitor = foo,bar,baz
monitor = foo bar baz
monitor =
    foo
    bar
    baz
monitor =
    foo,
    bar baz
Note in particular the last example, which uses a comma between the first two names but not between the last two.
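To see why these forms are equivalent, here is a minimal sketch of the splitting rule, assuming the value is split on any run of whitespace and/or commas. The helper name parse_names is hypothetical and is not part of Rattail.

```python
import re


def parse_names(value):
    """Split a config value on any mix of whitespace and commas."""
    return [name for name in re.split(r"[\s,]+", value) if name]


# The five example values above, as the config parser would hand them over.
variants = [
    "foo, bar, baz",
    "foo,bar,baz",
    "foo bar baz",
    "\nfoo\nbar\nbaz",
    "\nfoo,\nbar baz",
]

for value in variants:
    print(parse_names(value))  # ['foo', 'bar', 'baz'] each time
```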
The Profile “dirs” Option¶
This option defines which folder(s) will be watched for new files, for a given profile. Its value must be one or more directory paths. As with the “monitor” option, if multiple paths are needed then they may be separated with arbitrary whitespace and/or commas. However any path which contains spaces must be quoted. So, some examples:
[rattail.filemon]
monitor = foo
# linux
foo.dirs = /some/path/to/watch
# linux, path with spaces
foo.dirs = "/some/path with spaces/to watch"
# linux, multiple paths
foo.dirs = /some/path/to/watch, "/another/path with spaces"
# linux, multiple paths
foo.dirs =
    /some/path/to/watch
    "/another/path with spaces"
# win32
foo.dirs = C:\some\path\to\watch
# win32, path with spaces
foo.dirs = "C:\some\path with spaces\to watch"
# win32, multiple paths
foo.dirs = C:\some\path\to\watch, "C:\another\path with spaces"
# win32, multiple paths
foo.dirs =
    C:\some\path\to\watch
    "C:\another\path with spaces"
For a given profile, it is typical for there to be only one path defined in the “dirs” option. If multiple paths are defined, then of course each folder will be watched, and when a new file appears in any of the watched folders then the profile’s configured action(s) will be invoked on the file. I.e. there will be no difference in behavior when files appear in one of the folders versus another. If you need different behaviors for different folders then that is a clear sign that you need to define multiple profiles.
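The quoting rule for paths can be illustrated with a small regular-expression tokenizer. This is a sketch under the assumption that a quoted run is taken literally and backslashes are never treated as escapes (which matters for Windows paths); parse_paths is a hypothetical helper, not the function Rattail itself uses.

```python
import re

# Either a double-quoted run (quotes dropped) or a run of characters
# containing no whitespace and no commas.
_TOKEN = re.compile(r'"([^"]*)"|([^\s,]+)')


def parse_paths(value):
    """Split a 'dirs' value into paths, honoring double quotes."""
    return [quoted or bare for quoted, bare in _TOKEN.findall(value)]


print(parse_paths('/some/path/to/watch, "/another/path with spaces"'))
print(parse_paths(r'C:\some\path\to\watch, "C:\another\path with spaces"'))
```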
The Profile “watch_locks” Option¶
Probably in most cases, the event which should trigger action(s) to be taken on a file is the initial appearance of the file. However there is another possibility, which is to wait instead for the disappearance of a “lock” file which is associated with the “real” file. This deserves some explanation.
First of all it is important to understand that if the process (or processes) causing the files to appear in the first place is external to your application (i.e. the files are created by another application, outside of your control), then there almost certainly will never be any “lock” files involved and so you of course cannot watch for their disappearance. If this is your situation then the “watch_locks” option is not for you and you can safely skip this section, since the default is not to watch for lock files.
The only situation in which lock files are known to be involved is one where some Rattail-based application is intentionally using them to provide atomicity when moving files from one location to another, etc. Therefore in almost all cases, if you need to watch for lock files it will be because you are creating them in the first place, via rattail.files.locking_copy() or some similar mechanism.
If you do need to watch for the disappearance of lock files instead of the appearance of files in general, then here is how you would configure it:
[rattail.filemon]
monitor = foo
foo.dirs = /some/path, /another/path
foo.watch_locks = true
Note that the behavior of watching for lock files will apply to all watched folders within the profile definition. Once again, if you need to watch for lock files in one folder but not another, that means you need to define multiple profiles.
The semantics of watching for new files, with and without the lock behavior, is as follows:
If not watching for locks (the default), then as soon as a file first appears, it will be added to the action queue for processing.
If watching for locks, then any new files which appear are ignored, and instead whenever a file deletion occurs, and if the deleted file’s path ends in .lock, then that suffix is stripped and the resulting file path is added to the action queue.
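The two cases above can be summarized as a small decision function. This is a sketch only: the event names "created" and "deleted" and the function path_to_queue are hypothetical stand-ins for whatever the underlying platform watcher reports.

```python
LOCK_SUFFIX = ".lock"


def path_to_queue(event_type, path, watch_locks=False):
    """Return the path to add to the action queue, or None to ignore the event."""
    if not watch_locks:
        # Default behavior: react to the appearance of any file.
        return path if event_type == "created" else None

    # Lock behavior: ignore new files, react only to deleted "*.lock" files.
    if event_type == "deleted" and path.endswith(LOCK_SUFFIX):
        return path[: -len(LOCK_SUFFIX)]
    return None


print(path_to_queue("created", "/some/path/data.csv"))
print(path_to_queue("deleted", "/some/path/data.csv.lock", watch_locks=True))
```

Both calls print /some/path/data.csv; in the second case the queued path is the “real” file, not the lock file.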
The Profile “process_existing” Option¶
By default, whenever the File Monitor is first started, any files which happen to exist in the watched folder(s) will be immediately added to the processing queue. However in some cases this is not desirable. For example if the defined action(s) do not actually move the file out of the folder, then all files will be re-processed whenever a restart occurs.
It is for this type of situation that the “process_existing” option exists:
[rattail.filemon]
monitor = foo
foo.dirs = /some/path
foo.process_existing = false
Technically the value of this option could be anything supported by the rattail.config.RattailConfig.getbool() method, but in practice it is conventionally set to “false” or else omitted entirely (as it is enabled by default).
Note also that this option applies at the profile level, and is not specific to any particular watched folder. If you need different behavior for different folders, you must define additional profiles for each.
Finally, it may (or may not, depending on your situation) be important to understand that if this option is enabled, then whenever the File Monitor restarts it will add all existing files to the processing queue in order of their last modification time. The idea here is to (at least attempt to) maintain the original sequence in which files arrived in the folder. See also The Profile “stop_on_error” Option for a related option.
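Ordering existing files by modification time could look roughly like the following. This is a sketch using only the standard library; the function name existing_files_in_arrival_order is hypothetical and the File Monitor's real implementation may differ in detail.

```python
import os


def existing_files_in_arrival_order(folder):
    """List regular files in a folder, oldest modification time first."""
    paths = [os.path.join(folder, name) for name in os.listdir(folder)]
    files = [path for path in paths if os.path.isfile(path)]
    # Modification time is only an approximation of arrival order.
    return sorted(files, key=os.path.getmtime)
```

Note that modification time reflects when a file was last written, so files copied in with preserved timestamps may not sort in true arrival order.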
The Profile “stop_on_error” Option¶
In some cases, correct processing of files requires that they be processed in the precise order of arrival. Most often this is not a requirement, but if it is for you, then you must consider what might happen if one file fails to process. This situation is the reason for the “stop_on_error” option.
The idea here is that if a single file fails, then all processing should stop for the entire monitor profile to which the action belonged. I.e., any new files which appear from that moment on will not have any actions invoked on them. This means that whatever caused the original failure must be addressed by you and then you must restart the File Monitor. See also the related option, The Profile “process_existing” Option.
There are probably more caveats to mention, e.g. it also is assumed that you have a way to be notified when a failure occurs. Again, this is not a common need so it is assumed that those who do need it understand the implications.
An example:
[rattail.filemon]
monitor = foo
foo.dirs = /some/path
foo.actions = process
foo.action.process.func = mymodule:my_processor_function
foo.stop_on_error = true
You’ll notice that this option applies at the profile level and not at the action level. As of this writing, action-level granularity has not been needed, although it may be added in the future.
Technically the value of this option could be anything supported by the rattail.config.RattailConfig.getbool() method, but in practice it is conventionally set to “true” or else omitted entirely (as it is disabled by default).
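The effect of “stop_on_error” on a queue of files can be sketched as a simple loop. This is a simplified, single-threaded illustration; process_queue and its return value are hypothetical and are not how the File Monitor is actually structured internally.

```python
def process_queue(paths, action, stop_on_error=False):
    """Invoke an action on each path; optionally halt at the first failure."""
    processed, failed = [], []
    for path in paths:
        try:
            action(path)
        except Exception:
            failed.append(path)
            if stop_on_error:
                # Leave every remaining file untouched, preserving order.
                break
        else:
            processed.append(path)
    return processed, failed


def picky_action(path):
    # Hypothetical action which fails on one particular file.
    if path == "b.csv":
        raise ValueError("cannot process %s" % path)


print(process_queue(["a.csv", "b.csv", "c.csv"], picky_action))
print(process_queue(["a.csv", "b.csv", "c.csv"], picky_action, stop_on_error=True))
```

Without the option, “c.csv” is still processed after “b.csv” fails; with it, “c.csv” waits until the problem is fixed and the monitor is restarted.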
The Profile “actions” Option¶
This option is somewhat like the main “monitor” option, in that it defines one or more action “names” (or “keys” or “prefixes”), each of which represents a particular action to invoke on new files (and the details of which will require further definition, to be provided by additional options). The sequence of these names matters (assuming there is more than one), because it will determine the order in which the actions should be invoked. The action name(s) need not bear any resemblance to the name of the actual function (etc.) which is to be invoked. Some examples:
[rattail.filemon]
monitor = foo
foo.dirs = /some/path
# a single action
foo.actions = process
# two actions, invoked in this order
foo.actions = copy, delete
# several actions, one per line
foo.actions =
    copy_to_server_A
    copy_to_server_B
    process_locally
    backup
Again, and despite the above examples which are not complete, specifying an action name within a profile’s “actions” option means you must then define the action further. The remainder of this document explains how.
The Profile Action “func” Option¶
In almost all cases, an action which is to be invoked on new files will be a Python function.
Note
As of this writing, there is only one other possibility, which is for the action to be a Python callable class. See The Profile Action “class” Option for more information.
For each action named in The Profile “actions” Option, the invocation of which will be to call a Python function, a “func” option must be defined. The value of this option must be a “spec” which indicates the function name and the Python module in which it is contained. An example:
[rattail.filemon]
monitor = foo
foo.dirs = /some/path
foo.actions = process
foo.action.process.func = mypackage.mymodule:my_processor_function
Note
While the “actions” option defines one or more action names, each named action is further defined with options which contain an “action” (no “s”) prefix.
Note
There must be a colon (“:”) separating the module path from the function name.
In this example we have defined an action named “process” and defined a function for the action, which is the my_processor_function function from the mypackage.mymodule module. At runtime, invocation will be equivalent to the following Python code (where file_path is the absolute path of the new file discovered by the monitor):
from mypackage.mymodule import my_processor_function
my_processor_function(file_path)
It is possible to specify additional positional and keyword arguments to the function when calling it; see The Profile Action “args” Option and The Profile Action “kwarg” Option(s) for more information.
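Resolving a “module:name” spec into a callable can be done with the standard library importlib module. The following is a minimal sketch of the idea; load_spec is a hypothetical helper, and the spec os.path:basename is used only because it is guaranteed to be importable.

```python
import importlib


def load_spec(spec):
    """Resolve a 'package.module:name' spec to the object it names."""
    module_path, colon, name = spec.partition(":")
    if not colon or not module_path or not name:
        raise ValueError("spec must look like 'module.path:name': %r" % spec)
    module = importlib.import_module(module_path)
    return getattr(module, name)


func = load_spec("os.path:basename")
print(func("/some/path/data.csv"))  # data.csv
```

The same approach works for a class spec, since a class is looked up on its module in exactly the same way as a function.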
The Profile Action “class” Option¶
In some cases, an action which is to be invoked on new files will be a Python class. More precisely, the class will be instantiated, and the instance will be called at invocation time.
Note
As of this writing, there is only one other possibility, which is for the action to be a Python callable function. See The Profile Action “func” Option for more information.
For each action named in The Profile “actions” Option, the invocation of which will be to call a Python class, a “class” option must be defined. The value of this option must be a “spec” which indicates the class name and the Python module in which it is contained. An example:
[rattail.filemon]
monitor = foo
foo.dirs = /some/path
foo.actions = process
foo.action.process.class = mypackage.mymodule:MyProcessorClass
Note
While the “actions” option defines one or more action names, each named action is further defined with options which contain an “action” (no “s”) prefix.
Note
There must be a colon (“:”) separating the module path from the class name.
In this example we have defined an action named “process” and defined a class for the action, which is the MyProcessorClass class from the mypackage.mymodule module. At runtime, invocation will be (sort of) equivalent to the following Python code (where file_path is the absolute path of the new file discovered by the monitor):
from mypackage.mymodule import MyProcessorClass
instance = MyProcessorClass()
instance(file_path)
It is possible to specify additional positional and keyword arguments to the class instance when calling it; see The Profile Action “args” Option and The Profile Action “kwarg” Option(s) for more information.
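For reference, a class suitable for this option only needs to be constructible without arguments and to define __call__. The sketch below is a hypothetical example of such a class; whether the File Monitor reuses one instance across files is not stated here, so the example does not rely on it.

```python
class MyProcessorClass:
    """Hypothetical callable class usable as a File Monitor action."""

    def __init__(self):
        # Any one-time setup (connections, lookups, etc.) would go here.
        self.seen = []

    def __call__(self, file_path, *args, **kwargs):
        # The real work on the new file would go here.
        self.seen.append(file_path)


instance = MyProcessorClass()
instance("/some/path/data.csv")
print(instance.seen)  # ['/some/path/data.csv']
```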
The Profile Action “args” Option¶
Regardless of whether you have defined the action callable as a function or a class, you may specify extra positional arguments to be passed to the callable. This is accomplished by an “args” option which is prefixed by the action name.
Its value is interpreted as a “list” of one or more values, each separated by whitespace and/or comma. See The Profile “dirs” Option for some more examples of how the parsing of this works; the main point is that if you need to specify a “single” value which contains spaces, it must be quoted.
An example:
[rattail.filemon]
monitor = foo
foo.dirs = /some/path
foo.actions = process
foo.action.process.func = mymodule:my_processor_function
foo.action.process.args = /some/other/path, 42, True
# or, using another syntax:
foo.action.process.args =
    /some/other/path
    42
    True
The above will result in the following logic at runtime:
from mymodule import my_processor_function
my_processor_function(file_path, u'/some/other/path', u'42', u'True')
Note
In all cases these extra arguments will be passed to the callable as unicode strings. The File Monitor will make no effort to coerce them to any other type; this burden rests on the callable if it is needed.
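Since every extra argument arrives as a string, the callable has to do its own conversion. A minimal sketch of what that might look like for the example above follows; the parameter names dest_dir, batch_size and verbose are hypothetical.

```python
def my_processor_function(file_path, dest_dir, batch_size, verbose):
    """Hypothetical action which coerces its string arguments itself."""
    batch_size = int(batch_size)  # '42' -> 42
    # Note that bool('False') is True, so compare the text explicitly.
    verbose = verbose.strip().lower() in ("true", "yes", "on", "1")
    return dest_dir, batch_size, verbose


print(my_processor_function("/some/path/new.csv", "/some/other/path", "42", "True"))
```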
The Profile Action “kwarg” Option(s)¶
Regardless of whether you have defined the action callable as a function or a class, you may specify extra keyword arguments to be passed to the callable. This is accomplished by one or more “kwarg” options, each of which is prefixed by the action name, and the word “kwarg”, and ending in the keyword itself.
Its value is read as a single unicode string with no interpretation, unlike the “args” option described above.
An example:
[rattail.filemon]
monitor = foo
foo.dirs = /some/path
foo.actions = process
foo.action.process.func = mymodule:my_processor_function
foo.action.process.kwarg.something = /some/other/path
foo.action.process.kwarg.another = 42
The above will result in the following logic at runtime:
from mymodule import my_processor_function
my_processor_function(file_path, something=u'/some/other/path', another=u'42')
Note
As with the “args” option above, no type coercion will be done on keyword argument values.
Finally, note that the “args” and “kwarg” option(s) may be mixed:
[rattail.filemon]
monitor = foo
foo.dirs = /some/path
foo.actions = process
foo.action.process.func = mymodule:my_processor_function
foo.action.process.args = /some/other/path
foo.action.process.kwarg.something = 42
The above will result in this logic:
from mymodule import my_processor_function
my_processor_function(file_path, u'/some/other/path', something=u'42')
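The way “args” and “kwarg” options combine into a single call can be sketched as follows. This assumes the profile's options are already available as a plain dictionary with the profile prefix removed; build_call_arguments is a hypothetical helper and not part of Rattail.

```python
import re

# A double-quoted run, or a run with no whitespace and no commas.
_TOKEN = re.compile(r'"([^"]*)"|([^\s,]+)')


def build_call_arguments(options, action):
    """Collect extra positional and keyword arguments for one action."""
    prefix = "action.%s." % action
    raw_args = options.get(prefix + "args", "")
    args = [quoted or bare for quoted, bare in _TOKEN.findall(raw_args)]

    kwarg_prefix = prefix + "kwarg."
    kwargs = {
        key[len(kwarg_prefix):]: value  # values are kept as plain strings
        for key, value in options.items()
        if key.startswith(kwarg_prefix)
    }
    return args, kwargs


options = {
    "action.process.args": "/some/other/path",
    "action.process.kwarg.something": "42",
}
args, kwargs = build_call_arguments(options, "process")
print(args, kwargs)
```

The action would then be invoked as func(file_path, *args, **kwargs).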
The Profile Action “retry” Options¶
By default, all actions are attempted only once. Should the action fail, any subsequent actions (if there would be any) for that particular file will be skipped.
Note
It is possible to forego all processing for any other files as well, if this is desired; see The Profile “stop_on_error” Option for more information.
If any particular action should be considered “retryable” then this may be declared via the “retry_attempts” and “retry_delay” options. These should be self-explanatory; the “retry_attempts” defines how many attempts are allowed for a given action for a given file, and “retry_delay” defines how long to wait (in seconds) between the attempts.
An example:
[rattail.filemon]
monitor = foo
foo.dirs = /some/path
foo.actions = process
foo.action.process.func = mymodule:my_processor_function
foo.action.process.retry_attempts = 3
foo.action.process.retry_delay = 5
In the above example, my_processor_function() will be called once in all cases; if the first call fails, then the File Monitor will wait 5 seconds and then call it again. If the second call fails, another pause of 5 seconds will happen before calling the third time. If the call fails again (i.e. for the third time) then the File Monitor will give up on the file. At this point the logic is no different than if a non-retryable action had failed the first time; i.e. any subsequent actions will be skipped for the file, etc.
The determination of whether an action “fails” is simply based on the occurrence of an unhandled exception. Since there are different types of exceptions, the retry logic tries to play it “safe” and assumes that all “retryable” failures should correspond to the same exception type. I.e. if the first call fails with an exception of one type, and then the second call raises an exception of a different type, the File Monitor will consider it an utter failure and not retry again.
Note
The default “retry_attempts” value for all actions is one (1). The default “retry_delay” value for all actions is zero (0).
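The retry behavior described in this section can be sketched as a small wrapper. This is an illustration only, not the File Monitor's own code: invoke_with_retry and its sleep parameter are hypothetical, and comparing exception types exactly (rather than with isinstance) is an assumption.

```python
import time


def invoke_with_retry(func, file_path, retry_attempts=1, retry_delay=0,
                      sleep=time.sleep):
    """Call func(file_path), retrying while the same exception type recurs."""
    first_error_type = None
    for attempt in range(1, retry_attempts + 1):
        try:
            return func(file_path)
        except Exception as error:
            if first_error_type is None:
                first_error_type = type(error)
            out_of_attempts = attempt == retry_attempts
            different_type = type(error) is not first_error_type
            if out_of_attempts or different_type:
                raise  # give up; the caller skips any remaining actions
            sleep(retry_delay)
```

With the defaults (one attempt, zero delay) the wrapper reduces to a plain call, which matches the default behavior of attempting each action only once.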