Monitor file changes using Python
In this tutorial, we’ll look at how to monitor file changes using Python. We’ll use Watchdog, a Python library that offers an API and shell utilities for monitoring file system events. We’ll cover how it works, what it can be used for, and how you can use it to monitor file system events in your own applications.
What is Python Watchdog?
Watchdog is an open-source Python library that provides an API and shell utilities for monitoring file system events. Typical uses include watching a directory tree for changes and reacting to individual file system events. Watchdog works by registering event handlers that fire whenever the corresponding event occurs. Handlers can be registered for events on a single file or for all events in a directory. You can also attach multiple event handlers and execute shell commands when an event is triggered.
Events include changes to files, changes to the list of files within a directory, and changes to file metadata such as ownership, group, flags, and access control lists.
Prerequisites
To keep track of the files on your computer and check that they stay consistent over time, you need a monitoring tool in place. For this tutorial, the only requirement besides Python itself is Watchdog.
Watchdog can be installed with pip:
$ pip install watchdog
Alternatively, it can be installed from the source code repository.
Implementation
Watchdog has two main building blocks: the watcher (called the observer) and the event handler.
Watcher
The Observer class is at the heart of this recipe. It monitors the directory passed to it, watching all of its files and, when recursion is enabled, any subdirectories, including ones created later. When a change occurs anywhere within the monitored tree, the observer passes an event object describing the change to the registered handler.
Handler
The handler contains your code and decides what to do with the events received from the watcher. You write it yourself, so it can do anything you like with an event: display it live, store it somewhere, or trigger an automated task. It could also be used on websites for security purposes.
- Create an instance of the watchdog.observers.Observer thread class. The observer design pattern works well for a thread that acts on events.
- Implement a subclass of watchdog.events.FileSystemEventHandler (or just use the ready-made watchdog.events.LoggingEventHandler).
- Schedule the paths to be monitored on the observer instance, attaching the event handler.
- Start the observer thread and monitor file changes using Python.
Note: By default, a watchdog.observers.Observer instance does not monitor sub-directories. It does so only when ‘recursive=True’ is passed in the call that schedules the path.
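As a rough sketch of the handler side of these steps, the class below subclasses FileSystemEventHandler and records what it receives. RecordingHandler and the /tmp paths are hypothetical names chosen for illustration. The events are dispatched by hand, an assumption made so the sketch runs without touching the file system; in normal use the observer thread delivers them.

```python
from watchdog.events import (
    FileCreatedEvent,
    FileMovedEvent,
    FileSystemEventHandler,
)


class RecordingHandler(FileSystemEventHandler):
    """Hypothetical handler that stores a short description of each event."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def on_created(self, event):
        self.seen.append(("created", event.src_path))

    def on_moved(self, event):
        self.seen.append(("moved", event.src_path, event.dest_path))


handler = RecordingHandler()

# Normally the observer thread calls dispatch(); here two synthetic events
# are dispatched by hand so the example does not depend on real file changes.
handler.dispatch(FileCreatedEvent("/tmp/example.txt"))
handler.dispatch(FileMovedEvent("/tmp/example.txt", "/tmp/renamed.txt"))

for entry in handler.seen:
    print(entry)
```

To use such a handler for real, pass an instance to observer.schedule() together with the path to watch, as in the steps above.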
Code
Setting up Watchdog is pretty simple: create an event handler, schedule it on an observer for the directory you want to watch, then wait for events and act on them. The handler is called whenever a file or directory under that path is created, deleted, moved, or modified, including when attributes such as the modification time are updated. You can see an example below:
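import sys
import time
import logging
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

if __name__ == "__main__":
    # Log each event as a timestamp followed by the message.
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    # Watch the directory given on the command line, or the current one.
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = LoggingEventHandler()
    observer = Observer()
    # Schedule monitoring of the path with the observer and the event handler.
    # recursive=True is needed for the observer to watch sub-directories.
    observer.schedule(event_handler, path, recursive=True)
    observer.start()  # start the observer thread
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()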
Here the path defaults to '.', the current working directory, but you can point it at any directory you want.
If the path contains backslashes, put an r in front of the string to make it a raw string, so that Python does not treat the backslashes as escape characters and the program knows exactly where to look. After that, run the script from the terminal to start the application.
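A small illustration of why the r prefix matters, assuming a Windows-style path with backslashes. The path itself is made up for this example.

```python
# Hypothetical Windows-style path, used only to illustrate the r prefix.
raw_path = r"C:\new_folder\table"    # backslashes are kept literally
plain_path = "C:\new_folder\table"   # \n and \t turn into newline and tab

print(repr(raw_path))
print(repr(plain_path))
```

Only the raw string still points at the intended folder; the plain string silently contains a newline and a tab character.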
Results
If the application started successfully, you will see no output and no errors, just an empty terminal.
Now create, modify, or delete some files in the directory you set as the path (if you followed the steps exactly as above, that is the directory from which you started the script). Events should start printing in the terminal.
Each line shows the time and the message, as specified in the code; the message gives the kind of event and the path of the affected file.
Conclusion
That’s it! We have created a simple application that monitors file changes on our system using the Python Watchdog library. It is a very versatile library that can be used not only for monitoring file changes but also for automating the response to them. It is widely used with cron jobs on supported systems and can also be used on websites for many purposes, including security. These are just a few examples: the library offers many more features, which you can learn about from its documentation or by studying its code on GitHub.
Let’s make a quick recap to summarize everything:
First, we installed all the prerequisites required for running the watchdog.
Next, we explored the available features and tried out a basic implementation. We took a subclass of FileSystemEventHandler and scheduled it on the Observer object. From there, the observer passes events to the handler, which invokes the corresponding method whenever files or folders in the directory are modified.
Finally, we made some tests via the creation, modification, and deletion of some files.
Monitoring file system changes
While looking for a ready-made solution for monitoring file system changes with Linux and FreeBSD support, rather than reinventing the wheel, I came across a nice Python library called watchdog (github, packages.python.org). Besides the operating systems I care about, it also supports MacOS (with some quirks of its own) and Windows.
Anyone interested in the topic is welcome to read on.
Installation
You can install a released version with pip:
$ pip install watchdog
pip itself is installed as the python-pip package, the devel/py-pip port, and so on.
Alternatively, build it from source via setup.py.
Everything is described in reasonable detail in the original guide. That guide covers version 0.5.4, while 0.6.0 is the current one. However, the only differences are copyright edits and a switch from 4-space to 2-space indentation. "Google code style" 🙂
In general, the build has quite a few specifics that depend on both the Python version and the target platform. They are all described at the link above, but if needed, I will add a brief summary to this article.
In addition, the module can be built on an unsupported OS, but then a fallback implementation kicks in that takes "snapshots" of the file system structure and compares them afterwards. Perhaps that is how some people solved this kind of task on their own 🙂
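The snapshot-and-compare idea behind such a fallback can be sketched with the standard library alone. This is not watchdog’s actual implementation: take_snapshot and diff_snapshots are hypothetical helpers, and comparing modification time plus size is an assumption made for the example.

```python
import os
import tempfile


def take_snapshot(root):
    """Map every path under root to its (mtime, size) pair."""
    snapshot = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # the entry vanished between walk and stat
            snapshot[full] = (st.st_mtime_ns, st.st_size)
    return snapshot


def diff_snapshots(old, new):
    """Return (created, deleted, modified) path sets between two snapshots."""
    created = set(new) - set(old)
    deleted = set(old) - set(new)
    modified = {p for p in set(old) & set(new) if old[p] != new[p]}
    return created, deleted, modified


with tempfile.TemporaryDirectory() as root:
    keep = os.path.join(root, "keep.txt")
    gone = os.path.join(root, "gone.txt")
    for path in (keep, gone):
        with open(path, "w") as fh:
            fh.write("x")
    before = take_snapshot(root)

    os.remove(gone)
    added = os.path.join(root, "new.txt")
    with open(added, "w") as fh:
        fh.write("hello")
    with open(keep, "w") as fh:
        fh.write("changed content")  # the size changes, so the diff sees it
    after = take_snapshot(root)

    created, deleted, modified = diff_snapshots(before, after)
    print(sorted(os.path.basename(p) for p in created))
    print(sorted(os.path.basename(p) for p in deleted))
    print(sorted(os.path.basename(p) for p in modified))
```

A polling loop would call take_snapshot at a fixed interval and report each diff, which costs more than native notifications but works on any platform.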
I tried building it on Ubuntu 11.04 and FreeBSD 8.2-RELEASE myself and ran into no problems with either the build or operation.
Basic example
Suppose we are interested in changes under some path /path/to/smth that involve creating, deleting, and renaming files and directories.
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
The Observer class is selected in /observers/__init__.py based on what your OS supports, so there is no need to choose one yourself.
FileSystemEventHandler is the base class for change-event handlers. It does not do much on its own, but we will teach its subclass:
class Handler(FileSystemEventHandler):
    # Print every created, deleted, and moved event as it arrives.
    def on_created(self, event):
        print(event)

    def on_deleted(self, event):
        print(event)

    def on_moved(self, event):
        print(event)
The full list of methods can be seen in FileSystemEventHandler.dispatch itself: on_modified, on_moved, on_created, on_deleted.
observer = Observer()
# Watch /path/to/smth together with all of its subdirectories.
observer.schedule(Handler(), path='/path/to/smth', recursive=True)
observer.start()
Observer is a fairly distant descendant of threading.Thread, so after calling start() we get a background thread that watches for changes. If the script exits right away, we get nothing useful. How to wait depends mostly on how the module is used in a real project; for now, a simple workaround will do:
try:
    while True:
        time.sleep(0.1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
We wait for file system change events until Ctrl+C (SIGINT) arrives, then tell our thread to stop and wait until it does.
Run the script, go to our path, and:
# mkdir foo
# touch bar
# mv bar baz
# cd foo/
# mkdir foz
# mv ../baz ./quz
# cp ./quz ../hw
# cd ..
# rm -r ./foo
# rm -f ./*
The methods of our Handler class receive, in the event argument, subclasses of FileSystemEvent, which are listed in watchdog/events.py.
All of them have the properties src_path, is_directory, and event_type ("created", "deleted", and so on). The moved event additionally has a dest_path property.
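A minimal sketch of reading these properties, assuming the event classes from watchdog.events can be constructed directly with the paths as arguments. The describe helper is hypothetical, and the paths reuse names from the shell session above.

```python
from watchdog.events import DirCreatedEvent, FileDeletedEvent, FileMovedEvent

# Synthetic events built by hand; an observer would normally create these.
events = [
    DirCreatedEvent("/path/to/smth/foo"),
    FileMovedEvent("/path/to/smth/bar", "/path/to/smth/baz"),
    FileDeletedEvent("/path/to/smth/baz"),
]


def describe(event):
    """Summarise an event using only the properties listed above."""
    kind = "directory" if event.is_directory else "file"
    text = f"{event.event_type} {kind}: {event.src_path}"
    # dest_path is only meaningful for moved events.
    if event.event_type == "moved":
        text += f" -> {event.dest_path}"
    return text


for event in events:
    print(describe(event))
```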
Well, if you don’t want anything else… Is there anything else?
PatternMatchingEventHandler filters events by path using wildcard patterns:
- * any characters
- ? any single character
- [seq] any single character from those listed
- [!seq] any single character NOT from those listed
from watchdog.events import PatternMatchingEventHandler

class Handler(PatternMatchingEventHandler):
    pass

event_handler = Handler(
    patterns=['*.py*'],
    ignore_patterns=['cache/*'],
    ignore_directories=True,
    case_sensitive=False
)
observer = Observer()
observer.schedule(event_handler, path='/home/LOGS/', recursive=True)
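Python’s standard fnmatch module documents the same four wildcards, so it is a convenient way to try patterns out before handing them to a handler. The sketch below uses fnmatch directly; that watchdog matches paths in exactly the same way is an assumption, and the file names are invented.

```python
from fnmatch import fnmatchcase

# (pattern, file name, expected result); all names are made up for illustration.
cases = [
    ("*.py*", "handler.pyc", True),          # * matches any run of characters
    ("log?.txt", "log1.txt", True),          # ? matches exactly one character
    ("log?.txt", "log10.txt", False),
    ("file[abc].dat", "fileb.dat", True),    # [seq] one character from the set
    ("file[!abc].dat", "fileb.dat", False),  # [!seq] one character not in the set
    ("file[!abc].dat", "filez.dat", True),
]

results = [fnmatchcase(name, pattern) for pattern, name, _ in cases]
for (pattern, name, _), matched in zip(cases, results):
    print(f"{pattern!r:18} {name!r:14} -> {matched}")
```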
RegexMatchingEventHandler does the same thing, but with regular expressions given explicitly in the constructor:
from watchdog.events import RegexMatchingEventHandler

class Handler(RegexMatchingEventHandler):
    pass

event_handler = Handler(
    regexes=[r'\.py.?'],
    ignore_regexes=[r'cache/.*'],
    ignore_directories=True,
    case_sensitive=False
)
Internally, PatternMatchingEventHandler ends up translating its patterns into regular expressions, so it should be somewhat slower because of that overhead.
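The pattern-to-regex translation can be observed with fnmatch.translate from the standard library. Whether watchdog uses this exact function internally is an assumption here; the sketch only shows what such a translation looks like.

```python
import fnmatch
import re

pattern = "*.py*"
regex_source = fnmatch.translate(pattern)  # wildcard pattern -> regex source
compiled = re.compile(regex_source)

print(regex_source)  # the exact text varies between Python versions
print(bool(compiled.match("script.py")))
print(bool(compiled.match("notes.txt")))
```

Compiling the translated pattern once and reusing it keeps the extra cost to a single translation per pattern.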
Finally, LoggingEventHandler writes everything to the log via logging.info().
That’s all. Maybe someone will find it useful.
P.S.
When watching a directory that contains, directly or in its subdirectories, folders or files with non-ASCII names, an exceptions.UnicodeEncodeError is raised deep inside watchdog. On Linux (inotify) it occurs in watchdog.observers.inotify.Inotify._add_watch.
The cause is that the contents are read using the ASCII encoding.
To fix this, you can patch the method:
from watchdog.observers.inotify import Inotify

# Wrap the original method so that the path is encoded to UTF-8 before use.
_save = Inotify._add_watch
Inotify._add_watch = lambda self, path, mask: _save(self, path.encode('utf-8'), mask)
Here is an example of an original string, followed by its repr() before and after encode():
/home/atercattus/.wine/drive_c/users/Public/Рабочий стол
u'/home/atercattus/.wine/drive_c/users/Public/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 \u0441\u0442\u043e\u043b'
'/home/atercattus/.wine/drive_c/users/Public/\xd0\xa0\xd0\xb0\xd0\xb1\xd0\xbe\xd1\x87\xd0\xb8\xd0\xb9 \xd1\x81\xd1\x82\xd0\xbe\xd0\xbb'
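For comparison, a sketch of the same conversion on Python 3, where str is already Unicode and encode() returns a bytes object. The reprs shown above come from Python 2, so the printed form differs slightly.

```python
path = "/home/atercattus/.wine/drive_c/users/Public/Рабочий стол"

encoded = path.encode("utf-8")     # text -> UTF-8 bytes, as in the patch above
decoded = encoded.decode("utf-8")  # and back again, without loss

print(ascii(path))  # escaped code points, comparable to the u'...' repr
print(encoded)      # bytes literal with \x.. escapes
```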