Siegfried Gevatter: A gentle introduction to Zeitgeist's Python API
In this post I'll give a quick introduction, by example, to using Zeitgeist's Python API for good and profit.
If you're interested in using Zeitgeist from C instead, see the libzeitgeist examples; to use it with C++/Qt, Trever's "a web browser in 4 steps" may be of interest.
First things first
In case you're not familiar with Zeitgeist, it may prove helpful to first read Mikkel's introduction to Zeitgeist post.
If that's too much to read, you should just know that Zeitgeist is an event log. Like the history in your browser, it keeps track of what websites you open at which point in time. It also keeps track of when you close them, and of what browser you used, since it's a system-wide service. Furthermore, it does the same for files, conversations, e-mails, and anything else you want to insert into it.
So Zeitgeist is a database of events, and an event can be pretty much anything. But what does it look like? Its main attributes are the following:
This may need some explaining. We start by importing the Zeitgeist client and some associated data structures (such as Event and Interpretation), and create an instance of the client. The ZeitgeistClient class is a wrapper around Zeitgeist's D-Bus API which not only makes it much nicer to use, but also makes it easy to install monitors (as we will see later) and provides convenient functionality such as automatic reconnection if the connection to the engine is lost.
To query for the most recent song, we just need to create a template with the restrictions we want to impose and submit it to Zeitgeist using the find_events_for_template call. If you haven't read Mikkel's post yet, please do so, as it introduces the structure of events and subjects (in short: an event has a timestamp, some other properties, and one or more subjects representing the resources involved in the event, such as files, websites or people).
The Python API is inherently asynchronous (if for some reason you need a synchronous API, you may still use the lower-level ZeitgeistDBusInterface class), so we need to define a callback function to handle the results we receive from the Zeitgeist engine.
Finally, we need to create a main loop so the asynchronous functions can run.
Now this was a pretty simple example. Let's make it more interesting. One song isn't much, so let's get the 5 most recent songs. Also, now we want both songs and videos. The first part is pretty easy: we just need to change the num_events parameter. For the second extension, we have to change the event template. In fact, now we need two different event templates and the find_events_for_templates function, which takes an arbitrary number of event templates and ORs them. The result is as follows:
This will work, but unless you're lucky you're likely to get some duplicate lines. Why is this? Well, other than that you may have used the same file twice, don't forget that what you are requesting are actually events. If you've started playing a given song, you probably also stopped playing it, so that's actually two of them (an AccessEvent and a LeaveEvent). Since this isn't what we want, we'll change the query a bit:
By requesting the most recent subjects, rather than the most recent events, we can filter out events with duplicate URIs. See the ResultType documentation for other modes you can use. Note particularly the MostPopularSubjects result type.
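To see why asking for subjects removes the duplicates, here is a minimal sketch in plain Python 3 that needs no Zeitgeist installation. The function name `most_recent_subjects` and the tuple layout are made up for this illustration; the real filtering happens inside the engine, and this only mimics the behaviour described above.

```python
def most_recent_subjects(events):
    """Keep only the newest event for each subject URI.

    `events` is an iterable of (timestamp_ms, interpretation, uri) tuples;
    this layout is a hypothetical stand-in for real Zeitgeist events.
    """
    seen = set()
    result = []
    # Walk the events from newest to oldest and keep the first hit per URI.
    for event in sorted(events, key=lambda e: e[0], reverse=True):
        uri = event[2]
        if uri not in seen:
            seen.add(uri)
            result.append(event)
    return result


events = [
    (1000, "AccessEvent", "file:///music/a.ogg"),
    (2000, "LeaveEvent", "file:///music/a.ogg"),
    (3000, "AccessEvent", "file:///music/b.ogg"),
    (4000, "LeaveEvent", "file:///music/b.ogg"),
]
recent = most_recent_subjects(events)
for timestamp, interpretation, uri in recent:
    print("- %s (%s at %d)" % (uri, interpretation, timestamp))
```

Four events go in, but only two lines come out: one per song, each represented by its latest event.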
I also used the chance to introduce the storage_state parameter. This one will filter out events for files Zeitgeist knows aren't available (this mostly means online resources won't be shown if you don't have a network connection; there's also support for handling external storage media, but because of problems with GIO this is currently disabled).
Last but not least, the find_events_for_* methods also accept a timerange parameter. It defaults to TimeRange.until_now(), but you may change it to TimeRange.always() (if for some reason you're working with events in the future) or to any other time range of your choice. Here it's important to note that Zeitgeist's timestamps use millisecond precision.
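Since Python's time.time() returns seconds, the millisecond precision is easy to get wrong. As a small sketch (plain Python 3; `last_days_range` and `MS_PER_DAY` are hypothetical names, not part of the Zeitgeist API), this is one way to compute the two bounds of a custom time range:

```python
import time

MS_PER_DAY = 24 * 3600 * 1000  # milliseconds in one day


def last_days_range(days, now_ms=None):
    """Return (begin, end) in milliseconds since the Unix epoch.

    Hypothetical helper for illustration; `now_ms` can be passed
    explicitly to make the result reproducible.
    """
    if now_ms is None:
        # time.time() is in seconds, so scale it to milliseconds.
        now_ms = int(time.time() * 1000)
    return (now_ms - days * MS_PER_DAY, now_ms)


begin, end = last_days_range(7, now_ms=1340000000000)
print(begin, end)
```

The resulting pair could then be handed to the TimeRange constructor, in the same way the related-URIs example in this post builds its range from two millisecond values.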
For more advanced queries, you can use more complex combinations of event and subject templates. The rule to keep in mind here is that events are ORed and subjects are ANDed. Additionally, some fields (actor, origin, mimetype, uri and current_uri) may be prefixed with an exclamation mark ("!") for NOT, or you may append an asterisk ("*") to them for prefix search. You can even combine the two operators. Here's an example of a template you could build:
In my case, this template would fetch a list of the source code files I modified most recently but excluding those related to the Zeitgeist project.
Working with big sets of data
In case you're trying to do something crazy, you may end up with a Zeitgeist query complaining that it exceeded the memory limit. You're not supposed to do that. Instead, we provide some methods for working with large collections of events.
And there you have the source code files you worked with during the last 3 months, ordered from most to least popular (popularity is measured by counting the number of events; for more precision, maybe you could limit the results to events with interpretation AccessEvent).
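The remark about precision is easier to see with numbers. The following sketch (plain Python 3, no Zeitgeist needed; `most_popular_subjects` is a made-up helper that only imitates the counting described above) ranks URIs by event count, optionally restricted to a single interpretation:

```python
from collections import Counter


def most_popular_subjects(events, only_interpretation=None):
    """Rank URIs by the number of events that mention them.

    `events` is a list of (interpretation, uri) pairs, a hypothetical
    simplification of real events.
    """
    counts = Counter(
        uri for interpretation, uri in events
        if only_interpretation is None or interpretation == only_interpretation
    )
    return [uri for uri, _count in counts.most_common()]


events = [
    ("AccessEvent", "a.c"), ("LeaveEvent", "a.c"), ("AccessEvent", "a.c"),
    ("AccessEvent", "b.c"), ("ModifyEvent", "b.c"),
    ("ModifyEvent", "b.c"), ("ModifyEvent", "b.c"),
]
by_all_events = most_popular_subjects(events)
by_access_only = most_popular_subjects(events, "AccessEvent")
print(by_all_events, by_access_only)
```

Counting every event puts b.c first because it was saved several times, while counting only AccessEvent puts a.c first because it was opened more often.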
Why do we provide this mechanism instead of querying with a simple offset? Well, this avoids problems when the log changes (events are inserted or deleted). Have you ever been exploring the latest posts on some website, and as you change to the next page some of the results from the previous page show up again (because new posts have been added in the meantime)? With Zeitgeist this won't happen.
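The difference can be simulated with plain lists. This is only a sketch in Python 3 with made-up integer IDs, not a description of how the engine stores anything: the first half pages by offset against a log that changes, the second half pages over a list of IDs fetched once up front.

```python
def page(items, offset, size):
    """Return one page of `items`."""
    return items[offset:offset + size]


# Offset paging against a live log (newest event first).
log = [5, 4, 3, 2, 1]
offset_page1 = page(log, 0, 2)
log.insert(0, 6)                 # a new event arrives between the two requests
offset_page2 = page(log, 2, 2)   # event 4 shows up a second time

# Paging over a snapshot of IDs taken once.
log = [5, 4, 3, 2, 1]
ids = list(log)                  # fetch all matching IDs first
id_page1 = page(ids, 0, 2)
log.insert(0, 6)                 # the log changes, the snapshot does not
id_page2 = page(ids, 2, 2)

print(offset_page1, offset_page2)
print(id_page1, id_page2)
```

With offsets, the second page repeats an entry from the first; with the ID snapshot, the pages stay disjoint no matter what is inserted in the meantime.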
Receiving information in real time
At this point you're an expert at requesting all sorts of data from Zeitgeist, but now you want to show a list of the last kitten images you've viewed, updated in real time. Don't worry, Zeitgeist can provide for this:
It's important to note that on_delete won't be called when an image is deleted (that would be a newly inserted event with interpretation=DELETE_EVENT); rather, it's called when a previously inserted event is deleted (for example, using the "forget recent history" option in Activity Log Manager).
In case you're curious: for best performance, this doesn't actually use D-Bus signals. Instead, this little call will set up a D-Bus object behind the scenes and register it with the Zeitgeist engine, so it can notify said object when (and only when) an event of interest to it is registered.
To stop receiving notifications for a template, you'll need to save the object returned by the install_monitor call:
Pro Tip: You can use the Zeitgeist Explorer GUI to quickly try out different queries (note: it's still a work in progress, so much functionality is missing, but it does work somewhat).
Contextual awesomeness: finding related events
By now you're familiar with retrieving events and keeping them up to date. Now it's time for a little secret:
This little query example will return up to 10 websites I used at the same time as the Vala files inside my Zeitgeist directory, considering only data from the last 6 months. Nice, huh?
This is an experimental feature, and it doesn't work well when operating on big inputs, so it's usually better to use the find_related_uris_for_uris variant (which replaces the first query_templates parameter with a list of URIs).
Advanced searching: the FTS extension
Some people think prefix searches aren't good enough for them, and this is why the Zeitgeist engine ships by default with an FTS (Full Text Search) extension.
Using the methods provided by this extension you can perform more advanced queries against the subjects' current_uri and text properties (contrary to what the name may suggest, the FTS extension doesn't index the content of the files, but just the information in the event).
This is exposed as zeitgeist_index_search in libzeitgeist (the C library), but unfortunately isn't currently available in the Python API. If you still need it, you'll have to fall back to pretty much using the D-Bus interface directly (you still get reconnection support, though). Here's an example:
The most interesting thing here is the query parameter. Quoting from the C documentation:
The confirmation callback will receive a timerange going from the first to the last event. If no events were deleted (because they didn't exist), you'll get (-1, -1).
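A callback that treats the (-1, -1) case separately could look roughly like the sketch below. It is plain Python 3 and runs without Zeitgeist; `describe_deletion` is a hypothetical helper, and it assumes only what is stated above: two millisecond timestamps that can be read by index.

```python
from datetime import datetime, timezone


def describe_deletion(timerange):
    """Turn the time range passed to a deletion callback into a message.

    Assumes `timerange` holds two millisecond timestamps accessible by
    index, with (-1, -1) meaning that nothing was deleted.
    """
    begin_ms, end_ms = int(timerange[0]), int(timerange[1])
    if (begin_ms, end_ms) == (-1, -1):
        return "No events were deleted."

    def fmt(ms):
        # Timestamps are in milliseconds; datetime expects seconds.
        moment = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
        return moment.strftime("%Y-%m-%d %H:%M:%S")

    return "Deleted events between %s and %s (UTC)." % (fmt(begin_ms), fmt(end_ms))


print(describe_deletion((-1, -1)))
print(describe_deletion((0, 1000)))
```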
And now for the interesting part. If your application involves resources (files, websites, contacts, etc.) of any sort, you'll probably want to let Zeitgeist know that you're using them. It's time that you write a data-source! We start by registering the data-source. Here we go:
Once that's done (and if it is enabled), we are free to send our events:
If you don't know what interpretation and manifestation your subject should have, you can use the following utility methods:
Even better, with Zeitgeist 0.9 you can just leave the subject (but not event!) interpretation and manifestation fields empty, and they'll be guessed the same way as if you used those utility methods.
Pro Tip: You can examine all registered data-sources and toggle whether they are enabled or not using the zeitgeist-data-sources-gtk.py tool.
Conclusion
Wow, I'm impressed if you've got this far. By now you should have quite a good idea of how to use the Zeitgeist API, and I'm looking forward to seeing what you do with it in your next awesome project.
If you have any problems with Zeitgeist, feel free to visit us on IRC (#zeitgeist on irc.freenode.net), or join our mailing list. We'll also be at GUADEC next week, so if you're there make sure to say hi!
In case you missed them, here are some useful links:
Siegfried-Angel Gevatter Pujals, 2012. Permalink License Post tags: gnome, zeitgeist
Event attributes:
- timestamp: when the event happened (milliseconds since the Unix epoch)
- interpretation: what sort of event it is (e.g. opened, closed)
- manifestation: why it happened (user activity, notification, ...)
- actor: the primary application involved
- origin: where it came from (e.g. the website where you clicked the link that opened this page)
Subject attributes:
- uri
- current_uri: updated URI, if it changed since the event
- interpretation: abstract type (document, image, video, ...)
- manifestation: how it is stored (file, remote object, website)
- origin: parent folder for files, domain name for websites
- mimetype
- text: a title for the event (e.g. filename, website title, ...)
- storage: identifier for the storage medium of the subject (e.g. local, online, pendrive X)
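To make the structure above concrete, here is a minimal sketch of an event carrying one subject, written in plain Python 3 with only the standard library. The classes `SketchEvent` and `SketchSubject` are hypothetical stand-ins, not the Event and Subject classes from zeitgeist.datamodel, and all the string values are invented for the illustration.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchSubject:
    uri: str
    interpretation: str = ""
    manifestation: str = ""
    origin: str = ""
    mimetype: str = ""
    text: str = ""
    storage: str = ""
    current_uri: str = ""

    def __post_init__(self):
        # Assumption for this sketch: until the resource moves,
        # current_uri is the same as uri.
        if not self.current_uri:
            self.current_uri = self.uri


@dataclass
class SketchEvent:
    timestamp: int                      # milliseconds since the Unix epoch
    interpretation: str
    manifestation: str
    actor: str
    origin: str = ""
    subjects: List[SketchSubject] = field(default_factory=list)


song = SketchSubject(
    uri="file:///home/user/Music/song.ogg",
    interpretation="audio",
    manifestation="file",
    origin="file:///home/user/Music",
    mimetype="audio/ogg",
    text="song.ogg",
    storage="local",
)
opened = SketchEvent(
    timestamp=int(time.time() * 1000),
    interpretation="opened",
    manifestation="user activity",
    actor="application://player.desktop",
    subjects=[song],
)
print(opened.subjects[0].uri)
```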
    from zeitgeist.client import ZeitgeistClient
    from zeitgeist.datamodel import *

    zeitgeist = ZeitgeistClient()

    def on_events_received(events):
        if events:
            song = events[0]
            print "Last song: %s" % song.subjects[0].uri
        else:
            print "You haven't listened to any songs."

    template = Event.new_for_values(subject_interpretation=Interpretation.AUDIO)
    zeitgeist.find_events_for_template(template, on_events_received, num_events=1)

    # Start a mainloop - note: the Qt mainloop also works
    from gi.repository import GLib
    GLib.MainLoop().run()
    from zeitgeist.client import ZeitgeistClient
    from zeitgeist.datamodel import *

    zeitgeist = ZeitgeistClient()

    def on_events_received(events):
        for event in events:
            print "- %s" % event.subjects[0].uri

    tmpl1 = Event.new_for_values(subject_interpretation=Interpretation.AUDIO)
    tmpl2 = Event.new_for_values(subject_interpretation=Interpretation.VIDEO)
    zeitgeist.find_events_for_templates([tmpl1, tmpl2], on_events_received, num_events=5)

    # Start a mainloop
    from gi.repository import GLib
    GLib.MainLoop().run()
    zeitgeist.find_events_for_templates(
        [tmpl1, tmpl2],
        on_events_received,
        num_events=5,
        result_type=ResultType.MostRecentSubjects,
        storage_state=StorageState.Available)
    subj1 = Subject.new_for_values(interpretation=Interpretation.SOURCE_CODE,
        uri="file:///home/rainct/Development/*")
    subj2 = Subject.new_for_values(uri="!file:///home/rainct/Development/zeitgeist/*")
    tmpl1 = Event.new_for_values(interpretation=Interpretation.MODIFY_EVENT,
        subjects=[subj1, subj2])
    templates = [tmpl1]
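The effect of the "!" and "*" operators in a template like this one can be imitated in a few lines of plain Python 3. This is only a sketch of the matching rules as described in this post: the names `field_matches` and `uri_matches_all` are invented, the empty-pattern wildcard is an assumption, and the engine itself evaluates templates against its database rather than with code like this.

```python
def field_matches(pattern, value):
    """Match one template field against a value.

    A leading "!" negates the match and a trailing "*" turns it into a
    prefix search; the two can be combined.
    """
    if not pattern:
        return True  # assumption: an empty template field matches anything
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]
    if pattern.endswith("*"):
        hit = value.startswith(pattern[:-1])
    else:
        hit = value == pattern
    return hit != negate


def uri_matches_all(patterns, uri):
    """Simplified AND: every subject pattern has to accept the URI."""
    return all(field_matches(pattern, uri) for pattern in patterns)


patterns = [
    "file:///home/rainct/Development/*",
    "!file:///home/rainct/Development/zeitgeist/*",
]
kept = uri_matches_all(patterns, "file:///home/rainct/Development/foo/main.c")
dropped = uri_matches_all(patterns, "file:///home/rainct/Development/zeitgeist/engine.vala")
print(kept, dropped)
```

A file elsewhere under Development passes both patterns, while a file inside the zeitgeist directory is rejected by the negated prefix.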
    from zeitgeist.client import ZeitgeistClient
    from zeitgeist.datamodel import *

    zeitgeist = ZeitgeistClient()

    def on_events_received(events):
        for event in events:
            print '- %s' % event.subjects[0].uri

    def on_ids_received(event_ids):
        print 'A total of %d source code files were found.' % len(event_ids)
        print 'Fetching the first 100...'
        zeitgeist.get_events(event_ids[:100], on_events_received)

    tmpl = Event.new_for_values(subject_interpretation=Interpretation.SOURCE_CODE)
    zeitgeist.find_event_ids_for_templates(
        [tmpl],
        on_ids_received,
        num_events=10000, # you can use 0 for "all events", but do you really need to?
        timerange=TimeRange.from_seconds_ago(3600*24*30*3),
        result_type=ResultType.MostPopularSubjects)

    # Start a mainloop
    from gi.repository import GLib
    GLib.MainLoop().run()
    from zeitgeist.client import ZeitgeistClient
    from zeitgeist.datamodel import *

    zeitgeist = ZeitgeistClient()

    def on_insert(time_range, events):
        # do awesome stuff with the events here
        print events

    def on_delete(time_range, event_ids):
        # a previously inserted event was deleted
        print event_ids

    templates = [Event.new_for_values(subject_uri='file:///home/user/kittens/*',
        subject_interpretation=Interpretation.IMAGE)]
    zeitgeist.install_monitor(TimeRange.always(), templates, on_insert, on_delete)

    # Start a mainloop
    from gi.repository import GLib
    GLib.MainLoop().run()
    m = zeitgeist.install_monitor(TimeRange.always(), templates, on_insert, on_delete)
    zeitgeist.remove_monitor(m)
    import time
    from zeitgeist.client import ZeitgeistClient
    from zeitgeist.datamodel import *

    zeitgeist = ZeitgeistClient()

    def on_related_received(uris):
        print 'Related URIs:'
        for uri in uris:
            print ' - %s' % uri

    query_templates = [Event.new_for_values(
        subject_interpretation=Interpretation.SOURCE_CODE,
        subject_uri='file:///home/rainct/Development/zeitgeist/*',
        subject_mimetype="text/x-vala")]
    result_templates = [Event.new_for_values(
        subject_interpretation=Interpretation.WEBSITE,
        subject_manifestation=Manifestation.WEB_DATA_OBJECT)]

    now = time.time()*1000
    zeitgeist.find_related_uris_for_events(
        query_templates,
        on_related_received,
        time_range=TimeRange(now - 1000*3600*24*30*6, now),
        result_event_templates=result_templates,
        num_events=10)

    # Start a mainloop
    from gi.repository import GLib
    GLib.MainLoop().run()
    from zeitgeist.client import ZeitgeistClient
    from zeitgeist.datamodel import *

    zeitgeist = ZeitgeistClient()
    index = zeitgeist._iface.get_extension('Index', 'index/activity')

    query = 'hello' # search query
    time_range = TimeRange.always()
    event_templates = []
    offset = 0
    num_events = 10
    result_type = 100 # magic number for "relevancy" (ResultType.* also work)

    def on_reply(events, num_estimated_matches):
        print 'Got %d out of ~%d results.' % (len(events), num_estimated_matches)
        events = map(Event, events)
        for event in events:
            print ' - %s' % event.subjects[0].uri

    def on_error(exception):
        print 'Error from FTS:', exception

    index.Search(query, time_range, event_templates,
        offset, num_events, result_type,
        reply_handler=on_reply, error_handler=on_error)

    # Start a mainloop
    from gi.repository import GLib
    GLib.MainLoop().run()
The default boolean operator is AND. Thus the query foo bar will be interpreted as foo AND bar. To exclude a term from the result set, prepend it with a minus sign, e.g. foo -bar. Phrase queries can be done by double quoting the string: "foo is a bar". You can truncate terms by appending a *. There are a few keys you can prefix to a term or phrase to search within a specific set of metadata. They are used like key:value. The keys name and title search strictly within the text field of the event subjects. The key app searches within the application name or description that is found in the actor attribute of the events. Lastly, you can use the site key to search within the domain name of the subject URIs. You can also control the results with the boolean operators AND and OR, and you may use brackets, ( and ), to control the operator precedence.
Modifying the log
So far we've only queried Zeitgeist for information; let's get a bit more active. You can delete events from Zeitgeist with the following query:
    from zeitgeist.client import ZeitgeistClient

    zeitgeist = ZeitgeistClient()

    def on_deleted(timerange):
        print 'Deleted events going from %s to %s' % (timerange[0], timerange[1])

    event_ids = [50] # put the IDs of the events you want to delete here
    zeitgeist.delete_events(event_ids, on_deleted)

    # Start a mainloop
    from gi.repository import GLib
    GLib.MainLoop().run()
    import time
    from gi.repository import GLib
    from zeitgeist.client import ZeitgeistClient
    from zeitgeist.datamodel import *

    zeitgeist = ZeitgeistClient()

    def on_status_changed_callback(enabled):
        """
        This method will be called whenever someone enables or disables
        the data-source.
        """
        if enabled:
            print 'Data-source enabled and ready to send events!'
        else:
            print 'Data-source disabled; don\'t send event, they\'ll be ignored.'

    def register():
        # Always use the same unique_id. Name and description can change
        # freely.
        unique_id = 'com.example.your.data.source'
        name = 'user visible name (may be translated)'
        description = 'user visible description (may be translated)'

        # Describe what sort of events will be inserted (optional)
        subject_template = Subject()
        subject_template.interpretation = Interpretation.PLAIN_TEXT_DOCUMENT
        subject_template.manifestation = Manifestation.FILE_DATA_OBJECT
        templates = []
        for interp in (Interpretation.ACCESS_EVENT, Interpretation.LEAVE_EVENT):
            event_template = Event()
            event_template.interpretation = interp
            event_template.manifestation = Manifestation.USER_ACTIVITY
            event_template.append_subject(subject_template)
            templates.append(event_template)

        zeitgeist.register_data_source(unique_id, name, description, templates,
            on_status_changed_callback)
    def log(title, uri, opened):
        subject = Subject.new_for_values(
            uri=uri,
            interpretation=Interpretation.PLAIN_TEXT_DOCUMENT,
            manifestation=Manifestation.FILE_DATA_OBJECT,
            origin=GLib.path_get_dirname(uri),
            mimetype='text/plain',
            text=title)
        event = Event.new_for_values(
            timestamp=time.time()*1000,
            manifestation=Manifestation.USER_ACTIVITY,
            actor='application://your_application_name.desktop',
            subjects=[subject])
        if opened:
            event.interpretation = Interpretation.ACCESS_EVENT
        else:
            event.interpretation = Interpretation.LEAVE_EVENT

        def on_id_received(event_ids):
            print 'Logged %s (%d) with event id %d.' % (title, opened, event_ids[0])

        zeitgeist.insert_events([event], on_id_received)

    if __name__ == '__main__':
        register()
        log('test.txt', 'file:///tmp/test.txt', opened=True)
        log('another_file.txt', 'file:///tmp/another_file.txt', opened=True)
        log('another_file.txt', 'file:///tmp/another_file.txt', opened=False)
        log('test.txt', 'file:///tmp/test.txt', opened=False)

        # Start a mainloop
        GLib.MainLoop().run()
    from zeitgeist.mimetypes import *

    print get_interpretation_for_mimetype('text/plain')
    print get_manifestation_for_uri('file:///tmp/test.txt')
- Python API documentation
- C API documentation
- Zeitgeist project website
- Meta-project on Launchpad
- Bug tracker on FreeDesktop.org