Search Results: "Julien Danjou"

16 June 2021

Julien Danjou: Python Tools to Try in 2021

The Python programming language is one of the most popular and in-demand languages around. It is free, has a large community, suits projects of varying complexity, is easy to learn, and opens up great opportunities for programmers. To work comfortably with it, you need dedicated Python tools that simplify your work. We have selected the best Python tools that will be relevant in 2021.

Mailtrap
As you probably know, sending an email requires SMTP (Simple Mail Transfer Protocol). You can't deliver a message straight to the recipient: it has to go to a server, from which the recipient downloads it using IMAP or POP3. Mailtrap lets you send emails from Python and also provides a REST API to access current emails. It can be used to automate email testing, which will improve your email marketing campaigns. For example, you can exercise the password recovery form in a Selenium test and immediately see whether an email was sent to the correct address, then take the new password from the email and try to log in to the site with it. Cool, isn't it?
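For illustration, here is a minimal sketch of sending a message through an SMTP inbox such as the one Mailtrap gives you; the host, port, and credentials below are placeholders to replace with your own:

import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Password recovery"
msg["From"] = "no-reply@example.com"
msg["To"] = "user@example.com"
msg.set_content("Here is your new password.")

# Placeholder host and credentials; copy the real ones from your inbox settings.
with smtplib.SMTP("smtp.mailtrap.io", 2525) as server:
    server.login("your_username", "your_password")
    server.send_message(msg)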

Pros
  • All emails are in one place.
  • Mailtrap provides multiple inboxes.
  • Shared access is present.
  • It is easy to set up.
  • RESTful API.

Cons
No visible disadvantages were found.

Django
Django is a free and open-source full-stack framework, and one of the most important and popular among Python developers. It helps you move from a prototype to a ready-made working solution in a short time, since its main task is to automate processes and speed up work through associations and libraries. It's a great choice for a product launch. You can use Django if at least a few of the following points interest you:
  • There is a need to develop the server-side of the API.
  • You need to develop a web application.
  • Many changes are made in the course of work, so you have to constantly deploy the application and make edits.
  • There are many complex tasks that are difficult to solve on your own, and you will need the help of the community.
  • ORM support is needed to avoid accessing the database directly.
  • There is a need to integrate new technologies such as machine learning.
Django is a great Python web framework that does its job. It is one of the most popular for good reason and is actively used by millions of developers.
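To illustrate the ORM point above, here is a minimal sketch; it assumes a configured Django project, and the Article model is made up for the example:

from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=200)
    published = models.DateTimeField(auto_now_add=True)

# Query the database through the ORM instead of writing SQL by hand:
latest = Article.objects.filter(title__icontains="python").order_by("-published")[:5]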

Pros
Django has quite a few advantages. It contains a large number of ready-made solutions, which greatly simplifies development: the admin panel, database migrations, various forms, and user authentication tools are extremely helpful. The structure is very clear and simple. A large community helps to solve almost any problem. Thanks to the ORM, there is a high level of security, and working with databases is comfortable.

Cons
Despite its powerful capabilities, Django has drawbacks. It is very massive and monolithic, and therefore develops slowly: despite its many generic modules, the evolution of Django itself is sluggish.

CherryPy
CherryPy is a micro-framework. It is designed to solve specific problems and can run your program on any operating system. CherryPy is used in the following cases:
  • To create an application with small code size.
  • There is a need to manage several servers at the same time.
  • You need to monitor the performance of applications.
CherryPy belongs to the class of Python frameworks designed for specific tasks. It's clear and user-friendly, and it can even run on Android.
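To give a feeling for how little code a CherryPy application needs, here is the classic hello-world sketch:

import cherrypy

class HelloWorld:
    @cherrypy.expose
    def index(self):
        return "Hello world!"

if __name__ == "__main__":
    # Starts the built-in HTTP server on http://127.0.0.1:8080/
    cherrypy.quickstart(HelloWorld())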

Pros
CherryPy has a friendly and understandable development environment. It is a functional and complete framework that can be used to build good applications. The source code is open, so the platform is completely free for developers, and the community, although not too large, is very responsive and always helps to solve problems.

Cons
There are not many cons to this Python tool. It is not meant for complex tasks and functions; it is intended for specific solutions, for example, the development of certain plugins or modules.

Pyramid
The Pyramid tool is designed for programming complex objects and solving multifunctional problems. It is used by professional programmers, traditionally for identification and routing, is aimed at a wide audience, and is capable of developing API prototypes. It is used in the following cases:
  • You need problem indicator tools to make timely adjustments and edits.
  • You use several programming languages at once.
  • You work with reporting, financial calculations, and forecasting.
  • You need to quickly create a simple application.
At the same time, the Pyramid web framework allows you to create complex applications with great functionality, such as translation software.
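Pyramid's minimalism is easy to see in code. A self-contained hello-world sketch:

from wsgiref.simple_server import make_server
from pyramid.config import Configurator
from pyramid.response import Response

def hello(request):
    return Response("Hello from Pyramid!")

if __name__ == "__main__":
    config = Configurator()
    config.add_route("hello", "/")
    config.add_view(hello, route_name="hello")
    # Serve the WSGI app on http://0.0.0.0:6543/
    make_server("0.0.0.0", 6543, config.make_wsgi_app()).serve_forever()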

Pros
Pyramid does an excellent job of developing basic applications quickly. It is quite flexible and easy to learn. In fact, the key to the success of this framework is that it is completely based on fundamental principles, using simple and basic programming techniques. It is minimalistic but at the same time offers users a lot of freedom of action. It can handle both small applications and powerful multifunctional programs.

Cons
It is difficult to deviate from the basic principles: the framework makes many decisions for you. Simple programs are very easy to implement, but to do something complex and large-scale, you have to immerse yourself completely in the environment and obey it.

Grok
Grok is a Python tool that works with templates. Its main task is to eliminate repetition in code: if an element is repeated, the template created earlier is simply applied, which greatly simplifies and speeds up the work. Grok suits developers in the following cases:
  • A programmer has little experience and is not yet ready to develop their own modules.
  • There is a need to quickly develop a simple application.
  • The functionality of the application is simple, straightforward, and the interface does not play a key role.

Pros
The Grok framework is a descendant of the earlier Zope3. It has a simplified structure, easy installation of modules, more capabilities, and better flexibility. It is designed for developing small applications: it is not intended for complex work, but its functionality lets you implement a project quickly.

Cons
The Grok community is not very large, as this Python tool has not gained widespread popularity; nevertheless, its fans use it for comfortable development. Complex tasks cannot be implemented with it, since its possibilities are quite limited. Within its niche, though, Grok is one of the better Python web frameworks: it is understandable and has enough features for comfortable development.

Web2Py
Web2Py is a Python tool with its own IDE, which includes a code editor, a debugger, and deployment tooling. It works without any configuration or installation, provides a high level of data security, and is suitable for work on various platforms. Web2Py is great in the following cases:
  • When there is a need to develop something on different operating systems.
  • If there is no way to install and configure the framework.
  • When a high level of data security is required, for example, when developing financial applications or sales performance management tools.
  • You need to track bugs carefully during development rather than in the testing phase.

Pros
Web2Py can work with different protocols, has a built-in error tracker, and offers backward compatibility, so you can keep working on top of previous versions of the framework. This makes code maintenance much easier and cheaper. It's free, open-source, and very flexible.
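The zero-configuration point extends to the database layer: Web2Py's database abstraction layer, also published separately as the pydal package, works with a single connection string. A small sketch:

from pydal import DAL, Field

# SQLite needs no server or setup; the database file is created on the fly.
db = DAL("sqlite://storage.db")
db.define_table("person", Field("name"))

db.person.insert(name="Ada")
db.commit()
print(db(db.person.name == "Ada").count())  # -> 1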

Cons
Among the many Python tools, few require a recent version of the language; Web2Py is one of them and won't run on older Python versions, so you need to keep an eye on updates. Overall, Web2Py does an excellent job at its tasks. It is quite simple and accessible to everyone.

BlueBream
BlueBream was formerly known as Zope3. It copes well with tasks of medium and high complexity and is suitable for serious projects.

Pros
The BlueBream build system is quite powerful and suitable for complex tasks. You can create functional applications with it, and the principle of component reuse keeps the code simpler while increasing development speed. The software can be scaled, and a transactional object database provides an easy way to store your data, so queries are processed quickly and database management is simple.
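To make the "transactional object database" concrete, here is a minimal standalone sketch using ZODB, the database the Zope family of frameworks builds on:

import transaction
from ZODB import DB
from ZODB.FileStorage import FileStorage

db = DB(FileStorage("data.fs"))
conn = db.open()
root = conn.root()

# Plain Python objects are persisted transactionally, with no SQL involved.
root["visits"] = root.get("visits", 0) + 1
transaction.commit()
db.close()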

Cons
This is not a very flexible framework, so it is better to know clearly in advance what is required of it. In addition, it does not withstand heavy loads well: with 1000 simultaneous users, it can crash and produce errors. It should therefore be used to solve narrow problems. Python frameworks are often designed for specific tasks, and BlueBream is one of them, suited to applications where database management plays the key role.

Conclusion
Python tools come in different forms and have vastly different capabilities. There are quite a few of them, but in 2021 these will be the most popular and in demand. Experienced programmers always choose several development tools for comfortable work.

11 February 2021

Julien Danjou: Debugging C code on macOS

I started to write C 25 years ago now, with many different tools over the years. Like many open source developers, I spent most of my life working with the GNU tools out there. As I've been using an Apple computer over the last years, I had to adapt to this environment and learn the tricks of the trade. Here are some of my notes, so a search engine can index them and I'll be able to find them later.

Debugger: lldb
I was used to gdb for most of my years doing C. I never managed to install gdb correctly on macOS, as it needs certificates, authorization, you name it, to work properly. macOS provides a native debugger named lldb, which really looks like gdb to me: it runs in a terminal with a prompt. I had to learn the few commands I mostly use, which are:
  • lldb -- myprogram -options to load the program with its options
  • r to run the program
  • bt or bt N to get a backtrace of the latest N frames
  • f N to select frame N
  • p V to print some variable value or memory address
Those commands cover 99% of my use cases with a debugger when writing C, so once I lost my old gdb habits, I was good to go.

Debugging Memory Overflows

On GNU/Linux
One of my favorite tools when writing C has always been Electric Fence (and DUMA more recently). It's a library that overrides the standard memory manipulation functions (e.g., malloc) and makes the program crash instantly when an out-of-bounds access occurs, rather than corrupting the heap. Heap corruption issues are hard to debug without such tools, as they can happen at any time and stay unnoticed for a while, crashing your program in a totally different location later. There's no need to compile your program with those libraries: by using the dynamic loader, you can preload them and override the standard C library functions.
LD_PRELOAD=/usr/lib/libefence.so.0.0 my-program
Run my-program with Electric Fence loaded on GNU/Linux
My gdb configuration has been sprinkled with my friends efence and duma, and I can activate them easily from gdb with this configuration in ~/.gdbinit:
define efence
        set environment EF_PROTECT_BELOW 0
        set environment EF_ALLOW_MALLOC_0 1
        set environment LD_PRELOAD /usr/lib/libefence.so.0.0
        echo Enabled Electric Fence\n
end
document efence
Enable memory allocation debugging through Electric Fence (efence(3)).
        See also nofence and underfence.
end
define underfence
        set environment EF_PROTECT_BELOW 1
        set environment EF_ALLOW_MALLOC_0 1
        set environment LD_PRELOAD /usr/lib/libefence.so.0.0
        echo Enabled Electric Fence for underflow detection\n
end
document underfence
Enable memory allocation debugging for underflows through Electric Fence
(efence(3)).
        See also nofence and efence.
end
define nofence
        unset environment LD_PRELOAD
        echo Disabled Electric Fence\n
end
document nofence
Disable memory allocation debugging through Electric Fence (efence(3)).
end
define duma
        set environment DUMA_PROTECT_BELOW 0
        set environment DUMA_ALLOW_MALLOC_0 1
        set environment LD_PRELOAD /usr/lib/libduma.so
        echo Enabled DUMA\n
end
document duma
Enable memory allocation debugging through DUMA (duma(3)).
        See also noduma and underduma.
end

On macOS
I've been looking for equivalent features on macOS, and after many hours of research, I found out that this feature ships natively with libgmalloc. It works in the same way, and its features are documented by Apple. My ~/.lldbinit file now contains the following:
command alias gm _regexp-env DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib
This command alias allows enabling gmalloc by just typing gm at the lldb prompt and then running the program again to see if it crashes with gmalloc enabled.

Debugging CPython
It's not a mystery that I spend a lot of time writing Python code; that's the main reason I've been doing C lately. When playing with CPython, it can be useful to, e.g., dump the content of PyObject structs on the heap or get the Python backtrace. I've been using cpython-lldb for this with great success. It adds a few bells and whistles when debugging CPython or extensions inside lldb. For example, the py-bt alias is handy to get the Python traceback of your calls rather than a bunch of cryptic C frames. Now, you should be ready to debug your nasty issues and memory problems on macOS efficiently!

22 December 2020

Julien Danjou: I am a Software Engineer and I am in Charge

Fifteen years have passed since I started my career in IT, which is quite some time. I've been playing with computers for 25 years now, which makes me quite knowledgeable about the field, for sure. However, while I was fully prepared to deal with computers, I was not prepared to do so with humans. The whole career management thing was unknown to me. I had no useful skills to navigate within the enterprise organization. I had to learn the ropes the hard way, failing along the way. It hurt. Almost ten years ago, I had the chance to meet a new colleague, Alexis Monville. Alexis was a team facilitator, and I started to work with him on many non-technical levels. He taught me a lot about agility and team organization. Working on this set of new skills changed how I envisioned my work and how I fit into the company.
Alexis Monville
I worked on those aspects of my job because I decided to be in charge of my career rather than letting things stay boring. That was one of the best decisions I ever made. Growing the social aspect of my profession allowed me to develop and find inspiring jobs and missions. Getting to that point takes a lot of time and effort, and it's pretty hard to do it alone. My friend Alexis wrote an excellent book titled I am a Software Engineer and I am in Charge. I'm proud to have been the first reviewer of the book before it was released a few weeks ago. Many developers out there are stuck in a place where they are not excited by their colleagues' work and where managers do not appropriately recognize their achievements. It would be best for them if they did something about that.
The book!
This book is an excellent piece for engineers who want to break the cycle of frustration. It covers many situations I encountered across my professional life over the last years, giving good insights into how to solve them. To paraphrase Alexis, the answers to your career management problems are not on StackOverflow: they're not technical issues. However, you can still solve them with the right tools. That's where I am a Software Engineer and I am in Charge shines. It gives you leads, solutions, and exercises to get out of this kind of situation. It helps increase your impact and satisfaction at work. I love this book, and I wish I had had access to it years ago. Developing technical leadership is not easy and requires a mindset shift. Having a way to bootstrap yourself with this is a luxury. If you're a software engineer at the beginning of your career or struggling with your current professional situation, I profoundly recommend reading this book! You'll get a fast track on your career, for sure.

11 May 2020

Julien Danjou: Interview: The Performance of Python

Earlier this year, I was supposed to participate in dotPy, a one-day Python conference happening in Paris. This event has unfortunately been cancelled due to the COVID-19 pandemic. Both Victor Stinner and I were supposed to attend. Victor had prepared a presentation about Python performance, while I was planning on talking about profiling. Rather than being completely discouraged, Victor and I sat down (remotely) with Anne Laure from Behind the Code (a blog run by Welcome to the Jungle, the organizers of the dotPy conference). We discussed Python performance, profiling, speed, projects, problems, analysis, optimization and the GIL. You can read the interview here.

17 April 2020

Julien Danjou: Being in Charge

If you've never heard of the 10x engineer myth, it's a pretty great concept: it boils down to the idea that an engineer could be ten times more efficient than the average engineer. I find this fantastically twisted. Last week, I sat and chatted with Alexis Monville in Le Podcast, a podcast that equips you to make positive change in your organization. We talked about that 10x engineer myth, and from there we digressed into how to grow your career and handle the different aspects of it. This was a very interesting exchange. Alexis is actually going to publish a new book next month (May 2020) entitled I am a Software Engineer and I am in Charge.
Lucky me, this week I had the chance to read the book before everybody else, which means I actually read it after our recording. I understood why Alexis said that a lot of what I was talking about during our podcast resonated with him. I sent a detailed review of the book to Alexis and Michael, if you're curious. I'm definitely recommending this book if you want to stop complaining about your job and start understanding how to pull the strings. I wish I had had this book available 10 years ago!

12 March 2020

Julien Danjou: One year of Mergify

It has been close to a year now since I incorporated my new company, Mergify. I've been busy and have barely written anything about it so far. Now is an excellent time to take a break and reflect a bit on what happened during those last 12 months.

What problem does Mergify solve?
Mergify is a powerful automation engine for GitHub pull requests. It allows you to automate everything, especially merging. You write rules, and it handles the rest.
Example of rule matching returned in GitHub checks
For example, let's say you want your pull request to be merged once your CI passes and the pull request has been approved. You just write such a rule, and our engine merges the pull request as soon as it's ready. We also deal with more advanced use cases. For instance, we provide a merge queue so your pull requests are merged serially and tested by your CI one after another, avoiding any regression in your code. Our goal is to make pull request management and automation easy. You can use your bot to trigger a rebase of your pull requests, or a backport to a different branch, with a single comment.
Some people like to make bots talk to each other.

A New Adventure
Mergify is the first company that I ever started. I had run some personal businesses before, created non-profit organizations, and built FOSS projects, but I had never created a company from scratch, let alone with a co-founder. Indeed, I chose to build the company with my old friend Mehdi. We've known each other for 7 years now and have worked together all that time on different open-source projects. Having worked with each other for so long has probably been a critical factor in the success of our venture so far. I had little experience sharing the founding seats with someone, and tons of reading seemed to indicate that it would be a tough ride. Picking the right business partner(s) can be a hard task. Luckily, after working so much time together, Mehdi and I both know our strengths and weaknesses well enough to be able to circumvent them. On the other hand, we both have similar backgrounds as software engineers, which does not help to cover all the hats you need to wear when building a company. Over time, we found arrangements to cover most of those equally between us. We don't have any magical advice to give on this. As in every relationship, communication is the key, and the #1 factor of success.

Getting Users
I don't know if we got lucky, but we got users and customers pretty early. We used a few cooperative projects as guinea pigs first, and they were brave enough to try our service and give us feedback. No repository has been harmed during this first phase! Then, as soon as we managed to get our application on the GitHub Marketplace, we saw a steady number of users coming to us. This has been fantastic, as it allowed us to get feedback rapidly. We set up a form asking users for feedback after they used Mergify for a couple of weeks. What we heard is that users were happy, that the documentation was confusing, and that some features were buggy or missing. We planned all of those ideas as future work in our roadmap, using the principles we described a few months ago.
If you're curious about how we plan that work, you can read our earlier article, "How we handle our roadmap for Mergify".
We tried various strategies to get new users, but so far, organic growth has been our #1 way of onboarding them. Like many small startups out there, we're not that good at marketing and executing strategies. We provide our service for free to open-source projects, and we are now powering many organizations, such as Mozilla, Amazon Web Services, Ceph and Fedora.

Working with GitHub
Working with GitHub has been complicated. When you build an application for a marketplace, your business is entirely dependent on the platform you develop for, both in terms of features and quality of service. In our case, we hit quite a few bugs in GitHub. Their support has mostly been fast to answer, but some significant issues are still open months later. The truth is that the GitHub API deserves more love and care from GitHub. For example, their GraphQL API has been a work in progress for years and misses many essential features.
GitHub service status is not always green.
We dealt, and still deal, with all those issues. It obviously impacts our operations and decreases our overall velocity, and we regularly have to find new ways to sidestep GitHub's limitations. You have no idea how much we wished for GitHub to be open-source. The idea of not having access to their code to understand how it works is so frustrating that we publish our own engine as an open-source project. That allows all of our users to see how it works and even propose enhancements.

Automate all the way
We're a tiny startup, and we decided to bootstrap our company: we never took any funding. From the beginning, it has been clear to us that we had to think and act like we had no resources. We're built around a scarcity mindset. Every decision we make is based on the assumption that we are very limited in terms of money and time, and we act like any wrong choice could (virtually) kill the company. We only do what is essential, we ship fast, and we automate everything. For example, we have built our whole operation around CI/CD systems, and pushing any new fix or feature to production is done in a matter of minutes. It's not uncommon for us to push a fix from our phone, just by reviewing some code or editing a file.

Growth
We're extremely happy with our steady growth and ever more users adopting our service. We now manage close to 30k repositories and merge 15k pull requests per month for our users. That's a lot of mouse clicks saved! If you want to try Mergify yourself, it's a single-click log-in using your GitHub account. Check it out!

31 July 2017

Chris Lamb: Free software activities in July 2017

Here is my monthly update covering what I have been doing in the free software world during July 2017 (previous month): I also blogged about my recent lintian hacking and installation-birthday package.
Reproducible builds

Whilst anyone can inspect the source code of free software for malicious flaws, most software is distributed pre-compiled to end users. The motivation behind the Reproducible Builds effort is to permit verification that no flaws have been introduced either maliciously or accidentally during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. (I have generously been awarded a grant from the Core Infrastructure Initiative to fund my work in this area.) This month I:
  • Assisted Mattia with a draft of an extensive status update to the debian-devel-announce mailing list. There were interesting follow-up discussions on Hacker News and Reddit.
  • Submitted the following patches to fix reproducibility-related toolchain issues within Debian:
  • I also submitted 5 patches to fix specific reproducibility issues in autopep8, castle-game-engine, grep, libcdio & tinymux.
  • Categorised a large number of packages and issues in the Reproducible Builds "notes" repository.
  • Worked on publishing our weekly reports. (#114, #115, #116 & #117)

I also made the following changes to our tooling:
diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues.

  • comparators.xml:
    • Fix EPUB "missing file" tests; they ship a META-INF/container.xml file. [ ]
    • Misc style fixups. [ ]
  • APK files can also be identified as "DOS/MBR boot sector". (#868486)
  • comparators.sqlite: Simplify file detection by rewriting manual recognizes call with a Sqlite3Database.RE_FILE_TYPE definition. [ ]
  • comparators.directory:
    • Revert the removal of a try-except. (#868534)
    • Tidy module. [ ]

strip-nondeterminism

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build.

  • Add missing File::Temp imports in the JAR and PNG handlers. This appears to have been exposed by lazily-loading handlers in #867982. (#868077)

buildinfo.debian.net

buildinfo.debian.net is my experiment into how to process, store and distribute .buildinfo files after the Debian archive software has processed them.

  • Avoid a race condition between check-and-creation of Buildinfo instances. [ ]


Debian
My activities as the current Debian Project Leader are covered in my "Bits from the DPL" emails to the debian-devel-announce mailing list.
Patches contributed
  • obs-studio: Remove annoying "click wrapper" on first startup. (#867756)
  • vim: Syntax highlighting for debian/copyright files. (#869965)
  • moin: Incorrect timezone offset applied due to "84600" typo. (#868463)
  • ssss: Add a simple autopkgtest. (#869645)
  • dch: Please bump $latest_bpo_dist to current stable release. (#867662)
  • python-kaitaistruct: Remove Markdown and homepage references from package long descriptions. (#869265)
  • album-data: Correct invalid Vcs-Git URI. (#869822)
  • pytest-sourceorder: Update Homepage field. (#869125)
I also made a very large number of contributions to the Lintian static analysis tool. To avoid duplication here, I have outlined them in a separate post.

Debian LTS

This month I have been paid to work 18 hours on Debian Long Term Support (LTS). In that time I did the following:
  • "Frontdesk" duties, triaging CVEs, etc.
  • Issued DLA 1014-1 for libclamunrar, a library to add unrar support to the Clam anti-virus software to fix an arbitrary code execution vulnerability.
  • Issued DLA 1015-1 for the libgcrypt11 crypto library to fix a "sliding windows" information leak.
  • Issued DLA 1016-1 for radare2 (a reverse-engineering framework) to prevent a remote denial-of-service attack.
  • Issued DLA 1017-1 to fix a heap-based buffer over-read in the mpg123 audio library.
  • Issued DLA 1018-1 for the sqlite3 database engine to prevent a vulnerability that could be exploited via a specially-crafted database file.
  • Issued DLA 1019-1 to patch a cross-site scripting (XSS) exploit in phpldapadmin, a web-based interface for administering LDAP servers.
  • Issued DLA 1024-1 to prevent an information leak in nginx via a specially-crafted HTTP range.
  • Issued DLA 1028-1 for apache2 to prevent the leakage of potentially confidential information via providing Authorization Digest headers.
  • Issued DLA 1033-1 for the memcached in-memory object caching server to prevent a remote denial-of-service attack.

Uploads
  • redis:
    • 4:4.0.0-1 Upload new major upstream release to unstable.
    • 4:4.0.0-2 Make /usr/bin/redis-server in the primary package a symlink to /usr/bin/redis-check-rdb in the redis-tools package to prevent duplicate debug symbols that result in a package file collision. (#868551)
    • 4:4.0.0-3 Add -latomic to LDFLAGS to avoid a FTBFS on the mips & mipsel architectures.
    • 4:4.0.1-1 New upstream version. Install 00-RELEASENOTES as the upstream changelog.
    • 4:4.0.1-2 Skip non-deterministic tests that rely on timing. (#857855)
  • python-django:
    • 1:1.11.3-1 New upstream bugfix release. Check DEB_BUILD_PROFILES consistently, not DEB_BUILD_OPTIONS.
  • bfs:
    • 1.0.2-2 & 1.0.2-3 Use help2man to generate a manpage.
    • 1.0.2-4 Set hardening=+all for bindnow, etc.
    • 1.0.2-5 & 1.0.2-6 Don't use upstream's release target as it overrides our CFLAGS & install RELEASES.md as the upstream changelog.
    • 1.1-1 New upstream release.
  • libfiu:
    • 0.95-4 Apply patch from Steve Langasek to fix autopkgtests. (#869709)
  • python-daiquiri:
    • 1.0.1-1 Initial upload. (ITP)
    • 1.1.0-1 New upstream release.
    • 1.1.0-2 Tidy package long description.
    • 1.2.1-1 New upstream release.

I also reviewed and sponsored the uploads of gtts-token 1.1.1-1 and nlopt 2.4.2+dfsg-3.

Debian bugs filed
  • ITP: python-daiquiri Python library to easily set up basic logging functionality. (#867322)
  • twittering-mode: Correct incorrect time formatting due to "84600" typo. (#868479)

28 January 2017

Bits from Debian: Debian at FOSDEM 2017

On February 4th and 5th, Debian will be attending FOSDEM 2017 in Brussels, Belgium; a yearly gratis event (no registration needed) run by volunteers from the Open Source and Free Software community. It's free, and it's big: more than 600 speakers, over 600 events, in 29 rooms. This year more than 45 current or past Debian contributors will speak at FOSDEM: Alexandre Viau, Bradley M. Kuhn, Daniel Pocock, Guus Sliepen, Johan Van de Wauw, John Sullivan, Josh Triplett, Julien Danjou, Keith Packard, Martin Pitt, Peter Van Eynde, Richard Hartmann, Sebastian Dröge, Stefano Zacchiroli and Wouter Verhelst, among others. Similar to previous years, the event will be hosted at Université libre de Bruxelles. Debian contributors and enthusiasts will be taking shifts at the Debian stand with gadgets, T-Shirts and swag. You can find us at stand number 4 in building K, level 1; CoreOS Linux and PostgreSQL will be our neighbours. See https://wiki.debian.org/DebianEvents/be/2017/FOSDEM for more details. We are looking forward to meeting you all!

30 March 2016

Julien Danjou: The OpenStack Schizophrenia

When I started contributing to OpenStack, almost five years ago, it was a small ecosystem. There was no foundation, just a handful of projects, and you could understand the code base in a few days. Fast forward to 2016, and it is a totally different beast. The project grew to no less than 54 teams, each team providing one or more deliverables. For example, the Nova and Swift teams each produce one service and its client, whereas the Telemetry team produces 3 services and 3 different clients. In 5 years, OpenStack went from a few IaaS projects to 54 different teams tackling different areas related to cloud computing. Once upon a time, OpenStack was all about starting some virtual machines on a network, backed by images and volumes. Nowadays, it's also about orchestrating your network deployment over containers, while managing your application life-cycle using a database service, everything being metered and billed for. This exponential growth has been made possible by the decision of the OpenStack Technical Committee to open the gates with the project structure reform voted at the end of 2014. This amendment suppresses the old OpenStack model of "integrated projects" (i.e. Nova, Glance, Swift…). The big tent, as it's called, allowed OpenStack to land new projects every month, growing from the 20 project teams of December 2014 to the 54 we have today, multiplying the number of projects by 2.7 in a little more than a year. Amazing growth, right? And this was clearly a good change. I sat on the Technical Committee in 2013, when projects were trying to apply to be "integrated", after Ceilometer and Heat were. It was painful to see how the Technical Committee was trying to assess whether new projects should be brought in or not. But what I notice these days is how OpenStack is still stuck between its old and new models. On one side, it accepted a lot of new teams, but on the other side, many are considered second-class citizens. Efforts are made to continue to build an OpenStack project that does not exist anymore. For example, there is a team trying to define what's OpenStack core, named DefCore, that is looking to define which projects are, somehow, actually OpenStack. This leads to weird situations, such as non-DefCore projects seeing their documentation rejected from installation guides. Again, I reiterated my proposal to publish documentation as part of each project's code to solve that dishonest situation and put everything on a level playing field. Some cross-project specs are also pushed without the involvement of all OpenStack projects. For example, the deprecate-cli spec, which proposes to deprecate the command-line interface tools provided by each project, made a lot of sense in the old OpenStack, where the goal was to build a unified and ubiquitous cloud platform. But when you now have tens of projects with largely different scopes, this starts making less sense. Still, this spec was merged by the OpenStack Technical Committee this cycle. Keystone is the first project to proudly force users to rely on openstack-client, removing its old keystone command line tool. I find it odd to push that spec when it's pretty clear that some projects (e.g. Swift, Gnocchi…) have no intention of going down that path. Unfortunately, most specs pushed by the Technical Committee are in the realm of wishful thinking. It somehow makes sense, since only a few of the members are actively contributing to OpenStack projects, and they can't by themselves implement all of that magically.
But OpenStack is no exception in the free software world and remains a do-ocracy. There is good cross-project content in OpenStack, such as the API working group. While the work done there should probably not be OpenStack-specific, there's a lot that teams have learned by building various HTTP REST APIs with different frameworks. Compiling this knowledge and offering it as guidance to the various teams is a great help. My fellow developer Chris Dent wrote a post about what he would do on the Technical Committee. In this article, he points to a lot of the shortcomings I described here, and his confusion about whether OpenStack is a product or a kit is quite understandable. Indeed, the message broadcast by OpenStack is still very confusing after the big tent openness. There's not enough user experience improvement being done. The OpenStack Technical Committee election opens in April 2016, and from what I've read so far, many candidates are proposing to clean up the big tent, kicking out projects that no longer match certain criteria. This is probably a good idea, as there are some inactive projects lying around. But I don't think that will be enough to solve the identity crisis that OpenStack is experiencing. So this is why, once again this cycle, I will throw my hat in the ring and submit my candidacy for the OpenStack Technical Committee.

19 February 2016

Julien Danjou: Gnocchi 2.0 release

A little more than 3 months after our latest minor release, here is the new major version of Gnocchi, stamped 2.0.0. It contains a lot of new and exciting features, and I'd like to talk about some of them to celebrate! You may notice that this release happens in the middle of the OpenStack release cycle. Indeed, Gnocchi does not follow that 6-month cycle, and we release whenever our code is ready. That forces us to have a more iterative approach, less disruptive for other projects, and allows us to achieve a higher velocity, applying the good old mantra "release early, release often".
Documentation
This version features a large documentation update. Gnocchi is still the only OpenStack server project that implements a "no doc, no merge" policy, meaning any code must come with the documentation addition or change included in the patch. The full documentation is included in the source code and available online at gnocchi.xyz.
Data split & compression
I've already covered this change extensively in my last blog post about timeseries compression. Long story short, Gnocchi now splits timeseries archives into small chunks that are compressed, increasing speed and decreasing data size.
Measures batching support
Gnocchi now supports batching, which allows submitting several measures for different metrics in a single request. This is especially useful when your application tends to cache metrics for a while and is able to send them in a batch. Usage is fully documented for the REST API.
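A sketch of what a batch submission looks like over HTTP; the metric IDs and endpoint below are placeholders, and a real deployment would also need authentication headers, so check the REST documentation for the exact details:

import requests

batch = {
    # Placeholder metric ID mapped to its new measures.
    "5e3fc1a2-77c7-4da8-9cf9-35b993a75a58": [
        {"timestamp": "2016-02-19T10:00:00", "value": 42.0},
        {"timestamp": "2016-02-19T10:05:00", "value": 43.5},
    ],
}

resp = requests.post(
    "http://localhost:8041/v1/batch/metrics/measures",
    json=batch,
)
resp.raise_for_status()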
Group by support in aggregation
One of the most requested features was the ability to do measure aggregation across resources, using a group-by query. This is now possible using the new groupby parameter to aggregation queries.
Ceph backend optimization
We improved the Ceph back-end a lot. Mehdi Abaakouk wrote a new Python binding for Ceph, called Cradox, that is going to replace the current Python rados module in subsequent Ceph releases. Gnocchi makes use of this new module to speed things up, making the Ceph-based driver really, really faster than before. We also implemented asynchronous data deletion, which improves performance a bit. The next step will be to run some new benchmarks like I did a few months ago and compare with the Gnocchi 1.3 series. Stay tuned!

15 February 2016

Julien Danjou: Timeseries storage and data compression

The first major version of Gnocchi, the scalable timeseries database I work on, was released a few months ago. In this first iteration, it took a rather naive approach to data storage. We had little idea about whether and how our distributed back-ends were going to be heavily used, so we stuck to the code of the first proof-of-concept written a couple of years ago. Recently we got more feedback from our users and ran a few benchmarks. That gave us enough material to start investigating a better storage strategy.
Data split
Up to Gnocchi 1.3, all data for a single metric is stored in a single gigantic file per aggregation method (min, max, average…). This means that the file can grow to several megabytes in size, which makes it slow to manipulate. For the next version of Gnocchi, our first task has been to rework that storage and split the data into smaller parts.
Gnocchi Carbonara archives split
The diagram above shows how data are organized inside Gnocchi. Until version 1.3, there would have been only one file for each aggregation method. In the upcoming 2.0 version, Gnocchi will split all these data into smaller parts, where each data split is stored in a file/object. This makes it possible to manipulate smaller pieces of data and to increase the parallelism of the CRUD operations on the back-end, leading to large speed improvements. In order to split timeseries into several chunks, Gnocchi defines a maximum number of N points to keep per chunk, to limit their maximum size. It then defines a hash function that produces a non-unique key for any timestamp, which makes it easy to find in which chunk any timestamp should be stored or retrieved.
Data compression
Up to Gnocchi 1.3, the data stored for each metric is simply serialized using msgpack, a fast and small serialization format. However, this format does not provide any compression. That means that storing data points needs 8 bytes for a timestamp (64-bit timestamp with nanosecond precision) and 8 bytes for a value (64-bit double-precision floating-point), plus some overhead (extra information and msgpack itself). After looking around for ways to compress all these measures, I stumbled upon a paper from some Facebook engineers about Gorilla, their in-memory timeseries database, entitled "Gorilla: A Fast, Scalable, In-Memory Time Series Database". For reference, part of this encoding is also used by InfluxDB in its new storage engine. The first technique I implemented is easy enough, and it's inspired by delta-of-delta encoding. Instead of storing each timestamp for each data point, and since all the data points are aggregated on a regular interval, we transpose points to be the time difference divided by the interval. For example, the series of timestamps timestamps = [41230, 41235, 41240, 41250, 41255] is encoded into timestamps = [41230, 1, 1, 2, 1], interval = 5. This allows regular compression algorithms to reduce the size of the integer list using run-length encoding. To actually compress the values, I tried two different algorithms: LZ4, and the XOR-based encoding described in the Gorilla paper. I then benchmarked these solutions:
Gnocchi Carbonara compression speed
The XOR algorithm implemented in Python is pretty slow compared to LZ4. Truth is that python-lz4 is fully implemented in C, which makes it fast. I profiled my XOR implementation in Python and discovered that one operation took 20% of the time: count_lead_and_trail_zeroes, which is in charge of counting the number of leading and trailing zeroes in a binary number.
Gnocchi Carbonara compression XOR profiling
I tried two Python implementations of the same algorithm (and submitted them to my friend and Python developer Victor Stinner, by the way). The first version, using string search with .index(), is 10× faster than the second one, which only does integer computation. Ah, Python… As Victor explained, each Python operation is slow and there are a lot of them in the second version, whereas .index() is implemented in C, is really well optimized, and only needs 2 Python operations. Finally, I ended up optimizing that code by leveraging cffi to call ffsll() and flsll() directly. That decreased the run-time of count_lead_and_trail_zeroes by 45%, increasing the speed of the entire XOR compression code by a small 7%. This is not enough to catch up with LZ4's speed. At this stage, the only way to achieve high speed would probably be to go with a full C implementation.
Gnocchi Carbonara compression size
Considering the compression ratio of the different algorithms, they are pretty much identical. The worst-case scenario (random values) for LZ4 compresses down to 9 bytes per data point, whereas XOR can go down to 7.38 bytes per data point. In general, XOR encoding beats LZ4 by 15%, except for cases where all values are 0 or 1. However, LZ4 is faster than XOR by a factor of 4–70 depending on the case. That means that we'll use LZ4 for data compression in Gnocchi 2.0. It's possible that we could achieve an equally fast compression/decompression implementation, but I don't think it's worth the effort right now: it'd represent a lot of code to write and to maintain.
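For the curious, here is a minimal sketch (not the actual Gnocchi code) of the string-search approach that proved fastest in pure Python, pushing the bit scanning into C via str.index() and str.rindex():

def count_lead_and_trail_zeroes(value, width=64):
    # Render the integer as a fixed-width binary string and locate the 1 bits.
    bits = format(value, "0%db" % width)
    try:
        leading = bits.index("1")
    except ValueError:
        return width, width  # value is zero: all bits are zeroes
    trailing = width - 1 - bits.rindex("1")
    return leading, trailing

print(count_lead_and_trail_zeroes(8))  # 8 is 0b1000: prints (60, 3)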

6 February 2016

Julien Danjou: FOSDEM 2016, recap

Last week-end, I was in Brussels, Belgium for FOSDEM, one of the greatest open source developer conferences. I was not sure I would go this year (I already skipped it in 2015), but it turned out I was asked to do a talk in the shared Lua & GNU Guile devroom. As a long-time Lua user and developer, and a follower of GNU Guile for several years, the organizer asked me to run a talk that would be a link between the two languages. I entitled my talk "How awesome ended up with Lua and not Guile" and gave it to a room full of interested users of the awesome window manager. We continued with a panel discussion entitled "The future of small languages: Experience of Lua and Guile", composed of Andy Wingo, Christopher Webber, Ludovic Courtès, Etiene Dalcol, Hisham Muhammad and myself. It was a pretty interesting discussion, where both communities shared their views on the state of their languages. It was a bit awkward to talk about Lua & Guile when most of my knowledge was years old, but it turns out many things didn't change. I hope I was able to provide interesting insight to both communities. Finally, it was a pretty interesting FOSDEM for me, and it had been a long time since I last gave a talk there, so I really enjoyed it. See you next year!

16 November 2015

Julien Danjou: Profiling Python using cProfile: a concrete case

Writing programs is fun, but making them fast can be a pain. Python programs are no exception to that, but the basic profiling toolchain is actually not that complicated to use. Here, I would like to show you how you can quickly profile and analyze your Python code to find what part of the code you should optimize.
What's profiling?
Profiling a Python program is doing a dynamic analysis that measures the execution time of the program and everything that composes it. That means measuring the time spent in each of its functions. This will give you data about where your program is spending time, and what area might be worth optimizing. It's a very interesting exercise. Many people focus on local optimizations, such as determining e.g. which of the Python functions range or xrange is going to be faster. It turns out that knowing which one is faster may never be an issue in your program, and that the time gained by one of the functions above might not be worth the time you spend researching that, or arguing about it with your colleague. Trying to blindly optimize a program without measuring where it is actually spending its time is a useless exercise. Following your guts alone is not always sufficient. There are many types of profiling, as there are many things you can measure. In this exercise, we'll focus on CPU utilization profiling, meaning the time spent by each function executing instructions. Obviously, we could do many more kinds of profiling and optimization, such as memory profiling, which would measure the memory used by each piece of code, something I talk about in The Hacker's Guide to Python.
cProfile
Since Python 2.5, Python provides a C module called cProfile which has a reasonable overhead and offers a good enough feature set. The basic usage goes down to:
>>> import cProfile
>>> cProfile.run('2 + 2')
2 function calls in 0.000 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 method 'disable' of '_lsprof.Profiler' objects

Though you can also run a script with it, which turns out to be handy:
$ python -m cProfile -s cumtime lwn2pocket.py
72270 function calls (70640 primitive calls) in 4.481 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.004 0.004 4.481 4.481 lwn2pocket.py:2(<module>)
1 0.001 0.001 4.296 4.296 lwn2pocket.py:51(main)
3 0.000 0.000 4.286 1.429 api.py:17(request)
3 0.000 0.000 4.268 1.423 sessions.py:386(request)
4/3 0.000 0.000 3.816 1.272 sessions.py:539(send)
4 0.000 0.000 2.965 0.741 adapters.py:323(send)
4 0.000 0.000 2.962 0.740 connectionpool.py:421(urlopen)
4 0.000 0.000 2.961 0.740 connectionpool.py:317(_make_request)
2 0.000 0.000 2.675 1.338 api.py:98(post)
30 0.000 0.000 1.621 0.054 ssl.py:727(recv)
30 0.000 0.000 1.621 0.054 ssl.py:610(read)
30 1.621 0.054 1.621 0.054 method 'read' of '_ssl._SSLSocket' objects
1 0.000 0.000 1.611 1.611 api.py:58(get)
4 0.000 0.000 1.572 0.393 httplib.py:1095(getresponse)
4 0.000 0.000 1.572 0.393 httplib.py:446(begin)
60 0.000 0.000 1.571 0.026 socket.py:410(readline)
4 0.000 0.000 1.571 0.393 httplib.py:407(_read_status)
1 0.000 0.000 1.462 1.462 pocket.py:44(wrapped)
1 0.000 0.000 1.462 1.462 pocket.py:152(make_request)
1 0.000 0.000 1.462 1.462 pocket.py:139(_make_request)
1 0.000 0.000 1.459 1.459 pocket.py:134(_post_request)
[ ]

This prints out all the functions called, with the time spent in each and the number of times they have been called.
Advanced visualization with KCacheGrind
While useful, the output format is very basic and does not make it easy to grab knowledge from complete programs. For more advanced visualization, I leverage KCacheGrind. If you did any C programming and profiling these last years, you may have used it, as it is primarily designed as a front-end for Valgrind-generated call-graphs. In order to use it, you need to generate a cProfile result file, then convert it to the KCacheGrind format. To do that, I use pyprof2calltree.
$ python -m cProfile -o myscript.cprof myscript.py
$ pyprof2calltree -k -i myscript.cprof

And the KCacheGrind window magically appears!
Concrete case: Carbonara optimization
I was curious about the performance of Carbonara, the small timeseries library I wrote for Gnocchi. I decided to do some basic profiling to see if there was any obvious optimization to do. In order to profile a program, you need to run it. But running the whole program in profiling mode can generate a lot of data that you don't care about, and adds noise to what you're trying to understand. Since Gnocchi has thousands of unit tests and a few for Carbonara itself, I decided to profile the code used by these unit tests, as it's a good reflection of the basic features of the library. Note that this is a good strategy for a curious and naive first-pass profiling. There's no way you can make sure that the hotspots you see in the unit tests are the actual hotspots you will encounter in production. Therefore, profiling in conditions and with a scenario that mimics what's seen in production is often a necessity if you need to push your program optimization further and want to achieve perceivable and valuable gains. I activated cProfile using the method described above, creating a cProfile.Profile object around my tests (I actually started to implement that in testtools). I then ran KCacheGrind as described above. Using KCacheGrind, I generated the following figures.
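In case you have never driven cProfile from code rather than from the command line, wrapping a chunk of code in a cProfile.Profile object looks roughly like this; run_the_tests() is a stand-in for whatever you want to measure:

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
run_the_tests()  # placeholder for the code under measurement
profiler.disable()

# Dump stats for pyprof2calltree/KCacheGrind, or print the top offenders:
profiler.dump_stats("tests.cprof")
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)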
The test I profiled here is called test_fetch and is pretty easy to understand: it puts data in a timeseries object and then fetches the aggregated result. The above list shows that 88% of the ticks are spent in set_values (44 ticks over 50). This function is used to insert values into the timeseries, not to fetch the values. That means that it's really slow to insert data, and pretty fast to actually retrieve it. Reading the rest of the list indicates that several functions share the rest of the ticks: update, _first_block_timestamp, _truncate, _resample, etc. Some of the functions in the list are not part of Carbonara, so there's no point in looking to optimize them. The only thing that can be optimized is, sometimes, the number of times they're called.
The call graph gives me a bit more insight about what's going on here. Using my knowledge about how Carbonara works, I don't think that the whole stack on the left for _first_block_timestamp makes much sense. This function is supposed to find the first timestamp for an aggregate, e.g. with a timestamp of 13:34:45 and a period of 5 minutes, the function should return 13:30:00. The way it works currently is by calling the resample function from Pandas on a timeseries with only one element, but that seems to be very slow. Indeed, currently this function represents 25% of the time spent by set_values (11 ticks out of 44). Fortunately, I recently added a small function called _round_timestamp that does exactly what _first_block_timestamp needs without calling any Pandas function, so no resample. So I ended up rewriting that function this way:
 def _first_block_timestamp(self):
-    ts = self.ts[-1:].resample(self.block_size)
-    return (ts.index[-1] - (self.block_size * self.back_window))
+    rounded = self._round_timestamp(self.ts.index[-1], self.block_size)
+    return rounded - (self.block_size * self.back_window)

And then I re-ran the exact same test to compare the output of cProfile.
The list of functions seems quite different this time. The proportion of time spent in set_values dropped from 88% to 71%.
The call stack for set_values shows that pretty well: we can't even see the _first_block_timestamp function, as it is so fast that it totally disappeared from the display. It's now being considered insignificant by the profiler. So we just sped up the whole insertion process of values into Carbonara by a nice 25% in a few minutes. Not that bad for a first naive pass, right?

4 November 2015

Julien Danjou: Gnocchi 1.3.0 release

Finally, Gnocchi 1.3.0 is out. This is our final release, more or less matching the OpenStack 6-month schedule, that concludes the Liberty development cycle.
This release was supposed to happen a few weeks earlier, but our integration tests got completely blocked for several days just the week before the OpenStack Mitaka summit.
New website
We built a new dedicated website for Gnocchi at gnocchi.xyz. We want to promote Gnocchi outside of the OpenStack bubble, as it is a useful timeseries database on its own that can work without the rest of the stack. We'll try to improve the documentation. If you're curious, feel free to check it out and report anything you miss!
The speed bump
Obviously, if it had been a bug in Gnocchi that we hit, it would have been quick to fix. However, we found a nasty bug in Swift caused by the evil monkey-patching of Eventlet (once again) blended with a mixed usage of native threads and Eventlet threads in Swift. Shake all of that, and you get yourself pretty nasty race conditions when using the Keystone middleware authentication. In the meantime, we disabled Swift multi-threading by using mod_wsgi instead of Eventlet in devstack.
New features
So what's new in this new shiny release? A few interesting things. And that's all we did in the last couple of months. We have a lot of things on the roadmap that are pretty exciting, and I'll surely talk about them in the next weeks.

2 November 2015

Julien Danjou: OpenStack Summit Mitaka from a Telemetry point of view

Last week I was in Tokyo, Japan for the OpenStack Summit, discussing the new Mitaka version that will be released in 6 months. I attended the summit mainly to discuss and follow up on new developments in Ceilometer, Gnocchi, Aodh and Oslo. It has been a pretty good week, and we were able to discuss and plan a few interesting things. Below is what I found remarkable during this summit concerning those projects.
Distributed lock manager
I did not attend this session, but I need to write something about it. See, when working in a distributed environment like OpenStack, it's almost obvious that sooner or later you end up needing a distributed lock mechanism. It started to be pretty obvious and a serious problem for us 2 years ago in Ceilometer. Back then, we proposed the service-sync blueprint and talked about it during the OpenStack Icehouse Design Summit in Hong Kong. The session at that time was a success, and in 20 minutes I convinced everyone it was the right thing to do. The night following the session, we picked a name, Tooz, for this new library. It was the first time I met Joshua Harlow, who has become one of the biggest Tooz contributors since then. For the following months, we tried to move the lines in OpenStack. It was very hard to convince people that it was the solution to their problem. Most of the time, they did not seem to grasp the entirety of what was at stake. This time, it seems that we managed to convince everyone that a DLM is indeed needed. Joshua wrote an extensive specification called Chronicle of a DLM, which ended up being discussed and somehow adopted during that session in Tokyo. So yes, Tooz will be the weapon of choice for OpenStack. It will avoid a hard requirement on any particular DLM solution. The best driver right now is the ZooKeeper one, but it'll still be possible for operators to use e.g. Redis. This is a great achievement for us, after spending years trying to fix features such as the Nova service group subsystem and seeing our proposals postponed forever. (If you want to know more, LWN.net has a great article about that session.)
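To give an idea of what this looks like in practice, here is a minimal Tooz locking sketch; it assumes a ZooKeeper instance on localhost, and do_something_exclusive() is a placeholder for your critical section:

from tooz import coordination

coordinator = coordination.get_coordinator(
    "zookeeper://localhost:2181", b"member-1")
coordinator.start()

lock = coordinator.get_lock(b"my-critical-section")
with lock:
    # Only one member across the whole cluster runs this at a time.
    do_something_exclusive()

coordinator.stop()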

Telemetry team name
With the new projects launched this last year, Aodh & Gnocchi, in parallel with the old Ceilometer, plus the change from programs to the Big Tent in OpenStack, the team is having an identity issue. Being referred to as the "Ceilometer team" is not really accurate, as some of us only work on Aodh or on Gnocchi. So after discussing it, I proposed to rename the team to Telemetry instead. We'll see how it goes.

Alarms
The first session was about alarms and the Aodh project. It turns out that the project is in pretty good shape, but it probably needs some more love, which I hope I'll be able to provide in the coming months. The need for a new aodhclient based on the technologies we recently used to build gnocchiclient has been reasserted, so we might end up working on that pretty soon. The Tempest support also needs some improvement, and we have a plan to enhance that.

Data visualisation
We got David Lyle in this session, the Project Technical Leader for Horizon. It was an interesting discussion. It used to be technically challenging to draw charts from the data Ceilometer collects, but it's now very easy with Gnocchi and its API. While the technical side is resolved, the more political and user-experience side, of what to draw and how, was discussed at length. We don't want to make people think that Ceilometer and Gnocchi are a full monitoring solution, so there are some precautions to take. Other than that, it would be pretty cool to have a view of the data in Horizon.

Rolling upgrade
It turns out that Ceilometer has an architecture that makes rolling upgrades easy. We just need to write proper documentation explaining how to do it and in which order the services should be upgraded.

Ceilometer splitting
The split of the alarm feature of Ceilometer into its own project, Aodh, in the last cycle was a great success for the whole team. We want to split out other pieces of Ceilometer, as they make sense on their own and are easier to manage that way. There are also some projects that want to use them without the whole stack, so it's a good idea to make that happen.

CloudKitty & Gnocchi
I attended the 2 sessions that were allocated to CloudKitty. It was pretty interesting, as they want to simplify their architecture and leverage what Gnocchi provides. I presented my view of the project architecture and how they could leverage more of Gnocchi to retrieve and store data. They want to go in that direction, though it's a large amount of work and refactoring on their side, so it'll take time. We also need to enhance the support for adding new resource types in Gnocchi, and that's something I hope I'll work on in the coming months.

Overall, this summit was pretty good and I got a tremendous amount of good feedback on Gnocchi. I again managed to gather enough ideas and tasks to tackle for the next 6 months. It will be interesting to see where the whole team goes from here. Stay tuned!

13 October 2015

Julien Danjou: Benchmarking Gnocchi for fun & profit

We got pretty good feedback on Gnocchi so far, even if we only had a little. Recently, in order to have a better feeling of where we were at, we wanted to know how fast (or slow) Gnocchi was. The early benchmarks that some of the Mirantis engineers ran last year showed pretty good signs. But a year later, it was time to get real numbers and have a good understanding of Gnocchi's capacity.

Benchmark tools
The first thing I realized when starting that process is that we were lacking tools to run benchmarks. Therefore I started to write some benchmark tools in python-gnocchiclient, which provides a command-line tool to query Gnocchi. I added a few basic commands to measure metric performance, such as:
$ gnocchi benchmark metric create -w 48 -n 10000 -a low
+----------------------+------------------+
| Field                | Value            |
+----------------------+------------------+
| client workers       | 48               |
| create executed      | 10000            |
| create failures      | 0                |
| create failures rate | 0.00 %           |
| create runtime       | 8.80 seconds     |
| create speed         | 1136.96 create/s |
| delete executed      | 10000            |
| delete failures      | 0                |
| delete failures rate | 0.00 %           |
| delete runtime       | 39.56 seconds    |
| delete speed         | 252.75 delete/s  |
+----------------------+------------------+

The command-line tool supports a --verbose switch for a detailed progress report on the benchmark progression. So far it supports metric operations only, but that's the most interesting part of Gnocchi.

Spinning up some hardware
I got a couple of bare-metal servers to test Gnocchi on. I dedicated the first one to Gnocchi, and used the second one as the benchmark client, plugged on the same network. Each server has 2 Intel Xeon E5-2609 v3 (12 cores in total) and 32 GB of RAM. That provides a lot of CPU to handle requests in parallel. I then performed a basic RHEL 7 installation and ran devstack to spin up an installation of Gnocchi based on the master branch, disabling all of the other OpenStack components. I then tweaked the Apache httpd configuration to use the worker MPM and increased the maximum number of clients that can send requests simultaneously. I configured Gnocchi to use the PostgreSQL indexer, as it's the recommended one, and the file storage driver, based on Carbonara (Gnocchi's own storage engine). That means files were stored locally rather than in Ceph or Swift. Using the file driver is less scalable (you have to run on only one node, or use a technology like NFS to share the files), but it was good enough for this benchmark, to get some numbers and profile the beast. The OpenStack Keystone authentication middleware was not enabled in this setup, as it would add some delay validating the authentication token.

Metric CRUD operations
Metric creation is pretty fast. I managed to reach 1500 metric/s created pretty easily. Deletion is now asynchronous, which makes it faster than in Gnocchi 1.2, but it's still slower than creation: 300 metric/s can be deleted. That does not sound like a huge issue, since metric deletion is barely used in production. Retrieving metric information is also pretty fast and goes up to 800 metric/s. It'd be easy to achieve much higher throughput for this one, as it'd be easy to cache, but we didn't feel the need to implement that so far. Another important thing is that all of these numbers are constant and barely depend on the number of metrics already managed by Gnocchi.
Operation     | Details                                  | Rate
Create metric | Created 100k metrics in 77 seconds       | 1300 metric/s
Delete metric | Deleted 100k metrics in 190 seconds      | 524 metric/s
Show metric   | Show a metric 100k times in 149 seconds  | 670 metric/s

Sending and getting measures
Pushing measures into metrics is one of the hottest topics. Starting with Gnocchi 1.1, pushed measures are processed asynchronously, which makes it much faster to push new measures. Getting new numbers on that feature was pretty interesting. The number of metrics per second you can push depends on the batch size, meaning the number of actual measurements you send per call. The naive approach is to push 1 measure per call, and in that case, Gnocchi is able to handle around 600 measures/s. With a batch containing 100 measures, the number of calls per second goes down to 450, but since you push 100 measures each time, that means 45k measures per second pushed into Gnocchi! I pushed the test further, inspired by the recent blog post from InfluxDB claiming to achieve 300k points per second with their new engine. I ran the same benchmark on the hardware I had, which is roughly two times smaller than what they used, and managed to push Gnocchi to a little more than 120k measures per second. If I had the same hardware they used, interpolating the results suggests almost 250k measures/s pushed. Obviously, you can't strictly compare Gnocchi and InfluxDB since they are not doing exactly the same thing, but it still looks way better than what I expected. Using smaller batch sizes of 1k or 2k improves the throughput further, to around 125k measures/s. A small client sketch of pushing and reading measures follows the table below.
Operation       | Details                                                    | Rate
Push metric 5k  | Push 5M measures with batch of 5k measures in 40 seconds   | 122k measures/s
Push metric 4k  | Push 5M measures with batch of 4k measures in 40 seconds   | 125k measures/s
Push metric 3k  | Push 5M measures with batch of 3k measures in 40 seconds   | 123k measures/s
Push metric 2k  | Push 5M measures with batch of 2k measures in 41 seconds   | 121k measures/s
Push metric 1k  | Push 5M measures with batch of 1k measures in 44 seconds   | 113k measures/s
Push metric 500 | Push 5M measures with batch of 500 measures in 51 seconds  | 98k measures/s
Push metric 100 | Push 5M measures with batch of 100 measures in 112 seconds | 45k measures/s
Push metric 10  | Push 5M measures with batch of 10 measures in 852 seconds  | 6k measures/s
Push metric 1   | Push 500k measures with batch of 1 measure in 800 seconds  | 624 measures/s
Get measures    | Get 43k measures of 1 metric                               | 260k measures/s
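
To make those numbers concrete, here is a rough sketch of pushing a batch of measures and reading them back through Gnocchi's REST API with the requests library; the endpoint URL and metric ID are placeholders and authentication is left out, so treat it as an illustration rather than a recipe:

import requests

GNOCCHI = "http://localhost:8041"  # placeholder endpoint
METRIC = "aaaaaaaa-bbbb-4ccc-8ddd-eeeeeeeeeeee"  # placeholder metric ID

# Push a batch of 60 measures in a single call; batching is what
# makes the throughput numbers above possible.
measures = [{"timestamp": "2015-10-13T12:00:%02d" % i, "value": float(i)}
            for i in range(60)]
r = requests.post("%s/v1/metric/%s/measures" % (GNOCCHI, METRIC),
                  json=measures)
r.raise_for_status()  # Gnocchi answers 202: processing is asynchronous

# Read aggregated measures back; each item is
# [timestamp, granularity, value].
r = requests.get("%s/v1/metric/%s/measures" % (GNOCCHI, METRIC),
                 params={"start": "2015-09-13", "stop": "2015-10-13"})
for timestamp, granularity, value in r.json():
    print(timestamp, granularity, value)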

What about getting measures? Well, it's actually pretty fast too. Retrieving a metric with 1 month of data at a 1-minute interval (that's 43k points) takes less than 2 seconds. Though it's actually slower than what I expected. The reason seems to be that the JSON is 2 MB big and encoding it takes a lot of time for Python. I'll investigate that. Another point I discovered is that by default Gnocchi returns all the datapoints for each granularity available for the asked period, which might double the size of the returned data for nothing if you don't need it. It'll be easy to add an option to the API to only retrieve what you need though! Once benchmarked, that meant I was able to retrieve 6 metrics per second, which translates to around 260k measures/s.

Metricd speed
New measures that are pushed into Gnocchi are processed asynchronously by the gnocchi-metricd daemon. When doing the benchmarks above, I ran into a very interesting issue: sending 10k measures on a metric would make gnocchi-metricd use up to 2 GB of RAM and 120% CPU for more than 10 minutes. After further investigation, I found that the naive approach we used to resample datapoints in Carbonara using Pandas was causing that. I reported a bug on Pandas and the upstream author was kind enough to provide a nice workaround, which I sent as a pull request to the Pandas documentation. I wrote a fix for Gnocchi based on that and started using it. Computing the standard aggregation method set (std, count, 95pct, min, max, sum, median, mean) for 10k batches of 1 measure (worst-case scenario) for one metric with 10k measures now takes only 20 seconds and uses 100 MB of RAM, 45 times faster. That means that in normal operation, where only a few new measures are processed, updating a metric only takes a few milliseconds. Awesome!

Comparison with Ceilometer
For comparison's sake, I quickly ran some read-operation benchmarks in Ceilometer. I fed it with one month of samples for 100 instances polled every minute. That represents roughly 4.3M samples injected, and that took a while, almost 1 hour, whereas it would have taken less than a minute in Gnocchi. Then I tried to retrieve some statistics in the same way that we provide them in Gnocchi, which means aggregating them over a period of 60 seconds over a month.
Operation           | Details                    | Rate
Read metric SQL     | Read measures for 1 metric | 2min 58s
Read metric MongoDB | Read measures for 1 metric | 28s
Read metric Gnocchi | Read measures for 1 metric | 2s
Obviously, Ceilometer is very slow. It has to look into 4M samples to compute and return the result, which takes a lot of time. Gnocchi, on the other hand, just has to fetch a file and pass it over. That also means that the more samples you have (so the longer you collect data and the more resources you have), the slower Ceilometer becomes. This is not a problem with Gnocchi, as I emphasized when I started designing it. Most Gnocchi operations are O(log R), where R is the number of metrics or resources, whereas most Ceilometer operations are O(log S), where S is the number of samples (measures). Since R is millions of times smaller than S, Gnocchi gets to be much faster. And what's even more interesting is that Gnocchi is entirely scalable horizontally: adding more Gnocchi servers (for the API and its background processing worker, metricd) will multiply Gnocchi's performance by the number of servers added.

Improvements
There are several things to improve in Gnocchi, such as splitting Carbonara archives to make them more efficient, especially for drivers such as Ceph and Swift. It's already on my plate, and I'm looking forward to working on that! And if you have any questions, feel free to shoot them in the comment section.

5 October 2015

Julien Danjou: Gnocchi talk at OpenStack Paris Meetup #16

Last week, I was invited to the OpenStack Paris meetup #16, whose subject was metrics in OpenStack. The last time I spoke at this meetup was back in 2012, during the OpenStack Paris meetup #2. A very long time ago!
I talked for half an hour about Gnocchi, the OpenStack project I've been running for 18 months now. I started by explaining the story behind the project and why we needed to build it. Ceilometer has an interesting history and had a curious roadmap these last years, and I summarized that briefly. Then I talked about how Gnocchi works and what it offers to users and operators. The slides were full of JSON, but I imagine it offered an interesting view of what the API looks like and how easy it is to operate. This also allowed me to emphasize how many use cases are actually covered and solved, contrary to what Ceilometer has done so far. The talk was well received and I got a few interesting questions at the end. The video of the talk (in French) and my slides are available on my talk page and below. I hope you'll enjoy it.

17 September 2015

Julien Danjou: My interview in le Journal du Hacker

A few days ago, the French equivalent of Hacker News, called "Le Journal du Hacker", interviewed me about my work on OpenStack, my job at Red Hat and my self-published book The Hacker's Guide to Python. I've spent some time translating it into English so you can read it if you don't understand French! I hope you'll enjoy it.
Hi Julien, and thanks for participating in this interview for the Journal du Hacker. For our readers who don't know you, can you introduce yourself briefly?
You're welcome! My name is Julien, I'm 31 years old, and I live in Paris. I have now been developing free software for around fifteen years. I have had the pleasure to work (among other things) on Debian, Emacs and awesome these last years, and more recently on OpenStack. For a few months now, I have been working at Red Hat as a Principal Software Engineer on OpenStack. I am in charge of doing upstream development for that cloud-computing platform, mainly around the Ceilometer, Aodh and Gnocchi projects.
Being a systems architect myself, I have been following your work on OpenStack for a while. It's uncommon to get the point of view of someone as involved as you are. Can you give us a summary of the state of the project, and then detail your activities in it?
The OpenStack project has grown and changed a lot since I started 4 years ago. It started with a few projects providing the basics, like Nova (compute), Swift (object storage), Cinder (volume), Keystone (identity) or even Neutron (network), which are the basis of a cloud-computing platform, and it is now composed of a lot more projects. For a while, the inclusion of projects was subject to a strict review by the technical committee. But for a few months now, the rules have been relaxed, and we are seeing a lot more projects connected to cloud computing joining us. As far as I'm concerned, I started the Ceilometer project in 2012 with a few other people; it is devoted to handling metrics of OpenStack platforms. Our goal is to be able to collect all the metrics and record them to analyze them later. We also have a module providing the ability to trigger actions on threshold crossing (alarms). The project grew in a monolithic way, and linearly in the number of contributors, during the first two years. I was the PTL (Project Technical Leader) for a year. This leadership position demands a lot of time for bureaucratic things and people management, so I decided to leave my spot in order to spend more time solving the technical challenges that Ceilometer offered. I started the Gnocchi project in 2014. The first stable version (1.0.0) was released a few months ago. It's a timeseries database offering a REST API and a strong ability to scale. It was a necessary development to solve the problems tied to the large number of metrics created by a cloud-computing platform, where tens of thousands of virtual machines have to be metered as often as possible. This project works standalone or with the rest of OpenStack. More recently, I started Aodh, the result of moving the code and features of Ceilometer related to threshold action triggering (alarming) out into their own project. That's the logical continuation of what we started with Gnocchi. It means Ceilometer is being split into independent modules that can work together, with or without OpenStack. It seems to me that the features provided by Ceilometer, Aodh and Gnocchi can also be interesting for operators running more classical infrastructures. That's why I've pushed the projects in that direction, and also towards a more service-oriented architecture (SOA).
I'd like to stop for a moment on Ceilometer. I think this solution was much awaited, especially by cloud-computing providers using OpenStack to bill the resources sold to their customers. I remember reading a blog post where you talked about the high-speed construction of this component, and about features that were not supposed to be there. Nowadays, with Gnocchi and Aodh, what is the quality of the Ceilometer component and of the programs it relies on?
Indeed, one of the first use cases for Ceilometer was tied to the ability to get metrics to feed a billing tool. That goal has now been reached, since we have billing tools for OpenStack using Ceilometer, such as CloudKitty. However, other use cases appeared rapidly, such as the ability to trigger alarms. This feature was necessary, for example, to implement the auto-scaling that Heat needed. At the time, for technical and political reasons, it was not possible to implement it in a new project, and the functionality ended up in Ceilometer, since it was using the metrics collected and stored by Ceilometer itself. Though, like I said, this feature is now in its own project, Aodh. The alarm feature has been used in production for a few cycles now, and the Aodh project brings new features to the table. It can trigger actions on threshold crossing and is one of the few solutions able to work at high scale, with several thousand alarms. It's impossible to make Nagios run with millions of instances to fetch metrics and trigger alarms; Ceilometer and Aodh can do that easily, and automatically, on a few tens of nodes. On the other side, Ceilometer has long been painted as slow and complicated to use, because its metrics storage system used MongoDB by default. Clearly, the data structure model picked was not optimal for what users were doing with the data. That's why I started Gnocchi last year, which is designed precisely for this use case. It allows constant access time to metrics (O(1) complexity) and fast access to the resource data via an index. Today, with three projects that each have a well-defined feature perimeter and that can work together (Ceilometer, Aodh and Gnocchi), we have finally erased the biggest problems and defects of the initial project.
To end with OpenStack, one last question. You've been a Python developer for a long time and a fervent user of software testing and test-driven development. Several of your blog posts point out how important their usage is. Can you tell us more about the use of tests in OpenStack, and the testing prerequisites for contributing to OpenStack?
I don't know any project that is as tested on every layer as OpenStack is. At the start of the project, there was only vague test coverage, made of a few unit tests. For each release, a bunch of new features were provided, and you had to keep your fingers crossed and hope they worked. That's already almost unacceptable. But the bigger issue was that there were also a lot of regressions: things that were working stopped working, often corner cases the developers had forgotten about. Then the project decided to change its policy and started to refuse any patch, new feature or bug fix, that did not come with a minimal set of unit tests proving the patch worked. Quickly, regressions became history, and the number of bugs dropped month after month. Then came the functional tests, with the Tempest project, which runs a test battery on a complete OpenStack deployment. OpenStack now possesses a complete test infrastructure, with operators hired full-time to maintain it. The developers have to write the tests, and the operators maintain an architecture based on Gerrit, Zuul, and Jenkins, which runs the test battery of each project for each patch sent. Indeed, for each version of a patch sent, a full OpenStack is deployed into a virtual machine, and a battery of thousands of unit and functional tests is run to check that no regression slips in. To contribute to OpenStack, you need to know how to write a unit test; the policy on functional tests is laxer. The tools used are standard Python tools: unittest for the framework, and tox to create a virtual environment (venv) and run the tests. It's also possible to use DevStack to deploy an OpenStack platform on a virtual machine and run functional tests against it. However, since the project infrastructure does that for every submitted patch anyway, it's not mandatory to do it yourself locally.
The tools and tests you write for OpenStack are in Python, a language which is very popular today. You seem to like it more than most, since you wrote a book about it, The Hacker's Guide to Python, which I really enjoyed. Can you explain what brought you to Python, the main strong points you attribute to this language (quickly), and how you went from developer to author?
I stumbled upon Python by chance, around 2005. I don't remember how I heard about it, but I bought a first book to discover it and started toying with the language. At that time, I didn't find any project to contribute to or to start. My first project with Python was rebuildd for Debian in 2007, a bit later. I like Python for its simplicity, its rather clean object orientation, how easy it is to deploy, and its rich open-source ecosystem. Once you get the basics, it's very easy to evolve and to use it for anything, because the ecosystem makes it easy to find libraries to solve any kind of problem. I became an author by chance, writing blog posts from time to time about Python. I finally realized that after a few years studying Python internals (CPython), I had learned a lot of things. While writing a post about the differences between method types in Python, which is still one of the most-read posts on my blog, I realized that a lot of things that seemed obvious to me were not for other developers. I wrote that initial post after thousands of hours spent doing code reviews on OpenStack. I therefore decided to note down all the developer pain points and to write a book about them: a compilation of what years of experience taught me, and taught the other developers I decided to interview in the book.
I've been very interested in the publication of your book, both for the subject itself and for the process you chose. You self-published the book, which seems very relevant nowadays. Was that the choice from the start? Did you look for a publisher? Can you tell us more about that?
I was lucky to find out about other self-published authors, such as Nathan Barry, who even wrote a book on that subject, called Authority. That's what convinced me it was possible, and it gave me hints for the project. I started to write in August 2013, and I ran the first interviews with other developers at that time. I started writing the table of contents and then filled the pages with what I knew and what I wanted to share. I managed to finish the book around January 2014. The proof-reading took more time than I expected, so the book was only released in March 2014. I wrote a complete report about it on my blog, where I explain the full process in detail, from writing to launching. I did not look for a publisher, though I was offered some deals. The idea of self-publishing really convinced me, so I decided to go on my own, and I have no regrets. It's true that you have to wear two hats at the same time and handle a lot more things, but with a minimal audience and some help from the Internet, anything's possible! I have been approached by two publishers since then, a Chinese one and a Korean one. I gave them the rights to translate and publish the book in their countries, so you can buy the Chinese and Korean versions of the first edition out there. Seeing how successful it was, I decided to launch a second edition in May 2015, and it's likely that a third edition will be released in 2016.
Nowadays, you work for Red Hat, a company that represents the success of using Free Software as a commercial business model. This company fascinates many in our community. What can you say about your employer from your point of view?
It has only been a year since I joined Red Hat (when they bought eNovance), so my experience is quite recent. Still, Red Hat is really a special company on every level. It's hard to see from the outside how open it is and how it works; it's really close to, and really looks like, an open source project. For more details, you should read The Open Organization, a book written by Jim Whitehurst (CEO of Red Hat), which he just published. It describes perfectly how Red Hat works. To summarize, meritocracy and the absence of silos are what make Red Hat a strong organization and one of the most innovative companies. In the end, I'm lucky enough to be autonomous on the projects I work on with my team around OpenStack, and I can spend 100% of my time working upstream and enhancing the Python ecosystem.

14 September 2015

Julien Danjou: Visualize your OpenStack cloud: Gnocchi & Grafana

We've been working hard with the Gnocchi team these last months to store your metrics, and I guess it's time to show off a bit. So far, Gnocchi offers scalable metric storage and resource indexing, especially for OpenStack clouds, but not only: it's generic. It's cool to store metrics, but it can be even better to have a way to visualize them!

Prototyping
We very soon started to build a little HTML interface. Being REST-friendly guys, we enabled it on the same endpoints that were being used to retrieve information and measures about metrics, sending back text/html instead of application/json if you were requesting those pages from a Web browser. But let's face it: we are back-end developers, we suck at any kind of front-end development. CSS, HTML, JavaScript? Bwah! So what we built was a starting point, hoping some magical Web developer would jump in and finish the job. Obviously it never happened.

Ok, so what's out there?
It turns out there are back-end-agnostic solutions out there, and we decided to pick Grafana. Grafana is a complete graphing dashboard solution that can be plugged on top of any back-end. It already supports timeseries databases such as Graphite, InfluxDB and OpenTSDB. That was more than enough for my fellow developer Mehdi Abaakouk to jump in and start writing a Gnocchi plugin for Grafana! Consequently, there is now a basic but solid and working back-end for Grafana that lies in the grafana-plugins repository.
With that plugin, you can graph anything that is stored in Gnocchi, from raw metrics to metrics tied to resources. You can use templating, but there are no annotations yet. The back-end supports Gnocchi with or without Keystone involved, and any type of authentication (basic auth or Keystone token). So yes, it even works if you're not running Gnocchi with the rest of OpenStack.
It also supports advanced queries, so you can search for resources based on some criteria and graph their metrics.

I want to try it!
If you want to deploy it, all you need to do is install Grafana and its plugins, and create a new datasource pointing to Gnocchi. It is that simple. There's some CORS middleware configuration involved if you're planning on using Keystone authentication, but it's pretty straightforward: just set the cors.allowed_origin option to the URL of your Grafana dashboard. We also added support for Grafana directly in the Gnocchi devstack plugin. If you're running DevStack, you can follow the instructions, which basically amount to adding the line enable_service gnocchi-grafana.

Moving to Grafana core
Mehdi just opened a pull request a few days ago to merge the plugin into Grafana core. It's actually one of the most unit-tested plugins in Grafana so far, so it should be on a good path to be merged in the future, giving Grafana support for Gnocchi without any external plugin.

4 September 2015

Julien Danjou: Data validation in Python with voluptuous

Continuing my post series on the tools I use these days in Python, this time I would like to talk about a library I really like, named voluptuous. It's no secret that most of the time, when a program receives data from the outside, handling it is a big deal. Indeed, most of the time your program has no guarantee that the stream is valid and that it contains what is expected. The robustness principle says you should be liberal in what you accept, though that is not always a good idea either. Whatever policy you choose, you need to process that data and implement a policy that will work, lax or not. That means the program needs to look into the received data, check that it finds everything it needs, complete what might be missing (e.g. set some defaults), transform some of it, and maybe reject it in the end.
Data validation
The first step is to validate the data, which means checking that all the fields are there and all the types are right or understandable (parseable). Voluptuous provides a single interface for all of that, called a Schema.
>>> from voluptuous import Schema
>>> s = Schema({
...     'q': str,
...     'per_page': int,
...     'page': int,
... })
>>> s({"q": "hello"})
{'q': 'hello'}
>>> s({"q": "hello", "page": "world"})
voluptuous.MultipleInvalid: expected int for dictionary value @ data['page']
>>> s({"q": "hello", "unknown": "key"})
voluptuous.MultipleInvalid: extra keys not allowed @ data['unknown']

The argument to voluptuous.Schema should be the data structure that you expect. Voluptuous accepts any kind of data structure, so it could also be a simple string or an array of dicts of arrays of integers. You get it. Here it's a dict with a few keys that, if present, should be validated as certain types. By default, Voluptuous does not raise an error if some keys are missing. However, it is invalid to have extra keys in a dict by default. If you want to allow extra keys, it is possible to specify it.
>>> from voluptuous import Schema
>>> s = Schema({"foo": str}, extra=True)
>>> s({"bar": 2})
{'bar': 2}

It is also possible to make some keys mandatory.
>>> from voluptuous import Schema, Required
>>> s = Schema({Required("foo"): str})
>>> s({})
voluptuous.MultipleInvalid: required key not provided @ data['foo']

You can create custom data types very easily. Voluptuous data types are actually just functions that are called with one argument, the value, and that should either return the value or raise an Invalid or ValueError exception.
>>> from voluptuous import Schema, Invalid
>>> def StringWithLength5(value):
...     if isinstance(value, str) and len(value) == 5:
...         return value
...     raise Invalid("Not a string with 5 chars")
...
>>> s = Schema(StringWithLength5)
>>> s("hello")
'hello'
>>> s("hello world")
voluptuous.MultipleInvalid: Not a string with 5 chars

Most of the time though, there is no need to create your own data types. Voluptuous provides logical operators that, combined with a few other provided primitives such as voluptuous.Length or voluptuous.Range, can create a large range of validation schemes.
>>> from voluptuous import Schema, Length, All
>>> s = Schema(All(str, Length(min=3, max=5)))
>>> s("hello")
"hello"
>>> s("hello world")
voluptuous.MultipleInvalid: length of value must be at most 5
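
For example, here is what bounding an integer with voluptuous.Range looks like; a small sketch, and note that the exact error message wording may vary depending on your voluptuous version:
>>> from voluptuous import Schema, All, Range
>>> s = Schema({"per_page": All(int, Range(min=1, max=100))})
>>> s({"per_page": 50})
{'per_page': 50}
>>> s({"per_page": 500})
voluptuous.MultipleInvalid: value must be at most 100 for dictionary value @ data['per_page']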

The voluptuous documentation has a good set of examples that you can check to get a good overview of what you can do.

Data transformation
What's important to remember is that each data type that you use is a function that is called and returns a value if the value is considered valid. That returned value is what is actually used and returned after the schema validation:
>>> import uuid
>>> from voluptuous import Schema
>>> def UUID(value):
...     return uuid.UUID(value)
...
>>> s = Schema({"foo": UUID})
>>> data_converted = s({"foo": "uuid?"})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data['foo']
>>> data_converted = s({"foo": "8B7BA51C-DFF5-45DD-B28C-6911A2317D1D"})
>>> data_converted
{'foo': UUID('8b7ba51c-dff5-45dd-b28c-6911a2317d1d')}

By defining a custom UUID function that converts a value to a UUID, the schema converts the string passed in the data to a Python UUID object, validating the format at the same time. Note a little trick here: it's not possible to use uuid.UUID directly in the schema, otherwise Voluptuous would check that the data is actually an instance of uuid.UUID:
>>> from voluptuous import Schema
>>> s = Schema({"foo": uuid.UUID})
>>> s({"foo": "8B7BA51C-DFF5-45DD-B28C-6911A2317D1D"})
voluptuous.MultipleInvalid: expected UUID for dictionary value @ data['foo']
>>> s({"foo": uuid.uuid4()})
{'foo': UUID('60b6d6c4-e719-47a7-8e2e-b4a4a30631ed')}

And that's not what is wanted here. That mechanism is really neat for transforming, for example, strings into timestamps.
>>> import datetime
>>> from voluptuous import Schema
>>> def Timestamp(value):
...     return datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
...
>>> s = Schema({"foo": Timestamp})
>>> s({"foo": '2015-03-03T12:12:12'})
{'foo': datetime.datetime(2015, 3, 3, 12, 12, 12)}
>>> s({"foo": '2015-03-03T12:12'})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data['foo']

Recursive schemas
Voluptuous has one limitation so far: recursive schemas. The simplest way to circumvent it is to use another function as an indirection.
>>> from voluptuous import Schema, Any
>>> def _MySchema(value):
...     return MySchema(value)
...
>>> MySchema = Schema({"foo": Any("bar", _MySchema)})
>>> MySchema({"foo": {"foo": "bar"}})
{'foo': {'foo': 'bar'}}
>>> MySchema({"foo": {"foo": "baz"}})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data['foo']['foo']

Usage in REST API
I started to use Voluptuous to validate data in the REST API provided by Gnocchi. So far it has been a really good tool, and we've been able to create a complete REST API that is very easy to validate on the server side. I would definitely recommend it for that. It blends with any Web framework easily. One of the upsides compared to solutions like JSON Schema is the ability to create or re-use your own custom data types while converting values at validation time. It is also very Pythonic and extensible; it's pretty great to use for all of that. It's also not tied to any serialization format. On the other hand, JSON Schema is language-agnostic and is itself serializable as JSON. That makes it easy to export and provide to a consumer, so it can understand the API and potentially validate the data on its side. Below is a small sketch of what that server-side usage can look like.
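
To illustrate, here is a minimal sketch of wiring voluptuous into a request handler; this is not Gnocchi's actual code, and the handler shape and status-code convention are hypothetical:

import json

import voluptuous

search_schema = voluptuous.Schema({
    voluptuous.Required("q"): str,
    "per_page": int,
})

def handle_search(raw_body):
    # Reject unparseable JSON and schema violations with a client error.
    try:
        data = search_schema(json.loads(raw_body))
    except ValueError as e:
        return 400, "invalid JSON: %s" % e
    except voluptuous.Invalid as e:
        return 400, str(e)
    # From here on, the data is known to be valid and converted.
    return 200, data

print(handle_search('{"q": "hello", "per_page": 10}'))
print(handle_search('{"per_page": "nope"}'))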
