The Python programming language is one of the most popular and in-demand. It is free, has a large community, is suited to developing projects of varying complexity, is easy to learn, and opens up great opportunities for programmers. To work comfortably with it, you need special Python tools that can simplify your work. We have selected the best Python tools that will be relevant in 2021.
Popular Python Tools 2021
Python tools make life much easier for any developer and provide ample opportunities for creating effective applications or sites. These solutions help to automate different processes and minimize routine tasks. In fact, their functionality varies considerably. Some are made for full-fledged, complex, multi-level development, while others have a simplified interface that allows you to develop individual modules and blocks. Before choosing a tool, you need to define objectives and understand your goals. Then it will become clear what exactly to use.
Mailtrap
As you probably know, in order to send an email, you need SMTP (Simple Mail Transfer Protocol). This is because you can't just send a letter directly to the recipient: it needs to be sent to a server, from which the recipient downloads it using IMAP or POP3. Mailtrap provides an opportunity to send emails in Python. Moreover, Mailtrap provides a REST API to access current emails. It can be used to automate email testing, which will improve your email marketing campaigns. For example, you can check the password recovery form in a Selenium test and immediately see whether an email was sent to the correct address. Then take the new password from the email and try to enter the site with it. Cool, isn't it?
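Since Mailtrap is reached over plain SMTP, Python's standard smtplib is enough for a quick test. A minimal sketch, assuming placeholder host, port and credentials taken from your Mailtrap inbox settings:
import smtplib
from email.message import EmailMessage

# Placeholder values: the real host, port and credentials come from your
# Mailtrap inbox settings.
SMTP_HOST = "smtp.mailtrap.io"
SMTP_PORT = 2525
SMTP_USER = "your-mailtrap-username"
SMTP_PASSWORD = "your-mailtrap-password"

msg = EmailMessage()
msg["Subject"] = "Password recovery"
msg["From"] = "noreply@example.com"
msg["To"] = "user@example.com"
msg.set_content("Here is your new password: ...")

# The message lands in the Mailtrap inbox instead of a real mailbox,
# where an automated test can inspect it.
with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as server:
    server.login(SMTP_USER, SMTP_PASSWORD)
    server.send_message(msg)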
Pros
All emails are in one place.
Mailtrap provides multiple inboxes.
Shared access is present.
It is easy to set up.
RESTful API
Cons
No visible disadvantages were found.
Django
Django is a free and open-source full-stack framework. It is one of the most important and popular frameworks among Python developers. It helps you move from a prototype to a ready-made working solution in a short time, since its main task is to automate processes and speed up work through associations and libraries. It is a great choice for a product launch. You can use Django if at least a few of the following points interest you:
There is a need to develop the server-side of the API.
You need to develop a web application.
In the course of work, many changes are made, and you have to constantly deploy the application and make edits.
There are many complex tasks that are difficult to solve on your own, and you will need the help of the community.
ORM support is needed to avoid accessing the database directly.
There is a need to integrate new technologies such as machine learning.
Django is a great Python Web Framework that does its job. It is not for nothing that it is one of the most popular, and is actively used by millions of developers.
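To illustrate the ORM mentioned in the pros below, here is a minimal sketch; the model and field names are made up for the example, not taken from any particular project:
# models.py: a small Django model, queried through the ORM instead of raw SQL.
from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=200)
    published = models.DateTimeField(auto_now_add=True)

def latest_articles(count=5):
    # The ORM builds the SQL query for us: no direct database access needed.
    return Article.objects.order_by("-published")[:count]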
Pros
Django has quite a few advantages. It contains a large number of ready-made solutions, which greatly simplifies development. The admin panel, database migrations, various forms, and user authentication tools are extremely helpful. The structure is very clear and simple. A large community helps to solve almost any problem. Thanks to the ORM, there is a high level of security and it is comfortable to work with databases.
Cons
Despite its powerful capabilities, this Python web framework has drawbacks. It is very massive and monolithic, therefore it develops slowly. Despite the many generic modules, the development speed of Django itself is reduced.
CherryPy
CherryPy is a micro-framework. It is designed to solve specific problems and is capable of running a program on any operating system. CherryPy is used in the following cases:
To create an application with small code size.
There is a need to manage several servers at the same time.
You need to monitor the performance of applications.
CherryPy belongs to the Python frameworks designed for specific tasks. It's clear, user-friendly, and ideal for Android development.
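As a rough illustration of how small a CherryPy application can be, here is a minimal sketch:
import cherrypy

class HelloWorld:
    @cherrypy.expose
    def index(self):
        return "Hello, world!"

if __name__ == "__main__":
    # Starts the built-in development server on http://127.0.0.1:8080/
    cherrypy.quickstart(HelloWorld())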
Pros
The CherryPy Python tool has a friendly and understandable development environment. It is a functional and complete framework which can be used to build good applications. The source code is open, so the platform is completely free for developers, and the community, although not too large, is very responsive and always helps to solve problems.
Cons
There are not many cons to this Python tool. It is not capable of performing complex tasks and functions; it is intended more for specific solutions, for example, the development of certain plugins or modules.
Pyramid
The Python Pyramid tool is designed for programming complex objects and solving multifunctional problems. It is used by professional programmers and is traditionally used for identification and routing. It is aimed at a wide audience and is capable of developing API prototypes. It is used in the following cases:
You need problem indicator tools to make timely adjustments and edits.
You use several programming languages at once.
You work with reporting, financial calculations, and forecasting.
You need to quickly create a simple application.
At the same time, the Pyramid web framework allows you to create complex applications with great functionality, such as translation software.
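A minimal Pyramid sketch showing the basic routing style (the port is arbitrary):
from wsgiref.simple_server import make_server
from pyramid.config import Configurator
from pyramid.response import Response

def hello(request):
    return Response("Hello from Pyramid!")

if __name__ == "__main__":
    with Configurator() as config:
        config.add_route("hello", "/")
        config.add_view(hello, route_name="hello")
        app = config.make_wsgi_app()
    # Serve on an arbitrary local port.
    make_server("127.0.0.1", 6543, app).serve_forever()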
Pros
Pyramid does an excellent job of developing basic applications quickly. It is quite flexible and easy to learn. In fact, the key to the success of this framework is that it is completely based on fundamental principles, using simple and basic programming techniques. It is minimalistic, but at the same time offers users a lot of freedom of action. It is able to work with both small applications and powerful multifunctional programs.
Cons
It is difficult to deviate from the basic principles. This Python tool makes decisions for you. Simple programs are very easy to implement, but to do something complex and large-scale, you have to completely immerse yourself in the environment and obey it.
Grok
Grok is a Python tool that works with templates. Its main task is to eliminate repetition in the code. If an element is repeated, the template that was already created earlier is simply applied. This greatly simplifies and speeds up the work. Grok suits developers in the following cases:
If a programmer has little experience and is not yet ready to develop their own modules.
There is a need to quickly develop a simple application.
The functionality of the application is simple, straightforward, and the interface does not play a key role.
Pros
The Grok framework is a child of Zope 3, which was released earlier. It has a simplified structure, easy installation of modules, more capabilities, and better flexibility. It is designed for developing small applications. Yes, it is not intended for complex work, but due to its functionality, it allows you to quickly implement a project.
Cons
The Grok community is not very large, as this Python tool has not gained widespread popularity. Nevertheless, it is used by Python adepts for comfortable development. It is impossible to implement complex tasks with it, since the possibilities are quite limited. Still, Grok is one of the better Python web frameworks: it is understandable and has enough features for comfortable development.
Web2Py
Web2Py is a Python tool that has its own IDE, which includes a code editor, debugger, and deployment tool. It works great without the need for configuration or installation, provides a high level of data security, and is suitable for work on various platforms. Web2Py is great in the following cases:
When there is a need to develop something on different operating systems.
If there is no way to install and configure the framework.
When a high level of data security is required, for example, when developing financial applications or sales performance management tools.
If you need to carefully track bugs right during development, and not during the testing phase.
Pros
Web2Py is capable of working with different protocols, has a built-in error tracker, and has a backward compatibility feature that helps to work on the basis of previous versions of the framework. This means that code maintenance becomes much easier and cheaper. It's free, open-source, and very flexible.
Cons
Among the many Python tools, there are not many that require the latest version of the language. Web2Py is one of those and won't work on versions older than Python 3. Therefore, you need to constantly monitor the updates. Web2Py does an excellent job with its tasks. It is quite simple and accessible to everyone.
BlueBream
BlueBream used to be called Zope 3. It copes well with tasks of medium and high complexity and is suitable for serious projects.
Pros
The BlueBream build system is quite powerful and suitable for complex tasks. You can create functional applications with it, and the principle of component reuse makes the code simpler. At the same time, the speed of development increases. The software can be scaled, and a transactional object database provides an easy way to store data. This means that queries are processed quickly and database management is simple.
Cons
This is not a very flexible framework; it is better to know clearly in advance what is required of it. In addition, it cannot withstand heavy loads. When working with 1000 users at the same time, it can crash and give errors. Therefore, it should be used to solve narrow problems. Python frameworks are often designed for specific tasks, and BlueBream is one of these: it is suitable for applications where database management plays a key role.
Conclusion
Python tools come in different forms and have vastly different capabilities. There are quite a few of them, but in 2021 these will be the most popular and in demand. Experienced programmers always choose several development tools for their comfortable work.
I started to write C 25 years ago now, with many different tools over the years. Like many open source developers, I spent most of my life working with the GNU tools out there.
As I've been using an Apple computer over the last few years, I had to adapt to this environment and learn the tricks of the trade. Here are some of my notes, so a search engine can index them and I'll be able to find them later.
Debugger: lldb
I was used to gdb for most of my years doing C. I never managed to install gdb correctly on macOS, as it needs certificates, authorization, you name it, to work properly. macOS provides a native debugger named lldb, which really looks like gdb to me: it runs in a terminal with a prompt. I had to learn the few commands I mostly use, which are:
lldb -- myprogram -options to run the program with options
r to run the program
bt or bt N to get a backtrace of the latest N frames
f N to select frame N
p V to print some variable value or memory address
Those commands cover 99% of my use case with a debugger when writing C, so once I lost my old gdb habits, I was good to go.
Debugging Memory Overflows
On GNU/Linux
One of my favorite tools when writing C has always been Electric Fence (and DUMA more recently). It's a library that overrides the standard memory manipulation functions (e.g., malloc) and makes the program crash instantly when an out-of-bounds memory access happens, rather than corrupting the heap. Heap corruption issues are hard to debug without such tools, as they can happen at any time and stay unnoticed for a while, crashing your program in a totally different location later. There's no need to compile your program with those libraries: by using the dynamic loader, you can preload them and override the standard C library functions.
LD_PRELOAD=/usr/lib/libefence.so.0.0 my-program
Run my-program with Electric Fence loaded on GNU/Linux.
My gdb configuration has been sprinkled with my friends efence and duma, and I can activate them from gdb easily with this configuration in ~/.gdbinit:
define efence
set environment EF_PROTECT_BELOW 0
set environment EF_ALLOW_MALLOC_0 1
set environment LD_PRELOAD /usr/lib/libefence.so.0.0
echo Enabled Electric Fence\n
end
document efence
Enable memory allocation debugging through Electric Fence (efence(3)).
See also nofence and underfence.
end
define underfence
set environment EF_PROTECT_BELOW 1
set environment EF_ALLOW_MALLOC_0 1
set environment LD_PRELOAD /usr/lib/libefence.so.0.0
echo Enabled Electric Fence for underflow detection\n
end
document underfence
Enable memory allocation debugging for underflows through Electric Fence
(efence(3)).
See also nofence and efence.
end
define nofence
unset environment LD_PRELOAD
echo Disabled Electric Fence\n
end
document nofence
Disable memory allocation debugging through Electric Fence (efence(3)).
end
define duma
set environment DUMA_PROTECT_BELOW 0
set environment DUMA_ALLOW_MALLOC_0 1
set environment LD_PRELOAD /usr/lib/libduma.so
echo Enabled DUMA\n
end
document duma
Enable memory allocation debugging through DUMA (duma(3)).
See also noduma and underduma.
end
On macOS
I've been looking for equivalent features on macOS, and after many hours of research, I found out that this feature ships natively with libgmalloc. It works the same way, and its features are documented by Apple. My ~/.lldbinit file now contains the following:
command alias gm _regexp-env DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib
This command alias allows enabling gmalloc by just typing gm at the lldb prompt and then running the program again to see if it crashes with gmalloc enabled.
Debugging CPython
It's not a mystery that I spend a lot of time writing Python code; that's the main reason I've been doing C lately. When playing with CPython, it can be useful to, e.g., dump the content of PyObject structs on the heap or get the Python backtrace. I've been using cpython-lldb for this with great success. It adds a few bells and whistles when debugging CPython or extensions inside lldb. For example, the alias py-bt is handy to get the Python traceback of your calls rather than a bunch of cryptic C frames.
Now, you should be ready to debug your nasty issues and memory problems on macOS efficiently!
Fifteen years have passed since I started my career in IT, which is quite some time. I've been playing with computers for 25 years now, which makes me quite knowledgeable about the field, for sure.
However, while I was fully prepared to bargain with computers, I was not prepared to do so with humans. The whole career management thing was unknown to me. I had no useful skills to navigate within an enterprise organization. I had to learn the ropes the hard way, failing along the way. It hurt.
Almost ten years ago, I had the chance to meet a new colleague, Alexis Monville. Alexis was a team facilitator, and I started to work with him on many non-technical levels. He taught me a lot about agility and team organization. Working on this set of new skills changed how I envisioned my work and how I fit into the company.
I worked on those aspects of my job because I decided to be in charge of my career rather than keeping things boring. That was one of the best decisions I ever made. Growing the social aspect of my profession allowed me to develop and find inspiring jobs and missions.
Getting to that point takes a lot of time and effort, and it's pretty hard to do it alone. My friend Alexis wrote an excellent book titled I am a Software Engineer and I am in Charge. I'm proud to have been the first reviewer of the book before it was released a few weeks ago.
Many developers out there are stuck in a place where they are not excited by their colleagues' work and whose managers do not appropriately recognize their achievements. It would be best for them if they did something about that.
This book is an excellent piece for engineers who want to break the cycle of frustration. It covers many situations I encountered across my professional life these last years, giving good insights into how to solve them.
To paraphrase Alexis, the answers to your career management problems are not on StackOverflow: they're not technical issues. However, you can still solve them with the right tools. That's where I am a Software Engineer and I am in Charge shines. It gives you leads, solutions, and exercises to get out of this kind of situation. It helps increase your impact and satisfaction at work.
I love this book, and I wish I had access to it years ago. Developing technical leadership is not easy and requires a mindset shift. Having a way to bootstrap yourself with this is a luxury.
If you're a software engineer at the beginning of your career or struggling with your current professional situation, I profoundly recommend reading this book! You'll get a fast track on your career, for sure.
Earlier this year, I was supposed to participate in dotPy, a one-day Python conference happening in Paris. This event has unfortunately been cancelled due to the COVID-19 pandemic.
Both Victor Stinner and I were supposed to attend that event. Victor had prepared a presentation about Python performance, while I was planning on talking about profiling.
Rather than being completely discouraged, Victor and I sat down (remotely) with Anne Laure from Behind the Code (a blog run by Welcome to the Jungle, the organizers of the dotPy conference).
We discussed Python performance, profiling, speed, projects, problems, analysis, optimization and the GIL.
You can read the interview here.
If you have never heard of the 10x engineer myth, it's a pretty great concept: it boils down to the idea that an engineer could be 10x more efficient than a random engineer. I find this fantastically twisted.
Last week, I sat and chatted with Alexis Monville on Le Podcast, a podcast that equips you to make positive change in your organization. We talked about that 10x engineer myth, and from there we digressed on how to grow your career and handle the different aspects of it.
This was a very interesting exchange. Alexis is actually going to publish a new book next month (May 2020) entitled I am a Software Engineer and I am in charge.
Lucky me: this week, I had the chance to read the book before everybody else, which means I actually read it after our recording. I understood why Alexis said that a lot of what I was talking about during our podcast resonated with him. I sent a detailed review of the book to Alexis and Michael, if you're curious. I'm definitely recommending this book if you want to stop complaining about your job and start understanding how to pull the strings.
I wish I had this book available 10 years ago!
It has been close to a year now since I incorporated my new company, Mergify. I've been busy, and I have barely written anything about it so far. Now is an excellent time to take a break and reflect a bit on what happened during those last 12 months.
What problem does Mergify solve?
Mergify is a powerful automation engine for GitHub pull requests. It allows you to automate everything, and especially merging. You write rules, and it handles the rest.
Example of rule matching returned in GitHub checks.
For example, let's say you want your pull request to be merged, e.g., once your CI passes and the pull request has been approved. You just write such a rule, and our engine merges the pull request as soon as it's ready.
We also deal with more advanced use cases. For instance, we provide a merge queue so your pull requests are merged serially and tested by your CI one after another, avoiding any regression in your code.
Our goal is to make pull request management and automation easy. You can use your bot to trigger a rebase of your pull requests, or a backport to a different branch, just with a single comment.
Some people like to make bots talk to each other.
A New Adventure
Mergify is the first company that I ever started. I did run some personal businesses before, created non-profit organizations, and built FOSS projects, but I never created a company from scratch, even less with an associate.
Indeed, I've chosen to build the company with my old friend Mehdi. We've known each other for 7 years now, and have worked together all that time on different open-source projects. Having worked with each other for so long has probably been a critical factor in the success of our venture so far.
I had little experience sharing the founding seats with someone, and tons of reading seemed to indicate that it would be a tough ride. Picking the right business partner(s) can be a hard task. Luckily, after working so much time together, Mehdi and I both know our strengths and weaknesses well enough to be able to circumvent them. On the other hand, we both have similar backgrounds as software engineers. That does not help to cover all the hats you need to wear when building a company. Over time, we found arrangements to cover most of those equally between us.
We don't have any magical advice to give on this. As in every relationship, communication is the key, and the #1 factor of success.
Getting Users
I don't know if we got lucky, but we got users and customers pretty early. We used a few cooperative projects as guinea pigs first, and they were brave enough to try our service and give us feedback. No repository has been harmed during this first phase!
Then, as soon as we managed to get our application on the GitHub Marketplace, we saw a steady number of users coming to us.
This has been fantastic, as it allowed us to get feedback rapidly. We set up a form asking users for feedback after they used Mergify for a couple of weeks. What we heard is that users were happy, that the documentation was confusing, and that some features were buggy or missing. We planned all of those ideas as future work in our roadmap, using the principles we described a few months ago.
If you're curious, you can read this article.
We tried various strategies to get new users, but so far, organic growth has been our #1 way of onboarding new users. Like many small startups out there, we're not that good at marketing and executing strategies.
We provide our service for free for open-source projects. We are now powering many organizations, such as Mozilla, Amazon Web Services, Ceph and Fedora.
Working with GitHub
Working with GitHub has been complicated. When you build an application for a marketplace, your business is entirely dependent on the platform you develop for, both in terms of features and quality of service.
In our case, we hit quite many bugs with GitHub. Their support has mostly been fast to answer, but some significant issues are still open months later. The truth is that the GitHub API could deserve more love and care from GitHub. For example, their GraphQL API has been a work in progress for years and misses many essential features.
GitHub service status is not always green.
We dealt, and still deal, with all those issues. It obviously impacts our operations and decreases our overall velocity. We regularly have to find new ways to sidestep GitHub limitations.
You have no idea how much we wished for GitHub to be open-source. The idea of not having access to their code to understand how it works is so frustrating that we publish our engine as an open-source project. That allows all of our users to see how it works and even propose enhancements.
Automate all the way
We're a tiny startup, and we decided to bootstrap our company. We never took any funding. From the beginning, it has been clear to us that we had to think and act like we had no resources. We're built around a scarcity mindset. Every decision we make is based on the assumption that we are very limited in terms of money and time.
We basically act like any wrong choice we make could (virtually) kill the company. We only do what is essential, we ship fast, and we automate everything.
For example, we have built our whole operation around CI/CD systems, and pushing any new fix or feature to production is done in a matter of minutes. It's not uncommon for us to push a fix from our phone, just by reviewing some code or editing a file.
Growth
We're extremely happy with our steady growth and more users using our service. We now manage close to 30k repositories and merge 15k pull requests per month for our users.
That's a lot of mouse clicks saved!
If you want to try Mergify yourself, it's a single-click log-in using your GitHub account. Check it out!
Here is my monthly update covering what I have been doing in the free software world during July 2017 (previous month):
Updated travis.debian.net, my hosted service for projects that host their Debian packaging on GitHub to use the Travis CI continuous integration platform to test builds:
Moved the default mirror from ftp.de.debian.org to deb.debian.org. []
Create a sensible debian/changelog file if one does not exist. []
Updated django-slack, my library to easily post messages to the Slack group-messaging utility:
Merged a PR to clarify the error message when a channel could not be found. []
Reviewed and merged a suggestion to add a TestBackend. []
Added Pascal support to Louis Taylor's anyprint hack to add support for "print" statements from other languages into Python. []
Filed a PR against Julien Danjou's daiquiri Python logging helper to clarify an issue in the documentation. []
Merged a PR to Strava Enhancement Suite (my Chrome extension that improves and fixes annoyances in the web interface of the Strava cycling and running tracker) to remove Zwift activities with maps. []
Whilst anyone can inspect the source code of free software for malicious flaws, most software is distributed pre-compiled to end users.
The motivation behind the Reproducible Builds effort is to permit verification that no flaws have been introduced either maliciously or accidentally during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised.
(I have generously been awarded a grant from the Core Infrastructure Initiative to fund my work in this area.)
This month I:
Assisted Mattia with a draft of an extensive status update to the debian-devel-announce mailing list. There were interesting follow-up discussions on Hacker News and Reddit.
Submitted the following patches to fix reproducibility-related toolchain issues within Debian:
apt: Make the output of apt-ftparchive reproducible. Thanks to Colin Percival from Tarsnap for the initial bug report. (#869557)
gconf: Make the output of /var/lib/gconf/defaults/%gconf-tree-*.xml files reproducible. (#867848, forwarded upstream)
I also made the following uploads of the Debian redis packages:
4:4.0.0-1 Upload new major upstream release to unstable.
4:4.0.0-2 Make /usr/bin/redis-server in the primary package a symlink to /usr/bin/redis-check-rdb in the redis-tools package to prevent duplicate debug symbols that result in a package file collision. (#868551)
4:4.0.0-3 Add -latomic to LDFLAGS to avoid a FTBFS on the mips & mipsel architectures.
4:4.0.1-1 New upstream version. Install 00-RELEASENOTES as the upstream changelog.
4:4.0.1-2 Skip non-deterministic tests that rely on timing. (#857855)
On February 4th and 5th, Debian will be attending
FOSDEM 2017 in Brussels,
Belgium; a yearly gratis event (no registration needed) run by volunteers from
the Open Source and Free Software community. It's free, and it's big: more than
600 speakers, over 600 events, in 29 rooms.
This year more than 45 current or past
Debian contributors will speak at FOSDEM:
Alexandre Viau,
Bradley M. Kuhn,
Daniel Pocock,
Guus Sliepen,
Johan Van de Wauw,
John Sullivan,
Josh Triplett,
Julien Danjou,
Keith Packard,
Martin Pitt,
Peter Van Eynde,
Richard Hartmann,
Sebastian Dröge,
Stefano Zacchiroli
and Wouter Verhelst, among others.
Similar to previous years, the event will be hosted at Université libre de
Bruxelles. Debian contributors and enthusiasts will be taking shifts at the
Debian stand with gadgets, T-Shirts and swag. You can find us at stand number
4 in building K, 1 B; CoreOS Linux and PostgreSQL will be our neighbours. See
https://wiki.debian.org/DebianEvents/be/2017/FOSDEM
for more details.
We are looking forward to meeting you all!
When I started contributing to OpenStack, almost five
years ago, it was a small ecosystem. There was no foundation, only a handful of
projects, and you could understand the code base in a few days.
Fast forward 2016, and it is a totally different beast. The project grew to
no less than 54 teams,
each team providing one or more deliverables. For example, the Nova and Swift
teams each produce one service and its client, whereas the Telemetry team
produces 3 services and 3 different clients.
In 5 years, OpenStack went from a few
IaaS projects to
54 different teams tackling different areas related to cloud computing. Once
upon a time, OpenStack was all about starting some virtual machines on a
network, backed by images and volumes. Nowadays, it's also about orchestrating
your network deployment over containers, while managing your application
life-cycle using a database service, everything being metered and billed for.
This exponential growth has been made possible with the decision of the
OpenStack Technical Committee
to open the gates with
the project structure reform voted at the end of 2014.
This amendment suppresses the old OpenStack model of "integrated projects"
(i.e. Nova, Glance, Swift, etc.). The big tent, as it's called, allowed OpenStack to
land new projects every month, growing from the 20 project teams of December
2014 to the 54 we have today, multiplying the number of projects by 2.7 in a
little more than a year.
Amazing growth, right?
And this was clearly a good change. I sat at the Technical Committee in 2013,
when projects were trying to apply to be "integrated", after Ceilometer and
Heat were. It was painful to see how the Technical Committee was trying to
assess whether new projects should be brought in or not.
But what I notice these days is how OpenStack is still stuck between its old
and new models. On one side, it accepted a lot of new teams, but on the other
side, many are considered as second-class citizens. Efforts are made to
continue to build an OpenStack project that does not exist anymore.
For example, there is a team trying to define what's OpenStack core, named
DefCore, which is looking to define
which projects are, somehow, actually OpenStack. This leads to weird
situations,
such as having non-DefCore projects seeing their doc rejected from installation guides.
Again,
I reiterated my proposal
to publish documentation as part of each project's code to solve that dishonest
situation and put everything on a level playing field.
Some cross-project specs are also pushed without the involvement of all
OpenStack projects. For example, the
deprecate-cli
spec, which proposes to deprecate the command-line interface tools provided by
each project, made a lot of sense in the old OpenStack model, where the goal was
to build a unified and ubiquitous cloud platform. But now that you have tens of
projects with largely different scopes, this starts making less sense. Still,
this spec was merged by the OpenStack Technical Committee this cycle. Keystone
is the first project to proudly force users to rely on
openstack-client,
removing its old keystone command line tool.
I find it odd to push such specs when it's pretty clear that some projects
(e.g. Swift, Gnocchi, etc.) have no intention to go down that path.
Unfortunately, most specs pushed by the Technical Committee are in the realm of
wishful thinking. It somehow makes sense, since only a few of the members are
actively contributing to OpenStack projects, and they can't by themselves
implement all of that magically. But OpenStack is no exception in the free
software world and remains a do-ocracy.
There is good cross-project content in OpenStack, such as
the API working group.
While the work done should probably not be OpenStack specific, there's a lot
that teams have learned by building various HTTP REST APIs with different
frameworks. Compiling this knowledge and offering it as a guidance to various
teams is a great help.
My fellow developer Chris Dent wrote a post about
what he would do on the Technical Committee.
In this article, he points to a lot of the shortcomings I described here, and
his confusion between OpenStack being a product or being a kit is quite
understandable. Indeed, the message broadcast by OpenStack is still very
confusing after the big tent openness. There's not enough user experience
improvement being done.
The OpenStack Technical Committee election is open for April 2016, and from
what I read so far, many candidates are proposing to now clean up the big tent,
kicking out projects that do not match certain criteria anymore. This is
probably a good idea, as there are some inactive projects lying around. But I don't
think that will be enough to solve the identity crisis that OpenStack is
experiencing.
So this is why, once again this cycle, I will throw my hat in the ring and
submit my candidacy for OpenStack Technical Committee.
A little more than 3 months after our latest minor release, here is the new
major version of Gnocchi, stamped
2.0.0. It contains a lot of new and
exciting features, and I'd like to talk about some of them to celebrate!
You may notice that this release happens in the middle of the OpenStack release
cycle. Indeed, Gnocchi does not follow that 6-month cycle, and we release
whenever our code is ready. That forces us to have a more iterative approach,
less disruptive for other projects, and it allows us to achieve a higher velocity,
applying the good old mantra: release early, release often.
Documentation
This version features a large documentation update. Gnocchi is still the only
OpenStack server project that implements a "no doc, no merge" policy, meaning
any code must come with the documentation addition or change included in the
patch. The full documentation is included in the source code and available
online at gnocchi.xyz.
Data split & compression
I've already covered this change extensively in
my last blog about timeseries compression.
Long story short, Gnocchi now splits timeseries archives in small chunks that
are compressed, increasing speed and decreasing data size.
Measures batching support
Gnocchi now supports batching, which allows submitting several measures for
different metrics in a single request. This is especially useful in the context
where your application tends to cache metrics for a while and is able to send
them in a batch. Usage is
fully documented for the REST API.
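A hedged sketch of what a batch call could look like from Python with the requests library; the endpoint path and payload shape below are assumptions to double-check against the REST API documentation:
import json
import requests

GNOCCHI = "http://localhost:8041"  # assumed local endpoint
METRIC_ID = "00000000-0000-0000-0000-000000000000"  # placeholder metric UUID

# Several measures for one (or more) metrics, sent in a single request.
payload = {
    METRIC_ID: [
        {"timestamp": "2016-03-01T12:00:00", "value": 42.0},
        {"timestamp": "2016-03-01T12:01:00", "value": 43.5},
    ]
}

resp = requests.post(
    GNOCCHI + "/v1/batch/metrics/measures",
    headers={"Content-Type": "application/json"},
    data=json.dumps(payload),
)
resp.raise_for_status()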
Group by support in aggregation
One of the most requested features was the ability to do measure aggregation on
resources, using a group-by type query. This is now possible using the
new groupby parameter to aggregation queries.
Ceph backend optimization
We improved the Ceph back-end a lot. Mehdi Abaakouk wrote a new Python binding
for Ceph, called Cradox, that is going to
replace the current Python rados module in the subsequent Ceph releases.
Gnocchi makes usage of this new module to speed things up, making the Ceph
based driver really, really faster than before. We also implemented
asynchronous data deletion, which improves performance a bit.
The next step will be to run some new benchmarks
like I did a few months ago and compare with the
Gnocchi 1.3 series. Stay tuned!
The first major version of the scalable timeseries database I work on,
Gnocchi, was released a few months ago. In this first
iteration, it took a rather naive approach to data storage. We had little idea
about if and how our distributed back-ends were going to be heavily used, so we
stuck to the code of the first proof-of-concept written a couple of years ago.
Recently we got more feedback from our users and ran a few
benchmarks. That gave us enough feedback to start
investigating in improving our storage strategy.
Data split
Up to Gnocchi 1.3, all data for a single metric are stored in a single gigantic
file per aggregation method (min, max, average, etc.). This means that the
file can grow to several megabytes in size, which makes it slow to manipulate.
For the next version of Gnocchi, our first work has been to rework that storage
and split the data into smaller parts.
The diagram above shows how data are organized inside Gnocchi. Until version
1.3, there would have been only one file for each aggregation method.
In the upcoming 2.0 version, Gnocchi will split all these data into smaller
parts, where each data split is stored in a file/object. This makes it possible
to manipulate smaller pieces of data and to increase the parallelism of the CRUD
operations on the back-end, leading to large speed improvements.
In order to split timeseries into several chunks, Gnocchi defines a maximum
number of N points to keep per chunk, to limit their maximum size. It then
defines a hash function that produces a non-unique key for any timestamp. It
makes it easy to find in which chunk any timestamp should be stored or
retrieved.
Data compression
Up to Gnocchi 1.3, the data stored for each metric is simply serialized using
msgpack, a fast and small serialization format. However,
this format does not provide any compression. That means that storing data
points needs 8 bytes for a timestamp (64 bits timestamp with nanosecond
precision) and 8 bytes for a value (64 bits double-precision floating-point),
plus some overhead (extra information and msgpack itself).
After looking around at how to compress all these measures, I stumbled upon a
paper from some Facebook engineers about Gorilla,
their in-memory timeseries database, entitled
"Gorilla: A Fast, Scalable, In-Memory Time Series Database".
For reference, part of this encoding is also used by
InfluxDB
in its new storage engine.
The first technique I implemented is easy enough, and it's inspired by
delta-of-delta encoding. Instead of storing each timestamp for each data point,
and since all the data points are aggregated on a regular interval, we
transpose points to be the time difference divided by the interval. For
example, the series of timestamps timestamps =
[41230, 41235, 41240, 41250, 41255] is encoded into timestamps =
[41230, 1, 1, 2, 1], interval = 5. This allows regular compression algorithms
to reduce the size of the integer list using
run-length encoding.
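A minimal sketch of that transformation, reproducing the example above (the actual Carbonara code differs; this only shows the idea):
def encode_timestamps(timestamps, interval):
    # Keep the first timestamp, then store each delta divided by the interval.
    deltas = [(b - a) // interval for a, b in zip(timestamps, timestamps[1:])]
    return [timestamps[0]] + deltas

def decode_timestamps(encoded, interval):
    out = [encoded[0]]
    for step in encoded[1:]:
        out.append(out[-1] + step * interval)
    return out

assert encode_timestamps([41230, 41235, 41240, 41250, 41255], 5) == [41230, 1, 1, 2, 1]
assert decode_timestamps([41230, 1, 1, 2, 1], 5) == [41230, 41235, 41240, 41250, 41255]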
To actually compress the values, I tried two different algorithms:
LZ4 (https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)), a fast
compression/decompression algorithm;
the XOR-based compression scheme described in the Gorilla paper mentioned
above, which I had to implement myself.
For reference, a Go implementation also exists in
go-tsz.
I then benchmarked these solutions.
The XOR algorithm implemented in Python is pretty slow, compared to LZ4. Truth
is that python-lz4 is fully implemented
in C, which makes it fast. I've profiled my XOR implementation in Python, to
discover that one operation took 20 % of the time:
count_lead_and_trail_zeroes, which is in charge of counting the number of
leading and trailing zeroes in a binary number.
I tried 2 Python implementations of the same algorithm (and submitted them to
my friend and Python developer
Victor Stinner by the way).
The first version, using string search with .index(), is 10× faster than the
second one that only does integer computation. Ah, Python… As Victor explained,
each Python operation is slow and there are a lot of them in the second version,
whereas .index() is implemented in C, really well optimized, and only needs 2
Python operations.
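A hedged sketch of that string-search trick (the real Carbonara helper may differ; the 64-bit width is an assumption):
def count_lead_and_trail_zeroes(value, width=64):
    # Render the value as a fixed-width binary string and let str.index()
    # and str.rindex(), both implemented in C, locate the first and last
    # set bits.
    bits = format(value, "0%db" % width)
    if "1" not in bits:
        return width, width
    leading = bits.index("1")
    trailing = width - 1 - bits.rindex("1")
    return leading, trailing

# Example: 0b00011000 in a width of 8 has 3 leading and 3 trailing zeroes.
assert count_lead_and_trail_zeroes(0b00011000, width=8) == (3, 3)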
Finally, I ended up optimizing that code by leveraging
cffi to call ffsll() and
flsll() directly. That decreased the run-time of count_lead_and_trail_zeroes by
45%, which increased the speed of the entire XOR compression code by a small 7%.
This is not enough to catch up with LZ4's speed. At this stage, the only solution
to achieve high speed would probably be to go with a full C implementation.
Considering the compression ratio of the different algorithms, they are pretty
much identical. In the worst case scenario (random values), LZ4 compresses down
to 9 bytes per data point, whereas XOR can go down to 7.38 bytes per data
point. In general, XOR encoding beats LZ4 by 15%, except for cases where all
values are 0 or 1. However, LZ4 is faster than XOR by a factor of 4 to 70,
depending on the case.
That means that we'll use LZ4 for data compression in Gnocchi 2.0. It's
possible that we could achieve an equally fast compression/decompression with
XOR, but I don't think it's worth the effort right now: it'd represent a lot of
code to write and to maintain.
Last week-end, I was in Brussels, Belgium for the FOSDEM,
one of the greatest open source developer conferences. I was not sure I would go
there this year (I already skipped it in 2015), but it turned out I was
requested to do a talk in the shared
Lua &
GNU Guile devroom.
As a long time Lua user and developer, and a follower of
GNU Guile for several years, the
organizer asked me to run a talk that would be a link between the two
languages.
I've entitled my talk
"How awesome ended up with Lua and not Guile"
and gave it to a room full of interested users of the awesome window manager.
We continued with a panel discussion entitled
"The future of small languages: Experience of Lua and Guile",
composed of Andy Wingo, Christopher Webber, Ludovic Courtès, Etiene Dalcol,
Hisham Muhammad and myself. It was a pretty interesting discussion, where both
languages shared their views on the state of their language.
It was a bit awkward to talk about Lua & Guile when most of my knowledge was
years old, but it turns out many things didn't change. I hope I was able to
provide interesting insight to both communities. Finally, it was a pretty
interesting FOSDEM for me, and it had been a long time since I last gave a talk
there, so I really enjoyed it. See you next year!
Writing programs is fun, but making them fast can be a pain. Python programs
are no exception to that, but the basic profiling toolchain is actually not
that complicated to use. Here, I would like to show you how you can quickly
profile and analyze your Python code to find what part of the code you should
optimize.
What's profiling?
Profiling a Python program is doing a dynamic analysis that measures the
execution time of the program and everything that composes it. That means
measuring the time spent in each of its functions. This will give you data
about where your program is spending time, and what area might be worth
optimizing.
It's a very interesting exercise. Many people focus on local optimizations,
such as determining e.g. which of the Python functions range or xrange is
going to be faster. It turns out that knowing which one is faster may never be
an issue in your program, and that the time gained by one of the functions
above might not be worth the time you spend researching that, or arguing about
it with your colleague.
Trying to blindly optimize a program without measuring where it is actually
spending its time is a useless exercise. Following your guts alone is not
always sufficient.
There are many types of profiling, as there are many things you can measure. In
this exercise, we'll focus on CPU utilization profiling, meaning the time spent
by each function executing instructions. Obviously, we could do many more kinds
of profiling and optimization, such as memory profiling, which would measure
the memory used by each piece of code, something I talk about in
The Hacker's Guide to Python.
cProfile
Since Python 2.5, Python provides a C module called
cProfile which has a
reasonable overhead and offers a good enough feature set. The basic usage goes
down to:
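For instance, a minimal sketch with a placeholder workload standing in for your real program:
import cProfile
import pstats

def main():
    # Placeholder workload standing in for your real program.
    return sum(i * i for i in range(10**6))

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time and show the 10 most expensive functions.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)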
This prints out all the functions called, with the time spent in each and the
number of times they have been called.
Advanced visualization with KCacheGrind
While useful, the output format is very basic and does not make it easy to
grasp the behaviour of complete programs.
For more advanced visualization, I leverage
KCacheGrind. If you did any C
programming and profiling these last years, you may have used it as it is
primarily designed as a front-end for Valgrind-generated
call-graphs.
In order to use it, you need to generate a cProfile result file, then convert it
to KCacheGrind format. To do that, I use
pyprof2calltree.
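Continuing the sketch above, the profiler's statistics can be dumped to a file and then converted; the exact pyprof2calltree flags are quoted from memory, so check its help output:
# Dump the statistics gathered by the cProfile.Profile object to a file...
profiler.dump_stats("myprogram.cprof")

# ...then convert that file to the KCacheGrind format and open it, with
# something like: pyprof2calltree -k -i myprogram.cprof
# (-i selects the input file, -k launches KCacheGrind on the result.)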
And the KCacheGrind window magically appears!
Concrete case: Carbonara optimization
I was curious about the performances of
Carbonara,
the small timeserie library I wrote for
Gnocchi. I decided to do some basic profiling
to see if there was any obvious optimization to do.
In order to profile a program, you need to run it. But running the whole
program in profiling mode can generate a lot of data that you don't care
about, and adds noise to what you're trying to understand. Since Gnocchi has
thousands of unit tests and a few for Carbonara itself, I decided to profile
the code used by these unit tests, as it's a good reflection of basic features
of the library.
Note that this is a good strategy for a curious and naive first-pass profiling.
There's no way that you can make sure that the hotspots you will see in the
unit tests are the actual hotspots you will encounter in production. Therefore,
a profiling in conditions and with a scenario that mimics what's seen in
production is often a necessity if you need to push your program optimization
further and want to achieve perceivable and valuable gain.
I activated cProfile using the method described above, creating a
cProfile.Profile object around my tests (I actually
started to implement that in testtools).
I then ran KCacheGrind as described above. Using KCacheGrind, I generated
the following figures.
The test I profiled here is called test_fetch and is pretty easy to
understand: it puts data in a timeserie object, and then fetches the aggregated
result. The above list shows that 88 % of the ticks are spent in set_values
(44 ticks over 50). This function is used to insert values into the timeserie,
not to fetch the values. That means that it's really slow to insert data, and
pretty fast to actually retrieve them.
Reading the rest of the list indicates that several functions share the rest of
the ticks: update, _first_block_timestamp, _truncate, _resample, etc.
Some of the functions in the list are not part of Carbonara, so there's no
point in looking to optimize them. The only thing that can be optimized is,
sometimes, the number of times they're called.
The call graph gives me a bit more insight about what's going on here. Using my
knowledge about how Carbonara works, I don't think that the whole stack on the
left for _first_block_timestamp makes much sense. This function is supposed
to find the first timestamp for an aggregate, e.g. with a timestamp of 13:34:45
and a period of 5 minutes, the function should return 13:30:00. The way it
works currently is by calling the resample function from Pandas on a
timeserie with only one element, but that seems to be very slow. Indeed,
currently this function represents 25 % of the time spent by set_values (11
ticks on 44).
Fortunately, I recently added a small function called _round_timestamp that
does exactly what _first_block_timestamp needs, without calling any
Pandas function, so no resample. So I ended up rewriting that function this
way:
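A rough sketch of the shape of that rewrite; the attribute names (ts, block_size, back_window) are assumptions for illustration rather than the actual Carbonara source:
def _first_block_timestamp(self):
    # Round the last timestamp down to the block boundary directly,
    # instead of resampling a one-element Pandas timeserie.
    rounded = self._round_timestamp(self.ts.index[-1], self.block_size)
    return rounded - (self.block_size * self.back_window)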
And then I re-ran the exact same test to compare the output of cProfile.
The list of functions looks quite different this time. The share of time
spent in set_values dropped from 88% to 71%.
The call stack for set_values shows that pretty well: we can't even see the
_first_block_timestamp function as it is so fast that it totally disappeared
from the display. It's now being considered insignificant by the profiler.
So we just sped up the whole insertion process of values into Carbonara by a
nice 25 % in a few minutes. Not that bad for a first naive pass, right?
Finally, Gnocchi 1.3.0 is out.
This is our final release, more or less matching the OpenStack 6-month
schedule, that concludes the Liberty development cycle.
This release was supposed to be released a few weeks earlier, but our
integration test got completely blocked for several days just the week before
the OpenStack Mitaka summit.
New website
We built a new dedicated website for Gnocchi at
gnocchi.xyz. We want to promote Gnocchi outside of the
OpenStack bubble, as it is a useful timeseries database on
its own that can work without the rest of the stack. We'll try to improve the
documentation. If you're curious, feel free to check it out and report anything
you miss!
The speed bump
Obviously, if it had been a bug in Gnocchi that we hit, it would have been
quick to fix. However, we found
a nasty bug in
Swift caused by the evil monkey-patching of Eventlet (once again) blended with
a mixed usage of native threads and Eventlet threads in Swift. Shake all of
that, and you get yourself some pretty nasty race conditions when using the Keystone
middleware authentication.
In the meantime, we disabled Swift multi-threading by using mod_wsgi instead of
Eventlet in devstack.
New features
So what's new in this new shiny release? A few interesting things:
Metric deletion is now asynchronous. That's not the most used feature in the
REST API (weirdly, people do not often delete metrics), but it's now way
faster and more reliable by being asynchronous. Metricd is now in charge of
cleaning things up.
Speed improvement. We are now confident that we are even faster than in the
latest benchmarks I ran (around 1.5-2× faster), which
makes Gnocchi really fast with its native storage back-ends. We profiled
and optimized Carbonara and the REST API data validation.
Improved metricd status report. It now reports the size of the backlog of
the whole cluster, both in its log and via the REST API. Easy monitoring!
Ceph driver enhancements. We had people testing the Ceph drivers in
production, so we made a few changes and fixes to make them more solid.
And that's all we did in the last couple of months. We have a lot of things on
the roadmap that are pretty exciting, and I'll surely talk about them in the next
weeks.
Last week I was in Tokyo, Japan for the
OpenStack Summit, discussing
the new Mitaka version that will be released in 6 months.
I've attended the summit mainly to discuss and follow-up new developments on
Ceilometer,
Gnocchi, Aodh and
Oslo. It has been a pretty good week and we were able to discuss and plan a few
interesting things. Below is what I found remarkable during this summit
concerning those projects.
Distributed lock manager
I did not attend this session, but I need to write something about it.
See, when working in a distributed environment like OpenStack, it's almost
obvious that sooner or later you end up needing a distributed lock mechanism.
It started to be pretty obvious and a serious problem for us 2 years ago in
Ceilometer. Back then, we proposed the
service-sync
blueprint and talked about it during the OpenStack Icehouse Design Summit in
Hong-Kong. The session at that time was a success, and in 20 minutes I
convinced everyone it was the right thing to do. The night following the
session, we picked a name, Tooz, for this new library. It was the first
time I met Joshua Harlow, who has become one of the biggest Tooz contributors
since then.
For the following months, we tried to move the lines in OpenStack. It was very
hard to convince people that it was the solution to their problem. Most of the
time, they did not seem to grasp the entirety of what was at stake.
This time, it seems that we managed to convince everyone that a DLM is indeed
needed. Joshua wrote an extensive specification called
Chronicle of a DLM, which ended up
being discussed and somehow adopted during that session in Tokyo.
So yes, Tooz will be the weapon of choice for OpenStack. It will avoid a hard
requirement on any DLM solution directly. The best driver right now is the
ZooKeeper one, but it'll still be possible for
operators to use e.g. Redis.
This is a great achievement for us, after spending years trying to fix features
such as the
Nova service group subsystem
and seeing our proposals postponed forever.
(If you want to know more, LWN.net has
a great article about that session.)
Telemetry team name
With the new projects launched this last year, Aodh & Gnocchi, in parallel with
the old Ceilometer, plus the change from programs to the Big Tent in OpenStack, the
team is having an identity issue. Being referred to as the "Ceilometer team" is
not really accurate, as some of us only work on Aodh or on Gnocchi. So after
discussing that, I
proposed to rename the team to Telemetry
instead. We'll see how it goes.
Alarms
The first session was about alarms and the Aodh project. It turns out that the
project is in pretty good shape, but probably needs some more love, which I hope
I'll be able to provide in the next months.
The need for a new aodhclient based on the technologies we recently used
building gnocchiclient has been reasserted, so we might end up working on
that pretty soon. The Tempest support also needs some improvement, and we have
a plan to enhance that.
Data visualisation
We got David Lyle in this session, the Project Technical Leader for
Horizon. It was an interesting discussion. It used
to be technically challenging to draw charts from the data Ceilometer collects,
but it's now very easy with Gnocchi and its API.
While the technical side is resolved, the more political and user experience
side of what to draw and how was discussed at length. We don't want to make
people think that Ceilometer and Gnocchi are a full monitoring solution, so
there are some precautions to take. Other than that, it would be pretty cool to
have a view of the data in Horizon.
Rolling upgrade
It turns out that Ceilometer has an architecture that makes it easy to do
rolling upgrades. We just need to write proper documentation explaining how to
do it and in which order the services should be upgraded.
Ceilometer splitting
The split of the alarm feature of Ceilometer into its own project, Aodh, in the
last cycle was a great success for the whole team. We want to split out other
pieces of Ceilometer, as they make sense on their own and become easier to
manage. There are also some projects that want to use them without the whole
stack, so it's a good idea to make it happen.
CloudKitty & Gnocchi
I attended the 2 sessions that were allocated to
CloudKitty. It was pretty
interesting as they want to simplify their architecture and leverage what
Gnocchi provides. I proposed my view of the project architecture and how they
could leverage more of Gnocchi to retrieve and store data. They want to go
in that direction, though it's a large amount of work and refactoring on their
side, so it'll take time.
We also need to enhance the support of extensions for new resources in Gnocchi,
and that's something I hope I'll work on in the next months.
Overall, this summit was pretty good and I got a tremendous amount of good
feedback on Gnocchi. I again managed to get enough ideas and tasks to tackle
for the next 6 months. It really looks interesting to see where the whole team
will go from that. Stay tuned!
We got pretty good feedback on Gnocchi so far,
even if we only had a little. Recently, in order to have a better feeling of
where we were at, we wanted to know how fast (or slow) Gnocchi was.
The
early benchmarks that some of the Mirantis engineers ran last year
showed pretty good signs. But a year later, it was time to get real numbers and
have a good understanding of Gnocchi capacity.
Benchmark tools
The first thing I realized when starting that process is that we were lacking
tools to run benchmarks. Therefore I started to write some benchmark tools
in python-gnocchiclient, which
provides a command line tool to interrogate Gnocchi. I added a few basic
commands to measure metric performance: creating, deleting and showing metrics,
and pushing and retrieving measures.
The command line tool supports the --verbose switch to get a detailed progress
report on the benchmark progression. So far it supports metric operations only,
but that's the most interesting part of Gnocchi.
Spinning up some hardware
I got a couple of bare metal servers to test Gnocchi on. I dedicated the first
one to Gnocchi, and used the second one as the benchmark client, plugged on the
same network. Each server is made of
2 Intel Xeon E5-2609 v3
(12 cores in total) and 32 GB of RAM. That provides a lot of CPU to handle
requests in parallel.
Then I simply performed a basic
RHEL 7
installation and ran devstack to spin up an installation
of Gnocchi based on the master branch, disabling all of the other OpenStack
components. I then tweaked the Apache httpd configuration to use the worker MPM
and increased the maximum number of clients that can send requests
simultaneously.
I configured Gnocchi to use the PostgreSQL indexer, as it's the recommended
one, and the file storage driver, based on Carbonara (Gnocchi's own storage
engine). That means files were stored locally rather than in Ceph or Swift.
Using the file driver is less scalable (you have to run on only one node or
use a technology like NFS to share the files), but it was good enough for this
benchmark, to get some numbers and to profile the beast.
The OpenStack Keystone authentication middleware was not enabled in this setup,
as it would add some delay validating the authentication token.
Metric CRUD operations
Metric creation is pretty fast. I managed to reach 1500 metrics created per
second pretty easily. Deletion is now asynchronous, which means it's faster than
in Gnocchi 1.2, but it's still slower than creation: 300 metric/s can be deleted.
That does not sound like a huge issue since metric deletion is actually barely
used in production.
Retrieving metric information is also pretty fast and goes up to 800 metric/s.
It'd be easy to achieve much higher throughput for this one, as it'd be easy to
cache, but we didn't feel the need to implement it so far.
Another important thing is that all of these numbers are constant and barely
depend on the number of metrics already managed by Gnocchi.
Operation     | Details                                   | Rate
Create metric | Created 100k metrics in 77 seconds        | 1300 metric/s
Delete metric | Deleted 100k metrics in 190 seconds       | 524 metric/s
Show metric   | Show a metric 100k times in 149 seconds   | 670 metric/s
Sending and getting measures
Pushing measures into metrics is one of the hottest topics. Starting with
Gnocchi 1.1, the measures pushed are treated asynchronously, which makes it
much faster to push new measures. Getting new numbers on that feature was
pretty interesting.
The number of measures per second you can push depends on the batch size, meaning
the number of actual measurements you send per call. The naive approach is to
push 1 measure per call, and in that case, Gnocchi is able to handle around 600
measures/s. With a batch containing 100 measures, the number of calls per
second goes down to 450, but since you push 100 measures each time, that means
45k measures per second pushed into Gnocchi!
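To make the batching concrete, here is a minimal sketch of what such a push
looks like against the Gnocchi REST API using the requests library. The
endpoint URL and the archive policy name are assumptions, this is not the
benchmark tool itself, and depending on your deployment you may also need
authentication headers:

import datetime
import requests

GNOCCHI = "http://localhost:8041"  # hypothetical endpoint, authentication disabled

# Create one metric, using an archive policy assumed to exist ("low").
resp = requests.post(GNOCCHI + "/v1/metric",
                     json={"archive_policy_name": "low"})
resp.raise_for_status()
metric_id = resp.json()["id"]

# Push a batch of 100 measures in a single HTTP call.
now = datetime.datetime.utcnow()
batch = [{"timestamp": (now + datetime.timedelta(seconds=i)).isoformat(),
          "value": float(i)}
         for i in range(100)]
resp = requests.post(GNOCCHI + "/v1/metric/%s/measures" % metric_id, json=batch)
resp.raise_for_status()  # accepted; the measures are processed asynchronously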
I've pushed the test further, inspired by the recent
blog post of InfluxDB claiming to achieve 300k points per second
with their new engine. I ran the same benchmark on the hardware I had, which is
roughly half the size of what they used. I managed to push Gnocchi to
a little more than 120k measures per second. If I had the same hardware they
used, interpolating the results suggests almost 250k measures/s pushed.
Obviously, you can't strictly compare Gnocchi and InfluxDB since they are not
doing exactly the same thing, but it still looks way better than what I
expected.
As the table below shows, batch sizes between 1k and 5k all deliver a throughput
in the 113-125k measures/s range.
Operation                    | Details                                                        | Rate
Push measures (batch of 5k)  | Pushed 5M measures with batches of 5k measures in 40 seconds   | 122k measures/s
Push measures (batch of 4k)  | Pushed 5M measures with batches of 4k measures in 40 seconds   | 125k measures/s
Push measures (batch of 3k)  | Pushed 5M measures with batches of 3k measures in 40 seconds   | 123k measures/s
Push measures (batch of 2k)  | Pushed 5M measures with batches of 2k measures in 41 seconds   | 121k measures/s
Push measures (batch of 1k)  | Pushed 5M measures with batches of 1k measures in 44 seconds   | 113k measures/s
Push measures (batch of 500) | Pushed 5M measures with batches of 500 measures in 51 seconds  | 98k measures/s
Push measures (batch of 100) | Pushed 5M measures with batches of 100 measures in 112 seconds | 45k measures/s
Push measures (batch of 10)  | Pushed 5M measures with batches of 10 measures in 852 seconds  | 6k measures/s
Push measures (batch of 1)   | Pushed 500k measures with batches of 1 measure in 800 seconds  | 624 measures/s
Get measures                 | Retrieved 43k measures of 1 metric                             | 260k measures/s
What about getting measures? Well, it's actually pretty fast too. Retrieving a
metric with 1 month of data with 1 minute interval (that's 43k points) takes
less than 2 second.
Though it's actually slower than I expected. The reason seems to be that
the JSON payload is 2 MB and encoding it takes a lot of time in Python. I'll
investigate that. Another point I discovered is that by default Gnocchi
returns all the datapoints for each granularity available over the requested
period, which might double the size of the returned data for nothing if you
don't need it. It'll be easy to add an option to the API to only retrieve what
you need, though!
Once benchmarked, that meant I was able to retrieve 6 metrics per second,
which translates to around 260k measures/s.
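For reference, fetching measures back is a single GET on the same resource. A
minimal sketch, reusing the hypothetical endpoint from the snippet above:

import requests

GNOCCHI = "http://localhost:8041"  # hypothetical endpoint, authentication disabled
metric_id = "..."  # the id returned when the metric was created

resp = requests.get(GNOCCHI + "/v1/metric/%s/measures" % metric_id)
resp.raise_for_status()
# Each measure comes back as [timestamp, granularity, value], and by default
# every granularity defined by the metric's archive policy is included.
for timestamp, granularity, value in resp.json():
    print(timestamp, granularity, value)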
Metricd speed
New measures that are pushed into Gnocchi are processed asynchronously by the
gnocchi-metricd daemon. When doing the benchmarks above, I ran into a very
interesting issue: sending 10k measures to a metric would make
gnocchi-metricd use up to 2 GB of RAM and 120% CPU for more than 10 minutes.
After further investigation, I found that the naive approach we used to
resample datapoints in Carbonara using Pandas was
causing that. I
reported a bug on Pandas and
the upstream author was kind enough to provide a nice workaround, which I sent
as a pull request to the Pandas
documentation.
I wrote a fix for Gnocchi based on that, and started using it. Computing the
standard set of aggregation methods (std, count, 95pct, min, max, sum, median,
mean) for 10k batches of 1 measure (the worst case scenario) for one metric with
10k measures now takes only 20 seconds and uses 100 MB of RAM: that's 45 times
faster.
That means that in normal operations, where only a few new measures are
processed, the operation of updating a metric only takes a few milliseconds.
Awesome!
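To give an idea of the kind of computation involved, here is a rough,
hypothetical sketch of producing that aggregation set over 60-second periods
with pandas. It only illustrates the operation; it is neither the Carbonara
code nor the workaround mentioned above:

import numpy as np
import pandas as pd

# 10k random measures, one per second.
index = pd.date_range("2015-01-01", periods=10000, freq="s")
series = pd.Series(np.random.random(10000), index=index)

# Resample to 60-second periods and compute the standard aggregates.
resampled = series.resample("60s")
aggregates = resampled.agg(["std", "count", "min", "max", "sum", "median", "mean"])
aggregates["95pct"] = resampled.quantile(0.95)
print(aggregates.head())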
Comparison with Ceilometer
For comparison sake, I've quickly run some read operations benchmark in
Ceilometer. I've fed it with one month of samples for 100 instances polled
every minute. That represents roughly 4.3M samples injected, and that took a
while (almost 1 hour), whereas it would have taken less than a minute in
Gnocchi. Then I tried to retrieve some statistics in the same way that we
provide them in Gnocchi, which means aggregating them over a period of 60
seconds over a month.
Operation             | Details                    | Time
Read metric (SQL)     | Read measures for 1 metric | 2min 58s
Read metric (MongoDB) | Read measures for 1 metric | 28s
Read metric (Gnocchi) | Read measures for 1 metric | 2s
Obviously, Ceilometer is very slow. It has to look through 4M samples to
compute and return the result, which takes a lot of time, whereas Gnocchi just
has to fetch a file and pass it over. That also means that the more samples you
have (i.e., the longer you collect data and the more resources you have), the
slower Ceilometer will become. This is not a problem with Gnocchi, as I
emphasized when I started designing it.
Most Gnocchi operations are O(log R) where R is the number of metrics or
resources, whereas most Ceilometer operations are O(log S) where S is the
number of samples (measures). Since R is millions of times smaller than S,
Gnocchi gets to be much faster.
And what's even more interesting is that Gnocchi is entirely horizontally
scalable. Adding more Gnocchi servers (for the API and its background
processing worker, metricd) will multiply Gnocchi's performance by the number
of servers added.
Improvements
There are several things to improve in Gnocchi, such as splitting Carbonara
archives to make them more efficient, especially for drivers such as Ceph and
Swift. It's already on my plate, and I'm looking forward to working on that!
And if you have any questions, feel free to shoot them in the comment section.
Last week, I was invited to the
OpenStack Paris meetup #16,
whose subject was metrics in OpenStack. Last time I spoke at this meetup
was back in 2012, during the
OpenStack Paris meetup #2.
A very long time ago!
I talked for half an hour about Gnocchi, the
OpenStack project I've been running for 18 months now. I started by explaining
the story behind the project and why we needed to build it. Ceilometer has an
interesting history and has had a curious roadmap these last years, and I
summarized that briefly. Then I talked about how Gnocchi works and what it
offers to users and operators. The slides were full of JSON, but I imagine they
offered an interesting view of what the API looks like and how easy it is to
operate. This also allowed me to emphasize how many use cases are actually
covered and solved, contrary to what Ceilometer did so far. The talk was well
received
and I got a few interesting questions at the end.
The video of the talk (in French) and my slides are available on my
talk page and below. I hope you'll enjoy it.
Hi Julien, and thanks for participating in this interview for the Journal du
Hacker. For our readers who don't know you, can you introduce yourself briefly?
You're welcome! My name is Julien, I'm 31 years old, and I live in Paris. I have
now been developing free software for around fifteen years. I had the pleasure
to work (among other things) on Debian,
Emacs and
awesome these last years, and more recently on
OpenStack. For a few months now, I have been working at Red Hat as a Principal
Software Engineer on OpenStack. I am in charge of doing upstream
development for that cloud-computing platform, mainly around the Ceilometer,
Aodh and Gnocchi projects.
Being a system architect myself, I have been following your work in
OpenStack for a while. It's uncommon to have the
point of view of someone as involved as you are. Can you give us a summary of
the state of the project, and then detail your activities in it?
The OpenStack project has grown and changed a lot since
I started 4 years ago. It started as a few projects providing the basics, like
Nova (compute),
Swift (object storage),
Cinder (volume),
Keystone (identity) or even
Neutron (network), which form the basis of a
cloud-computing platform, and it eventually became composed of a lot more
projects. For a while, the inclusion of projects was subject to a strict review
by the technical committee. But for a few months now, the rules have been
relaxed, and we are seeing a lot more projects connected to cloud computing
joining us.
As far as I'm concerned, I started the
Ceilometer
project in 2012 with a few other people; it is devoted to handling the metrics
of OpenStack platforms. Our goal is to be able to collect all the metrics and
record them so they can be analyzed later. We also have a module providing the
ability to trigger actions when a threshold is crossed (alarms).
The project grew in a monolithic way, and its number of contributors grew
linearly, during the first two years. I was the PTL (Project Technical
Leader) for a year. This leadership position demands a lot of time for
bureaucratic matters and people management, so I decided to step down in
order to be able to spend more time solving the technical challenges that
Ceilometer offered.
I've started the Gnocchi project in 2014. The
first stable version (1.0.0) was released a few months ago. It's a timeseries
database offering a REST API and a strong ability to scale. It was a necessary
development to solve the problems tied to the large amount of metrics created
by a cloud-computing platform, where tens of thousands of virtual machines have
to be metered as often as possible. This project works as a standalone
deployment or with the rest of OpenStack.
More recently, I've started Aodh, the result of
moving out the code and features of Ceilometer related to threshold action
triggering (alarming). That's the logical continuation of what we started with
Gnocchi. It means Ceilometer is being split into independent modules that can
work together, with or without OpenStack. It seems to me that the features
provided by Ceilometer, Aodh and Gnocchi can also be interesting for operators
running more classical infrastructures. That's why I've pushed the projects
in that direction, and also toward a more service-oriented architecture
(SOA).
I'd like to pause for a moment on Ceilometer. I think this solution was highly
anticipated, especially by the cloud-computing providers using OpenStack
to bill the resources sold to their customers. I remember reading a blog post
where you talked about the rushed construction of this component, and about
features that were not supposed to be there. Nowadays, with Gnocchi and Aodh,
how good is the Ceilometer component and the programs it relies on?
Indeed, one of the first use cases for Ceilometer was tied to the ability to get
metrics to feed a billing tool. That goal has now been reached, since we have
billing tools for OpenStack using Ceilometer, such as
CloudKitty.
However, other use-cases appeared rapidly, such as the ability to trigger
alarms. This feature was necessary, for example, to implement the auto-scaling
feature that Heat needed. At the time, for
technical and political reasons, it was not possible to implement this feature
in a new project, and the functionality ended up in Ceilometer, since it was
using the metrics collected and stored by Ceilometer itself.
Though, like I said, this feature is now in its own project, Aodh. The alarm
feature has been used in production for a few cycles now, and the Aodh project
brings new features to the table. It makes it possible to trigger threshold
actions and is one of the few solutions able to work at high scale, with several
thousands of alarms.
It's impossible to make Nagios run against millions of instances to fetch metrics
and trigger alarms. Ceilometer and Aodh can do that easily on a few tens of
nodes, automatically.
On the other hand, Ceilometer was for a long time painted as slow and
complicated to use, because its metrics storage system used
MongoDB by default. Clearly, the data structure model picked
was not optimal for what users were doing with the data.
That's why I started Gnocchi last year, which is designed precisely for this
use case. It offers constant access time to metrics (O(1) complexity) and fast
access to resource data via an index.
Today, with 3 projects that each have a well-defined feature perimeter and that
can work together (Ceilometer, Aodh and Gnocchi), the biggest problems and
defects of the initial project have finally been erased.
To wrap up on OpenStack, one last question. You've been a
Python developer for a long time and are a fervent user
of software testing and
test-driven development.
Several of your blog posts point out how important their use is. Can you tell
us more about the use of tests in OpenStack, and what testing is required to
contribute to OpenStack?
I don't know any project that is as tested on every layer as OpenStack is. At
the start of the project, there was only vague test coverage, made up of a few
unit tests. With each release, a bunch of new features were delivered, and you
had to keep your fingers crossed that they worked. That alone is almost
unacceptable. But the bigger issue was that there were also a lot of regressions,
and things that used to work no longer did. It was often corner cases that
developers had forgotten about that stopped working.
Then the project decided to change its policy and started to refuse all patches
(new features or bug fixes) that did not include a minimal set of unit
tests proving the patch worked. Quickly, regressions were history, and the
number of bugs shrank significantly month after month.
Then came the functional tests, with the
Tempest project, which runs a test battery on a
complete OpenStack deployment.
OpenStack now possesses a
complete test infrastructure, with
operators hired full-time to maintain it. The developers have to write the
tests, and the operators maintain an architecture based on Gerrit, Zuul, and
Jenkins, which runs the test suite of each project for each patch sent.
Indeed, for each version of a patch sent, a full OpenStack is deployed in a
virtual machine, and a battery of thousands of unit and functional tests is run
to check that there are no regressions.
To contribute to OpenStack, you need to know how to write a unit test; the
policy on functional tests is more relaxed. The tools used are standard Python
tools: unittest for the framework and tox to create a
virtual environment (venv) and run the tests.
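As a tiny, generic illustration (not taken from any OpenStack project), a unit
test written with the standard unittest framework looks like this and would
typically be run through tox:

import unittest

def add(a, b):
    # Trivial function under test, purely illustrative.
    return a + b

class TestAdd(unittest.TestCase):
    def test_add_integers(self):
        self.assertEqual(add(1, 2), 3)

if __name__ == "__main__":
    unittest.main()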
It's also possible to use
DevStack to deploy an
OpenStack platform on a virtual machine and run functional tests. However,
since the project infrastructure also does that when a patch is submitted, it's
not mandatory to do it yourself locally.
The tools and tests you write for OpenStack are written in Python, a language
which is very popular today. You seem to be particularly fond of it, since
you wrote a book about it,
The Hacker's Guide to Python, which I
really enjoyed. Can you explain what brought you to Python, the main strengths
you attribute to this language (briefly), and how you went from
developer to author?
I stumbled upon Python by chance, around 2005. I don't remember how I heard
about it, but I bought a first book to discover it and started toying with the
language. At that time, I didn't find any project to contribute to or to start.
My first Python project was rebuildd for Debian in
2007, a bit later.
I like Python for its simplicity, its rather clean object orientation, its
ease of deployment and its rich open source ecosystem. Once you get the
basics, it's very easy to progress and to use it for anything, because the
ecosystem makes it easy to find libraries that solve any kind of problem.
I became an author by chance, writing blog posts about Python from time to time.
I eventually realized that after a few years studying Python internals (CPython),
I had learned a lot of things. While writing a post about
the differences between method types in Python
(which is still one of the most-read posts on my blog), I realized that a lot
of things that seemed obvious to me were not obvious to other developers.
I wrote that initial post after thousands of hours spent doing code reviews on
OpenStack. I therefore decided to note down all the developers' pain points and
to write a book about them: a compilation of what years of experience taught me
and taught the other developers I decided to interview for the book.
I was very interested in the publication of your book, for the subject
itself, but also for the process you chose. You self-published the book, which
seems very relevant nowadays. Was that the plan from the start? Did you look
for a publisher? Can you tell us more about that?
I was lucky to find out about other self-published authors, such as
Nathan Barry, who even wrote a book on the
subject, called Authority. That's what
convinced me it was possible and gave me pointers for this project.
I started writing in August 2013, and I ran the first interviews with other
developers at that time. I began with the table of contents and then
filled the pages with what I knew and what I wanted to share. I managed to
finish the book around January 2014. The proofreading took more time than I
expected, so the book was only released in March 2014. I wrote a
complete report on that on
my blog, where I explain the full process in detail, from writing to launch.
I did not look for a publisher, though I received some offers. The idea of
self-publishing really convinced me, so I decided to go on my own, and I have no
regrets. It's true that you have to wear two hats at the same time and handle a
lot more things, but with a minimal audience and some help from the Internet,
anything's possible!
I've been contacted by two publishers since then, a
Chinese and a
Korean one. I gave
them the rights to translate and publish the book in their countries, so you can
buy the Chinese and Korean versions of the first edition of the book out there.
Seeing how successful it was, I decided to launch a second edition in May 2015,
and it's likely that a third edition will be released in 2016.
Nowadays, you work for Red Hat, a company that
represents the success of using Free Software as a commercial business model.
This company fascinates many people in our community. What can you say about
your employer from your point of view?
It has only been a year since I joined Red Hat (when they bought
eNovance), so my experience is quite recent.
Still, Red Hat is really a special company on every level. It's hard to see
from the outside how open it is and how it works. It is organized very much
like, and really looks like, an open source project. For more details, you
should read
The Open Organization,
a book written by Jim Whitehurst (CEO of Red Hat), which he just published. It
describes perfectly how Red Hat works. To summarize, meritocracy and the absence
of organizational silos are what make Red Hat a strong organization and
one of the most innovative companies.
In the end, I'm lucky enough to be autonomous on the projects I work on with my
team around OpenStack, and I can spend 100% of my time working upstream and
enhancing the Python ecosystem.
We've been working hard with the Gnocchi team these last months to store your
metrics, and I guess it's time to show off a bit.
So far, Gnocchi offers scalable metric storage and resource indexing, built
especially for OpenStack clouds but not only: it is generic. It's cool to
store metrics, but it's even better to have a way to visualize them!
Prototyping
We very soon started to build a little HTML interface. Being REST-friendly
guys, we enabled it on the same endpoints that were being used to retrieve
information and measures about metrics, sending back text/html instead of
application/json if you requested those pages from a Web browser.
But let's face it: we are back-end developers; we suck at any kind of front-end
development. CSS, HTML, JavaScript? Bwah! So what we built was a starting
point, hoping some magical Web developer would jump in and finish the job.
Obviously it never happened.
Ok, so what's out there?
It turns out there are back-end agnostic solutions out there, and we decided to
pick Grafana. Grafana is a complete graphing dashboard
solution that can be plugged on top of any back-end. It already supports
timeseries databases such as Graphite, InfluxDB and OpenTSDB.
That was more than enough for my fellow developer
Mehdi Abaakouk to jump in and start writing a
Gnocchi plugin for Grafana! Consequently, there is now a basic but solid and
working back-end for Grafana that lies in the
grafana-plugins
repository.
With that plugin, you can graph anything that is stored in Gnocchi, from raw
metrics to metrics tied to resources. You can use templating, but there are no
annotations yet.
The back-end supports Gnocchi with or without Keystone involved, and any type
of authentication (basic auth or Keystone token). So yes, it even works if
you're not running Gnocchi with the rest of OpenStack.
It also supports advanced queries, so you can search for resources based on
some criteria and graph their metrics.
I want to try it!
If you want to deploy it, all you need to do is to install Grafana and its
plugins, and create a new datasource pointing to Gnocchi. It is that simple.
There's some CORS middleware configuration involved if you're planning on using
Keystone authentication, but it's pretty straightforward: just set the
cors.allowed_origin option to the URL of your Grafana dashboard.
We added support for Grafana directly in the Gnocchi DevStack plugin. If you're
running DevStack you can follow
the instructions,
which basically amount to adding the line enable_service gnocchi-grafana.
Moving to Grafana core
Mehdi just opened a pull request
a few days ago to merge the plugin into Grafana core. It's actually one of the
most unit-tested plugins in Grafana so far, so it should be on a good path to
being merged in the future, bringing support for Gnocchi directly into Grafana
without any external plugin involved.
Continuing my post series on the tools I use these days in Python, this time I
would like to talk about a library I really like, named
voluptuous.
It's no secret that most of the time, when a program receives data from the
outside, it's a big deal to handle it. Indeed, most of the time your program
has no guarantee that the stream is valid and that it contains what is
expected.
The robustness principle
says you should be liberal in what you accept, though
that is not always a good idea
either. Whatever policy you choose, you need to process the data and
implement a policy that will work, lax or not.
That means that the program needs to look at the data received, check that it
finds everything it needs, complete what might be missing (e.g. set some
defaults), transform some of the data, and maybe reject it in the end.
Data validation
The first step is to validate the data, which means checking all the fields are
there and all the types are right or understandable (parseable). Voluptuous
provides a single interface for all of that, called a Schema.
The argument to voluptuous.Schema should be the data structure that you
expect. Voluptuous accepts any kind of data structure, so it could also be a
simple string or an array of dicts of arrays of integers. You get the idea.
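Something along these lines, as a minimal sketch (the field names are purely
hypothetical):

import voluptuous

# A dict with a few keys that, if present, must have the given types.
schema = voluptuous.Schema({
    'name': str,
    'age': int,
})

schema({'name': 'Alice', 'age': 30})   # OK, returns the validated dict
schema({'name': 'Alice'})              # OK too: missing keys are allowed by default
schema({'name': 'Alice', 'foo': 1})    # raises MultipleInvalid: extra keys are rejected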
Here it's a dict with a few keys that, if present, should be validated as
certain types. By default, Voluptuous does not raise an error if some keys
are missing. However, by default it is invalid to have extra keys in a dict. If
you want to allow extra keys, it is possible to specify it.
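Assuming the same hypothetical schema, the defaults can be changed with the
Required marker and the extra parameter:

import voluptuous

# Make 'name' mandatory and tolerate keys that are not declared.
schema = voluptuous.Schema({
    voluptuous.Required('name'): str,
    'age': int,
}, extra=voluptuous.ALLOW_EXTRA)

schema({'name': 'Alice', 'nickname': 'al'})  # OK: the extra key is kept as-is
schema({'age': 30})                          # raises MultipleInvalid: 'name' is required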
You can create custom data types very easily. Voluptuous data types are
actually just functions that are called with one argument, the value, and that
should either return the value or raise an Invalid or ValueError exception.
>>> from voluptuous import Schema, Invalid
>>> def StringWithLength5(value):
...     if isinstance(value, str) and len(value) == 5:
...         return value
...     raise Invalid("Not a string with 5 chars")
...
>>> s = Schema(StringWithLength5)
>>> s("hello")
'hello'
>>> s("hello world")
voluptuous.MultipleInvalid: Not a string with 5 chars
Most of the time though, there is no need to create your own data types.
Voluptuous provides logical operators that can, combined with a few other
provided primitives such as voluptuous.Length or voluptuous.Range, create a
large range of validation schemes.
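As a quick, hypothetical illustration of combining those primitives:

import voluptuous

schema = voluptuous.Schema({
    # A string between 3 and 64 characters long.
    'name': voluptuous.All(str, voluptuous.Length(min=3, max=64)),
    # Either an integer in [0, 120] or the literal string "unknown".
    'age': voluptuous.Any(
        voluptuous.All(int, voluptuous.Range(min=0, max=120)),
        'unknown'),
})

schema({'name': 'Alice', 'age': 30})         # OK
schema({'name': 'Alice', 'age': 'unknown'})  # OK too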
The voluptuous documentation has a
good set of examples that you can check to have a good overview of what you can
do.
Data transformation
What's important to remember is that each data type you use is a function
that is called and returns a value, if the value is considered valid. The
value returned is what is actually used and returned after the schema
validation.
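Here is a minimal sketch of that idea; the 'id' field is a hypothetical example:

import uuid
import voluptuous

def UUID(value):
    # Convert the input string to a uuid.UUID object,
    # validating its format at the same time.
    return uuid.UUID(value)

schema = voluptuous.Schema({'id': UUID})

result = schema({'id': '36b21dd3-d722-4579-b0b0-1f0a78c3eb3c'})
print(type(result['id']))  # <class 'uuid.UUID'>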
By defining a custom UUID function that converts a value to a UUID, the
schema converts the string passed in the data to a Python UUID object,
validating the format at the same time.
Note a little trick here: it's not possible to use uuid.UUID directly in the
schema; otherwise, Voluptuous would check that the data is actually an
instance of uuid.UUID.
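A small sketch of what that would do:

import uuid
import voluptuous

schema = voluptuous.Schema({'id': uuid.UUID})

# Raises MultipleInvalid: the string is not an instance of uuid.UUID.
schema({'id': '36b21dd3-d722-4579-b0b0-1f0a78c3eb3c'})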
Recursive schemas
So far, Voluptuous has one limitation: it has no direct support for recursive
schemas. The simplest way to circumvent this is to use another function as an
indirection.
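For example, a hypothetical tree-like structure could be validated along these
lines:

import voluptuous

def Node(value):
    # The indirection: the schema is built inside a function,
    # so it can reference itself through that function.
    return voluptuous.Schema({
        'name': str,
        'children': [Node],
    })(value)

schema = voluptuous.Schema(Node)
schema({'name': 'root', 'children': [{'name': 'leaf', 'children': []}]})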
I started to use Voluptuous to validate data in the REST API provided by
Gnocchi. So far it has been a really good tool,
and we've been able to
create a complete REST API
that is very easy to validate on the server side. I would definitely recommend
it for that. It blends with any Web framework easily.
One of the upsides compared to solutions like
JSON Schema is the ability to create or re-use your
own custom data types while converting values at validation time. It is also
very Pythonic and extensible, which makes it pretty great for all of that. It's
also not tied to any serialization format.
On the other hand, JSON Schema is language agnostic and is serializable itself
as JSON. That makes it easy to export and provide to a consumer so it can
understand the API and potentially validate the data on its side.