Search Results: "Soeren Sonnenburg"

29 October 2013

Soeren Sonnenburg: Shogun Toolbox Version 3.0 released!

Dear all, we are proud to announce the 3.0 release of the Shogun Machine Learning Toolbox. This release features the incredible projects of our 8 hard-working Google Summer of Code students. In addition, you get other cool new features as well as lots of internal improvements, bugfixes, and documentation improvements. To speak in numbers, we got more than 2000 commits changing almost 400000 lines in more than 7000 files and increased the number of unit tests from 50 to 600. This is the largest release that Shogun has ever had! Please visit http://shogun-toolbox.org/ to obtain Shogun.

News

Here is a brief description of what is new, starting with the GSoC projects, which deserve most of the fame:

Screenshots

Everyone likes screenshots. Well, we have got something better! All of the above projects (and more) are now documented in the form of IPython notebooks, combining machine learning fundamentals, code, and plots. This is how we have chosen to document our framework from now on. Have a look at them and feel free to submit your use case as a notebook: FGM.html, GMM.html, HashedDocDotFeatures.html, LMNN.html, SupportVectorMachines.html, Tapkee.html, bss_audio.html, bss_image.html, ecg_sep.html, gaussian_processes.html, logdet.html, mmd_two_sample_testing.html. The web-demo framework has been integrated into our website; go check it out.

Other changes

We finally moved the Shogun build process to CMake. Through GSoC, we added general clone and equals methods to all Shogun objects, along with automagic unit testing of serialisation and clone/equals for all classes. Other new features include multiclass LDA and probability outputs for multiclass SVMs. For the full list, see the NEWS file.

Workshop videos and slides

In case you missed the first Shogun workshop that we organised in Berlin last July, all of the talks have been put online.

Shogun in the Cloud

As setting up the right environment for Shogun and installing it has always been one of the biggest problems for users (hence the switch to CMake), we have created a sandbox where you can try out Shogun without installing it on your system! Basically, it is a web service which gives you access to your own IPython notebook server with all the Shogun notebooks. Of course, you are more than welcome to create and share your own notebooks using this service! *NOTE*: This is a courtesy service created by the Shogun Toolbox developers, so if you like it, please consider some form of donation to the project so that we can keep this service running for you. Try Shogun in the cloud.

Thanks

The release has been made possible by the hard work of all of our GSoC students, see the list above. Thanks also to Thoralf Klein and Björn Esser for their load of great contributions. Last but not least, thanks to all the people who use Shogun and provide feedback. Sören Sonnenburg on behalf of the Shogun team (+ Viktor Gal, Sergey Lisitsyn, Heiko Strathmann and Fernando Iglesias)

Soeren Sonnenburg: Shogun goes cloud

Shogun goes cloud. Try out http://cloud.shogun-toolbox.org to interactively play with machine learning algorithms or try any of the interactive demos. The SHOGUN Machine Learning Toolbox is a collection of algorithms designed for unified learning for a broad range of feature types and learning settings, like classification, regression, or explorative data analysis. For more information visit http://www.shogun-toolbox.org.

9 July 2013

Soeren Sonnenburg: Shogun Toolbox Days 2013 Program and Updates

Dear all, we are excited that the first Shogun workshop, July 12-14 in Berlin, is getting closer. Thanks to all the people who signed up -- we are sure it will be a packed and inspiring weekend! We have finalized the schedule for Friday, July 12, taking place at the c-base (see description below [1]). After an intro where everyone gets to know each other and where we introduce ourselves, Shogun, and machine learning in general, there will be some tutorials by Shogun developers. In addition, we will have discussions about various topics, various coffee breaks, and lunch. Finally, we will enjoy a summer's evening in Berlin. On Saturday and Sunday, July 13-14, there will be hands-on sessions at the Technical University Berlin [2], where developers will be around for closer discussions and practical guidance. Bring your laptop if you want to try things. See the final schedule for more details [1]. We plan to do video recordings of all lectures and will have a live stream [3]. See you there! The Shogun Team

3 May 2013

Soeren Sonnenburg: Shogun Student Applications Statistics for Google Summer of Code 2013

Almost a month has passed since SHOGUN was accepted for Google Summer of Code 2013. The student application deadline was today (May 6), and shogun received 57 proposals from 52 students. This is quite an increase compared to 2012 (48 applications from 38 students). What is interesting, though, is that it didn't look that good in the very beginning (see the figure below): compared to 2012, the curve is much flatter in the beginning but increases exponentially towards the end. Why is that? We didn't change the way we engage with students (even though we tried to improve the instructions and added lots of entrance-tagged tasks to github issues). We still require patches to be submitted to even be considered, so it is similarly tough to get into GSoC 2013 with us as it was in the previous year. What is interesting, though, is that various organizations complained about a slow uptake in the beginning. It turns out that Google limited the number of applications per student from 20 (last year) to 5 (in 2013). This might explain the shape of the curve: students are more cautious to apply, but once the deadline is near they apply up to the maximum of 5 to improve their chances. This goes hand-in-hand with the observation that the quality of newly submitted applications tends to decrease towards the deadline. So did this new limit hurt? To the contrary! In the end the quality of proposals increased a lot, and we were able to start scoring/ranking students well before the application deadline. We are happy to have many very strong candidates this year again. Let's hope we get enough slots to accommodate all of the excellent students, and then let's start the fun :)

17 April 2013

Soeren Sonnenburg: CfP: Shogun Machine Learning Workshop, July 12-14, Berlin, Germany

CALL FOR PARTICIPATION: Shogun Machine Learning Workshop, Berlin, Germany, July 12-14, 2013

Data Science and Big Data are omnipresent terms documenting the need for automated tools to analyze the ever-growing wealth of data. To this end, we invite practitioners, researchers, and students to participate in the first Shogun machine learning workshop. While the workshop is centered around the development and use of the shogun machine learning toolbox, it will also feature general machine learning subjects.

General Information

The workshop will include: Do not miss the chance to familiarize yourself with the shogun machine learning toolbox for solving various data analysis tasks and to talk to its authors and contributors. The program of the workshop will cover basic to advanced topics in machine learning and how to approach them using Shogun, which makes it suitable for anyone, no matter whether you are a senior researcher or practitioner with many years of experience, or a junior student eager to discover much more. Interested? A tentative schedule is available at http://shogun-toolbox.org/page/Events/workshop2013_program.

Call for contributions

The organizing committee is seeking workshop contributions. The committee will select several submitted contributions for 15-minute talks and poster presentations. The accepted contributions will also be published on the workshop web site. Amongst other topics, we encourage submissions that:

Submission Guidelines

Send an abstract of your talk/contribution to shogun-workshop2013@shogun-toolbox.org before June 1. Notifications will be given on June 7.

Registration

Workshop registration is free of charge. However, only a limited number of seats is available. First come, first served! Register by filling out the registration form.

Location and Timeline

The main workshop will take place at c-base Berlin (http://c-base.org/, https://en.wikipedia.org/wiki/C-base) on July 12. It is followed by additional two-day hands-on sessions held at TU Berlin on July 13-14.

About the Shogun machine learning toolbox

Shogun is designed for unified large-scale learning for a broad range of feature types and learning settings, like classification, regression, or explorative data analysis. Further information is available at http://www.shogun-toolbox.org.

8 April 2013

Soeren Sonnenburg: Shogun got accepted at Google Summer of Code 2013

We are happy to announce that the shogun machine learning toolbox will participate in this year's Google Summer of Code :-) SHOGUN is designed for unified large-scale learning for a broad range of feature types and learning settings, like classification, regression, or exploratory data analysis. In case you are a talented student interested in implementing and learning about machine learning algorithms - we are looking for you! We have collected a number of ideas [1] that we consider worthwhile implementing. And don't forget to apply [2]!

[1] http://shogun-toolbox.org/page/Events/gsoc2013_ideas

[2] http://google-melange.appspot.com/gsoc/org/google/gsoc2013/shogun

17 March 2013

Soeren Sonnenburg: Shogun Toolbox Version 2.1.0 Released!

We have just released shogun 2.1.0. This release contains over 800 commits since 2.0.0, with a load of bugfixes, new features and improvements (see the changelog for details) that make Shogun more efficient, robust and versatile. In particular, Christian Montanari developed a first alpha version of a Perl modular interface, Heiko Strathmann added linear-time MMD on streaming data, Viktor Gal wrote a new structured output solver, and Sergey Lisitsyn added support for Tapkee, a dimension reduction framework. Read more at http://www.shogun-toolbox.org

23 January 2013

Soeren Sonnenburg: Shogun Toolbox Days 2013 - Give Your Data a Treat

Dear all, save the date - July 12, 2013! We just received confirmation from the c-base crew [1], [2]: the first shogun machine learning toolbox workshop will take place at c-base this summer, July 12, in Berlin! Anyone interested in participating in this event or in helping to organize it - please talk to us on IRC or post on the mailing list...

13 November 2012

Soeren Sonnenburg: Shogun at PyData NYC 2012

Christian Widmer gave a talk about the shogun machine learning toolbox version 2.0 at PyData NYC 2012. His talk is finally available as a video online.

27 October 2012

Soeren Sonnenburg: Shogun at Google Summer of Code 2012

The summer has finally come to an end (yes, in Berlin we still had 20 °C at the end of October) and, unfortunately, so did GSoC with it. This has been the second time for SHOGUN to be in GSoC. For those unfamiliar with SHOGUN - it is a very versatile machine learning toolbox that enables unified large-scale learning for a broad range of feature types and learning settings, like classification, regression, or explorative data analysis. I again played the role of org admin and co-mentor this year and would like to take the opportunity to summarize the enhancements to the toolbox and my GSoC experience.

In contrast to last year, we required code contributions already in the application phase of GSoC, i.e., a (small) patch was mandatory for your application to be considered. This reduced the number of applications we received: 48 proposals from 38 students instead of 70 proposals from about 60 students last year, but it also increased the overall quality of the applications. In the end we were very happy to get 8 very talented students and had the opportunity of boosting the project thanks to their hard and awesome work. Thanks to Google for sponsoring three more students compared to the last GSoC. Still, we gave one slot back to the pool for the Octave project (they used it very wisely: Octave will have a just-in-time compiler now, which will benefit us all!).

SHOGUN 2.0.0 is the new release of the toolbox, including of course all the new features that the students have implemented in their projects. On the one hand, modules that were already in SHOGUN have been extended or improved. For example, Jacob Walker implemented Gaussian Processes (GPs), improving the usability of SHOGUN for regression problems. Chiyuan Zhang contributed a framework for multiclass learning, including state-of-the-art methods in this area such as Error-Correcting Output Coding (ECOC) and ShareBoost, among others. In addition, Evgeniy Andreev made very important improvements w.r.t. the accessibility of SHOGUN: thanks to his work with SWIG director classes, it is now possible to prototype in Python and use that code with the same flexibility as if it had been written in the C++ core of the project. On the other hand, completely new frameworks and other functionalities have been added to the project as well. This is the case of the multitask learning and domain adaptation algorithms written by Sergey Lisitsyn and the kernel two-sample/dependence tests by Heiko Strathmann. Viktor Gal introduced latent SVMs to SHOGUN and, finally, two students worked on the new structured output learning framework: Fernando Iglesias designed this framework, introducing the structured output machines into SHOGUN, while Michal Uricar implemented several bundle methods to solve the optimization problem of the structured output SVM.

It has been very fun and interesting to see how work done in different projects was put together very early, even during the GSoC period. To give just one example, combining the generic structured output framework with the accessibility improvements: it is possible to use the SWIG directors to implement the application-specific parts of a structured learning problem in Python and then use the rest of the framework (written in C++) to solve this new problem (a rough sketch of the director idea follows below). Students! You all did a great job and I am more than amazed by what you have achieved. Thank you very much, and I hope some of you will stick around.
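To make the director idea above a little more concrete, here is a rough sketch of what prototyping a custom kernel in Python could look like. This is an illustration only: it assumes the modular Python bindings were built with SWIG director support and that a director base class along the lines of DirectorKernel with an overridable kernel_function exists; the exact class and method names may differ between Shogun versions.

```python
# Sketch only (assumptions): modular Python bindings built with SWIG
# directors; a base class named DirectorKernel exposing an overridable
# kernel_function(idx_a, idx_b). Names may differ between versions.
import numpy as np
from modshogun import DirectorKernel

class ClippedLinearKernel(DirectorKernel):
    """Toy kernel prototyped in Python; via the SWIG director mechanism,
    the C++ core can call back into kernel_function."""
    def __init__(self, data):
        DirectorKernel.__init__(self)
        self.data = data  # numpy array, one example per column

    def kernel_function(self, idx_a, idx_b):
        value = float(self.data[:, idx_a].dot(self.data[:, idx_b]))
        return min(value, 10.0)  # clip the linear kernel, just to do something custom

X = np.random.randn(2, 6)       # 6 toy examples in 2 dimensions
k = ClippedLinearKernel(X)
print(k.kernel_function(0, 1))  # callable from Python and, via directors, from C++
```

In principle, any Shogun machine that accepts a kernel object could then be trained with such a Python-side prototype without touching the C++ core.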
Besides all these improvements, it has been particularly challenging for me as org admin to scale the project. While I could still be deeply involved in each and every part of the project during the last GSoC, this was no longer possible this year. Learning to trust that your mentors are doing the job is something that didn't come easily to me. Having roughly monthly all-hands meetings did help, and so did monitoring the happiness of the students. I am glad that it all worked out nicely this year too. Again, I would like to mention that SHOGUN improved a lot with respect to its code base and code quality. Students gave very constructive feedback about our lack of proper Vector/Matrix/String/Sparse-Matrix types. We now have all of these implemented, doing automagic memory garbage collection behind the scenes. We have started to transition to Eigen3 as our matrix library of choice, which made quite a number of algorithms much easier to implement. We generalized the label framework (CLabels) to be tractable not just for classification and regression but also for multitask and structured output learning. Finally, we have had quite a number of infrastructure improvements. Thanks to GSoC money we have a dedicated server for running the buildbot/buildslaves and the website. The ML Group at TU Berlin sponsors virtual machines for building SHOGUN on Debian and Cygwin. Viktor Gal stepped up to provide buildslaves for Ubuntu and FreeBSD. Gunnar Raetsch's group is supporting Red Hat based build tests. We have Travis CI testing pull requests for breakage even before they are merged. Code quality is now monitored utilizing LLVM's scan-build. Bernard Hernandez appeared and wrote a fancy new website for SHOGUN. A more detailed description of the achievements of each of the students follows:

26 April 2012

Soeren Sonnenburg: GSoC2012 Accepted Students

Shogun has received an outstanding 9 slots in this year's Google Summer of Code. Thanks to Google and the hard work of our mentors, who ranked and discussed with the applicants, we were able to accept 8 talented students. (We had to return one slot back to the pool because we did not have enough mentors for all tasks. We indicated that Octave should get the slot, so let's hope they got it and will spend it well :) Each of the students will work on rather challenging machine learning topics. The accepted topics, and the students attacking them with the help of their mentors, are:

We are excited to have them all in the team! Happy hacking!

9 April 2012

Soeren Sonnenburg: Shogun Student Applications Statistics for Google Summer of Code 2012

A few weeks have passed since SHOGUN was accepted for Google Summer of Code 2012. The student application deadline was Easter Friday (April 6), and shogun received 48 proposals from 38 students. Some more detailed stats can be found in the figure below. This is a drastic drop compared with last year (about 60 students submitted 70 proposals). However, this drop can easily be explained: to even apply, we required a small patch, which is a big hurdle:
  1. One has to get shogun to compile (possibly only easy under Debian; for Cygwin/Mac OS X you have to invest quite some time just to get all of its dependencies).
  2. One has to become familiar with git, github and learn how to issue a pull request.
  3. And finally, one has to understand enough machine learning and enough of shogun's source code to be able to fix a bug or implement some baseline machine learning method.
Nevertheless, about a dozen proposals didn't come with a patch (even though the instructions page states that this is required) - an easy reject. In the end the quality of proposals increased a lot and we have many very strong candidates this year. Now we will have to wait and see how many slots we receive before we can finally start the fun :-)

18 March 2012

Soeren Sonnenburg: Shogun got accepted at Google Summer of Code 2012

SHOGUN has been accepted for Google Summer of Code 2012. SHOGUN is a machine learning toolbox, which is designed for unified large-scale learning for a broad range of feature types and learning settings. It offers a considerable number of machine learning models such as support vector machines for classification and regression, hidden Markov models, multiple kernel learning, linear discriminant analysis, linear programming machines, and perceptrons. Most of the specific algorithms are able to deal with several different data classes, including dense and sparse vectors and sequences using floating point or discrete data types. We have used this toolbox in several applications from computational biology, some of them coming with no less than 10 million training examples and others with 7 billion test examples. With more than a thousand installations worldwide, SHOGUN is already widely adopted in the machine learning community and beyond. SHOGUN is implemented in C++ and interfaces to all important languages like MATLAB, R, Octave, Python, Lua, Java, C#, Ruby and has a stand-alone command line interface. The source code is freely available under the GNU General Public License, Version 3 at http://www.shogun-toolbox.org. During Summer of Code 2012 we are looking to extend the library in three different ways:
  1. Improving accessibility to shogun by improving i/o support (more file formats) and mloss.org/mldata.org integration.
  2. Framework improvements (frameworks for regression, multiclass and structured output problems, quadratic programming solvers).
  3. Integration of existing and new machine learning algorithms.
Check out our ideas list and if you are a talented student, consider applying!

24 October 2011

Soeren Sonnenburg: Google Summer of Code Mentors Summit 2011

The Google mentors summit 2011 is over. It is hard to believe, but we had way more discussions about open science, open data, open source, and open access than at all machine learning open source software workshops combined. And the number of participants at the science sessions was unexpectedly high too. One of the interesting suggestions was the science code manifesto, which suggests among other things that all code written specifically for a paper must be available to the reviewers and readers of the paper. I think that this can nicely be extended to data too and really should receive wide support! Besides such high goals, the summit was a good occasion to get in touch with core developers of git, opencv, llvm/clang, orange, octave, python... and to discuss concrete collaborations, issues, and bugs that the shogun toolbox is triggering. So yes, I already fixed some, and more fixes are to come. Thanks, Google, for sponsoring this - it's been a very nice event.

7 September 2011

Soeren Sonnenburg: Shogun at Google Summer of Code 2011

Google Summer of Code 2011 gave a big boost to the development of the shogun machine learning toolbox. In case you have never heard of shogun or machine learning: machine learning involves algorithms that do "intelligent" and even automatic data processing and is nowadays used everywhere, e.g. to do face detection in your camera, compress the speech in your mobile phone, power the recommendations in your favourite online shop, or predict the solubility of molecules in water and the location of genes in humans, to name just a few examples. Interested? Then you should give it a try. Some very simple examples stemming from a sub-branch of machine learning called supervised learning illustrate how objects represented by two-dimensional vectors can be classified as good or bad by learning a so-called support vector machine. I would suggest installing the python_modular interface of shogun and running the example interactive_svm_demo.py, which is also included in the source tarball. [Figures: training of a support vector machine and the interactive SVM demo]
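For readers who want a taste of what such a supervised-learning example looks like in code, here is a minimal sketch using the modular Python interface. Treat the exact imports as an assumption: module and class names shifted between Shogun releases (for instance, binary labels were plain Labels before they became BinaryLabels), so details may differ from the version you have installed.

```python
import numpy as np
from modshogun import RealFeatures, BinaryLabels, GaussianKernel, LibSVM

# Toy data: two 2-D Gaussian blobs, one example per column, labels in {-1, +1}.
X = np.hstack((np.random.randn(2, 50) - 1.0, np.random.randn(2, 50) + 1.0))
y = np.hstack((-np.ones(50), np.ones(50)))

feats = RealFeatures(X)
labels = BinaryLabels(y)

kernel = GaussianKernel(feats, feats, 1.0)   # Gaussian (RBF) kernel, width 1.0
svm = LibSVM(1.0, kernel, labels)            # regularization constant C = 1.0
svm.train()

predictions = svm.apply(feats)               # classify the training points
print(predictions.get_labels()[:10])         # first few predicted labels (+/-1)
```

The interactive_svm_demo.py mentioned above does essentially this, plus plotting of the resulting decision boundary.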
Now back to Google Summer of Code: Google sponsored 5 talented students who worked hard on various subjects. As a result we now have a new core developer and various new features implemented in shogun: interfaces to new languages like Java, C#, Ruby and Lua written by Baozeng; a model selection framework written by Heiko Strathmann; many dimension reduction techniques written by Sergey Lisitsyn; Gaussian Mixture Model estimation written by Alesis Novik; and a full-fledged online learning framework developed by Shashwat Lal Das. All of this work has already been integrated in the newly released shogun 1.0.0. In case you want to know more about the students' projects, continue reading below, but before going into more detail I would like to summarize my experience with GSoC 2011.

My Experience with Google Summer of Code

We were a first-time organization, i.e. taking part in GSoC for the first time. Having received many, many student applications we were very happy to hear that we at least got 5 very talented students accepted, but we still had to reject about 60 students (only a 7% acceptance rate!). Doing this was an extremely tough decision for us. Each of us ended up scoring students, and even then we had many ties. So in the end we raised the bar by requiring contributions even before the actual GSoC started. This way we already got many improvements, like more complete i/o functions, nicely polished ROC and other evaluation routines, new machine learning algorithms like Gaussian naive Bayes and the averaged perceptron, and many bugfixes. The quality of the contributions and the independence of the students helped us come up with the selection of the final five. I personally played the role of the administrator and (co-)mentor and scheduled regular (usually monthly) IRC meetings with mentors and students. For other org admins or mentors wanting to get into GSoC, here come my lessons learned:

Now please read on to learn about the newly implemented features.

Dimension Reduction Techniques - Sergey Lisitsyn (Mentor: Christian Widmer)

Dimensionality reduction is the process of finding a low-dimensional representation of a high-dimensional one while maintaining the core essence of the data. As one of the most important practical issues of applied machine learning, it is widely used for preprocessing real data. With a strong focus on memory requirements and speed, Sergey implemented the following dimension reduction techniques: [Figures: Isomap on a swiss roll, KLLE, and local tangent space alignment embeddings]

Cross-Validation Framework - Heiko Strathmann (Mentor: Soeren Sonnenburg)

Nearly every learning machine has parameters which have to be determined manually. Before Heiko started his project, one had to implement cross-validation manually using (nested) for-loops. In his highly involved project, Heiko extended shogun's core to register parameters and ultimately made cross-validation possible. He implemented different model selection schemes (train/validation/test split, n-fold cross-validation, stratified cross-validation, etc.) and created some examples for illustration. Note that various performance measures are available to measure how "good" a model is. The figure below shows the area under the receiver operator characteristic curve as an example. [Figure: area under the ROC curve]
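To appreciate what Heiko's framework automates, here is the kind of hand-rolled n-fold loop one previously had to write around every learning machine. This is plain numpy illustrating the idea, not Shogun API; train_and_score is a hypothetical callback standing in for whatever learner and performance measure you use.

```python
import numpy as np

def kfold_indices(n_examples, n_folds, seed=0):
    """Split example indices into n_folds roughly equal, shuffled folds."""
    rng = np.random.RandomState(seed)
    return np.array_split(rng.permutation(n_examples), n_folds)

def cross_validate(train_and_score, n_examples, n_folds=5):
    """Manual n-fold cross-validation: train_and_score(train_idx, test_idx)
    must return a performance value on the held-out fold."""
    folds = kfold_indices(n_examples, n_folds)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.hstack([fold for j, fold in enumerate(folds) if j != i])
        scores.append(train_and_score(train_idx, test_idx))
    return np.mean(scores), np.std(scores)
```

The new framework moves this bookkeeping, plus stratified splitting and the choice of performance measure, into the library itself, so the loops above no longer have to be rewritten for every experiment.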
Interfaces to the Java, C#, Lua and Ruby Programming Languages - Baozeng (Mentors: Mikio Braun and Soeren Sonnenburg)

Baozeng implemented SWIG typemaps that enable the transfer of objects native to the language one wants to interface to. In his project, he added support for Java, Ruby, C# and Lua. His knowledge of SWIG helped us to drastically simplify shogun's typemaps for existing languages like Octave and Python, resolving other corner-case type issues. The addition of these typemaps brings a high-performance and versatile machine learning toolbox to these languages. It should be noted that shogun objects trained in e.g. Python can be serialized to disk and then loaded from any other language, say Lua or Java. We hope this helps users working in multiple-language environments. Note that the syntax is very similar across all languages; compare for yourself - various examples for all languages (python, octave, java, lua, ruby, and csharp) are available.

Largescale Learning Framework and Integration of Vowpal Wabbit - Shashwat Lal Das (Mentors: John Langford and Soeren Sonnenburg)

Shashwat introduced support for 'streaming' features into shogun. That is, instead of shogun's traditional way of requiring all data to be in memory, features can now be streamed from e.g. disk, enabling the use of massively big data sets. He implemented support for dense and sparse vector based input streams as well as strings, and converted existing online learning methods to use this framework. He was particularly careful and even made it possible to emulate streaming from in-memory features. He finally integrated (parts of) Vowpal Wabbit, which is a very fast large-scale online learning algorithm based on SGD.

Expectation Maximization Algorithms for Gaussian Mixture Models - Alesis Novik (Mentor: Vojtech Franc)

The Expectation-Maximization algorithm is well known in the machine learning community. The goal of this project was a robust implementation of the Expectation-Maximization algorithm for Gaussian Mixture Models. Several computational tricks have been applied to address numerical and stability issues. An illustrative example of estimating a one- and two-dimensional Gaussian mixture follows below. [Figures: 1D and 2D GMM estimates]

Final Remarks

All in all, this year's GSoC has given the SHOGUN project a great push forward and we hope that this will translate into an increased user base and numerous external contributions. Also, we hope that by providing bindings for many languages, we can provide neutral ground for machine learning implementations and that way bring together communities centered around different programming languages. All that's left to say is that, given the great experiences from this year, we'd be more than happy to participate in GSoC 2012.

3 September 2011

Soeren Sonnenburg: Hello World

I have finally managed to set up a blog using handcrafted HTML and django. Having had the experience of writing mloss.org, largescale.ml.tu-berlin.de and mldata.org, my homepage here was done rather quickly (one weekend of hacking). If anyone is interested in website design, I can really only recommend using django. Work your way through the awesome tutorial or the book and get inspired by the sources of the many django-based websites out there. Finally, keep an eye on django code snippets, which often contain small but useful functions and solutions to problems you might have.

20 December 2006

Julien Blache: mbpeventd status update

Here’s a quick status update for mbpeventd users; PowerBook users may be interested too ;-) So, here we go: The GTK client is complete but still needs a bit more work: configuration support, and changing the colors of the progress bar (currently the colors of the current GTK+ 2 theme are used). Yves-Alexis could probably use some help on the PowerBook front, mainly for identifying the appropriate i2c device to use, among other things. If you are interested in using mbpeventd on a PowerBook, check out the mbpeventd-ppc branch and give it a try. Plan for upcoming releases: as we’re adding PowerBook support, we’ll be renaming mbpeventd, so watch out for the new name when we merge PowerBook support in :-) Thanks to: Source code, check it out using svn co:
DBus branch: http://svn.technologeek.org/repos/mbpeventd/branches/mbpeventd-dbus
PPC branch: http://svn.technologeek.org/repos/mbpeventd/branches/mbpeventd-ppc