Search Results: "Soeren Sonnenburg"

29 October 2013

Soeren Sonnenburg: Shogun Toolbox Version 3.0 released!

Dear all, we are proud to announce the 3.0 release of the Shogun Machine Learning Toolbox. This release features the incredible projects of our 8 hard-working Google Summer of Code students. In addition, you get other cool new features as well as lots of internal improvements, bugfixes, and documentation improvements. To speak in numbers, we got more than 2000 commits changing almost 400000 lines in more than 7000 files and increased the number of unit tests from 50 to 600. This is the largest release that Shogun has ever had! Please visit http://shogun-toolbox.org/ to obtain Shogun.

News

Here is a brief description of what is new, starting with the GSoC projects, which deserve most of the fame:

Screenshots

Everyone likes screenshots. Well, we have got something better! All of the above projects (and more) are now documented in the form of IPython notebooks, combining machine learning fundamentals, code, and plots. This is how we have chosen to document our framework from now on. Have a look at them and feel free to submit your use case as a notebook: FGM.html, GMM.html, HashedDocDotFeatures.html, LMNN.html, SupportVectorMachines.html, Tapkee.html, bss_audio.html, bss_image.html, ecg_sep.html, gaussian_processes.html, logdet.html, mmd_two_sample_testing.html. The web-demo framework has been integrated into our website; go check it out.

Other changes

We finally moved the Shogun build process to CMake. Through GSoC, we added general clone and equals methods to all Shogun objects, along with automagic unit testing of serialisation and clone/equals for all classes. Other new features include multiclass LDA and probability outputs for multiclass SVMs. For the full list, see the NEWS file.

Workshop videos and slides

In case you missed the first Shogun workshop that we organised in Berlin last July, all of the talks have been put online.

Shogun in the Cloud

As setting up the right environment for Shogun and installing it has always been one of the biggest problems for users (hence the switch to CMake), we have created a sandbox where you can try out Shogun without installing it on your system! Basically, it is a web service which gives you access to your own IPython notebook server with all the Shogun notebooks. Of course, you are more than welcome to create and share your own notebooks using this service! *NOTE*: This is a courtesy service created by the Shogun Toolbox developers, so if you like it, please consider some form of donation to the project so that we can keep this service running for you. Try Shogun in the cloud.

Thanks

The release has been made possible by the hard work of all of our GSoC students, see the list above. Thanks also to Thoralf Klein and Björn Esser for their load of great contributions. Last but not least, thanks to all the people who use Shogun and provide feedback. Sören Sonnenburg on behalf of the Shogun team (+ Viktor Gal, Sergey Lisitsyn, Heiko Strathmann and Fernando Iglesias)

Soeren Sonnenburg: Shogun goes cloud

Shogun goes cloud. Try out http://cloud.shogun-toolbox.org to interactively play with machine learning algorithms or try any of the interactive demos. The SHOGUN Machine Learning Toolbox is a collection of algorithms designed for unified learning for a broad range of feature types and learning settings, like classification, regression, or explorative data analysis. For more information visit http://www.shogun-toolbox.org.

9 July 2013

Soeren Sonnenburg: Shogun Toolbox Days 2013 Program and Updates

Dear all, we are excited that the first Shogun workshop, July 12-14 in Berlin, is getting closer. Thanks to all the people who signed up -- we are sure it will be a packed and inspiring weekend! We have finalized the schedule for Friday, July 12, taking place at the c-base (see description below [1]). After an intro where everyone gets to know each other and where we introduce ourselves, Shogun, and machine learning in general, there will be some tutorials by Shogun developers. In addition, we will have discussions about various topics, various coffee breaks, and lunch. Finally, we will enjoy a summer's evening in Berlin. On Saturday and Sunday, July 13-14, there will be hands-on sessions at the Technical University Berlin [2], where developers will be around for closer discussions and practical guidance. Bring your laptop if you want to try things. See the final schedule for more details [1]. We plan to do video recordings of all lectures and will have a live stream [3]. See you there! The Shogun Team

3 May 2013

Soeren Sonnenburg: Shogun Student Applications Statistics for Google Summer of Code 2013

Almost a month has passed since SHOGUN was accepted for Google Summer of Code 2013. The student application deadline was today (May 6), and shogun received 57 proposals from 52 students. This is quite an increase compared to 2012 (48 applications from 38 students). What is interesting, though, is that it didn't look that good in the very beginning (see the figure below): compared to 2012, the curve is much flatter in the beginning but increases exponentially towards the end. Why is that? We didn't change the way we engage with students (even though we tried to improve the instructions and added lots of entrance-tagged tasks to github issues). We still require patches to be submitted to even be considered, so it is similarly tough to get into GSoC 2013 with us as it was in the previous year. What is interesting, though, is that various organizations complained about a slow uptake in the beginning. It turns out that Google limited the number of applications per student from 20 (last year) to 5 (in 2013). This might explain the shape of the curve: students are more cautious to apply, but once the deadline is near they apply up to the maximum of 5 to improve their chances. This goes hand-in-hand with the observation that the quality of newly submitted applications tends to decrease towards the deadline. So did this new limit hurt? To the contrary! In the end the quality of proposals increased a lot, and we were able to start scoring/ranking students well before the application deadline. We are happy to have many very strong candidates this year again. Let's hope we get enough slots to accommodate all of the excellent students, and then let's start the fun :)

17 April 2013

Soeren Sonnenburg: CfP: Shogun Machine Learning Workshop, July 12-14, Berlin, Germany

CALL FOR PARTICIPATION: Shogun Machine Learning Workshop, Berlin, Germany, July 12-14, 2013

Data Science and Big Data are omnipresent terms documenting the need for automated tools to analyze the ever-growing wealth of data. To this end, we invite practitioners, researchers, and students to participate in the first Shogun machine learning workshop. While the workshop is centered around the development and use of the shogun machine learning toolbox, it will also feature general machine learning subjects.

General Information

The workshop will include: Do not miss the chance to familiarize yourself with the shogun machine learning toolbox for solving various data analysis tasks and to talk to its authors and contributors. The program of the workshop will cover basic to advanced topics in machine learning and how to approach them using Shogun, which makes it suitable for anyone, no matter whether you are a senior researcher or practitioner with many years of experience, or a junior student eager to discover much more. Interested? A tentative schedule is available at http://shogun-toolbox.org/page/Events/workshop2013_program.

Call for contributions

The organizing committee is seeking workshop contributions. The committee will select several submitted contributions for 15-minute talks and poster presentations. The accepted contributions will also be published on the workshop web site. Amongst other topics, we encourage submissions that:

Submission Guidelines

Send an abstract of your talk/contribution to shogun-workshop2013@shogun-toolbox.org before June 1. Notifications will be given on June 7.

Registration

Workshop registration is free of charge. However, only a limited number of seats is available. First come, first served! Register by filling out the registration form.

Location and Timeline

The main workshop will take place at c-base Berlin (http://c-base.org/, https://en.wikipedia.org/wiki/C-base) on July 12. It is followed by additional two-day hands-on sessions held at TU Berlin on July 13-14.

About the Shogun machine learning toolbox

Shogun is designed for unified large-scale learning for a broad range of feature types and learning settings, like classification, regression, or explorative data analysis. Further information is available at http://www.shogun-toolbox.org.

8 April 2013

Soeren Sonnenburg: Shogun got accepted at Google Summer of Code 2013

We are happy to announce that the shogun machine learning toolbox will participate in this year's Google Summer of Code :-) SHOGUN is designed for unified large-scale learning for a broad range of feature types and learning settings, like classification, regression, or exploratory data analysis. In case you are a talented student interested in implementing and learning about machine learning algorithms - we are looking for you! We have collected a number of ideas [1] that we consider worthwhile implementing. And don't forget to apply [2]!

[1] http://shogun-toolbox.org/page/Events/gsoc2013_ideas

[2] http://google-melange.appspot.com/gsoc/org/google/gsoc2013/shogun

17 March 2013

Soeren Sonnenburg: Shogun Toolbox Version 2.1.0 Released!

We have just released shogun 2.1.0. This release contains over 800 commits since 2.0.0, with a load of bugfixes, new features and improvements (see the changelog for details) that make Shogun more efficient, robust and versatile. In particular, Christian Montanari developed a first alpha version of a Perl modular interface, Heiko Strathmann added linear-time MMD on streaming data, Viktor Gal wrote a new structured output solver, and Sergey Lisitsyn added support for Tapkee, a dimension reduction framework. Read more at http://www.shogun-toolbox.org

23 January 2013

Soeren Sonnenburg: Shogun Toolbox Days 2013 - Give Your Data a Treat

Dear all, save the date - July 12, 2013! We just received confirmation from the c-base crew [1], [2]: the first shogun machine learning toolbox workshop will take place at c-base this summer, July 12, in Berlin! Anyone interested in participating in this event or in helping to organize it - please talk to us on IRC or post on the mailing list...

13 November 2012

Soeren Sonnenburg: Shogun at PyData NYC 2012

Christian Widmer gave a talk about the shogun machine learning toolbox version 2.0 at PyData NYC 2012. His talk is finally available as a video online.

27 October 2012

Soeren Sonnenburg: Shogun at Google Summer of Code 2012

The summer has finally come to an end (yes, in Berlin we still had 20 °C at the end of October) and, unfortunately, so did GSoC with it. This has been the second time for SHOGUN to be in GSoC. For those unfamiliar with SHOGUN - it is a very versatile machine learning toolbox that enables unified large-scale learning for a broad range of feature types and learning settings, like classification, regression, or explorative data analysis. I again played the role of org admin and co-mentor this year and would like to take the opportunity to summarize the enhancements to the toolbox and my GSoC experience.

In contrast to last year, we required code contributions already in the application phase of GSoC, i.e., a (small) patch was mandatory for your application to be considered. This reduced the number of applications we received: 48 proposals from 38 students instead of 70 proposals from about 60 students last year, but it also increased the overall quality of the applications. In the end we were very happy to get 8 very talented students and had the opportunity of boosting the project thanks to their hard and awesome work. Thanks to Google for sponsoring three more students compared to the last GSoC. Still, we gave one slot back to the pool for the Octave project (they used it very wisely: Octave will have a just-in-time compiler now, which will benefit us all!).

SHOGUN 2.0.0 is the new release of the toolbox, including of course all the new features that the students have implemented in their projects. On the one hand, modules that were already in SHOGUN have been extended or improved. For example, Jacob Walker implemented Gaussian Processes (GPs), improving the usability of SHOGUN for regression problems. Chiyuan Zhang contributed a framework for multiclass learning, including state-of-the-art methods in this area such as Error-Correcting Output Coding (ECOC) and ShareBoost, among others. In addition, Evgeniy Andreev made very important improvements w.r.t. the accessibility of SHOGUN: thanks to his work with SWIG director classes, it is now possible to prototype in Python and use that code with the same flexibility as if it had been written in the C++ core of the project. On the other hand, completely new frameworks and other functionalities have been added to the project as well. This is the case of the multitask learning and domain adaptation algorithms written by Sergey Lisitsyn and the kernel two-sample/dependence tests by Heiko Strathmann. Viktor Gal introduced latent SVMs to SHOGUN and, finally, two students worked on the new structured output learning framework: Fernando Iglesias designed this framework, introducing the structured output machines into SHOGUN, while Michal Uricar implemented several bundle methods to solve the optimization problem of the structured output SVM.

It has been very fun and interesting to see how work done in different projects was put together very early, even during the GSoC period. To give just one example, combining the generic structured output framework with the accessibility improvements: it is possible to use the SWIG directors to implement the application-specific parts of a structured learning problem in Python and then use the rest of the framework (written in C++) to solve this new problem (a rough sketch of the director idea follows below). Students! You all did a great job and I am more than amazed by what you have achieved. Thank you very much, and I hope some of you will stick around.
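To make the director idea above a little more concrete, here is a rough sketch of what prototyping a custom kernel in Python could look like. This is an illustration only: it assumes the modular Python bindings were built with SWIG director support and that a director base class along the lines of DirectorKernel with an overridable kernel_function exists; the exact class and method names may differ between Shogun versions.

```python
# Sketch only (assumptions): modular Python bindings built with SWIG
# directors; a base class named DirectorKernel exposing an overridable
# kernel_function(idx_a, idx_b). Names may differ between versions.
import numpy as np
from modshogun import DirectorKernel

class ClippedLinearKernel(DirectorKernel):
    """Toy kernel prototyped in Python; via the SWIG director mechanism,
    the C++ core can call back into kernel_function."""
    def __init__(self, data):
        DirectorKernel.__init__(self)
        self.data = data  # numpy array, one example per column

    def kernel_function(self, idx_a, idx_b):
        value = float(self.data[:, idx_a].dot(self.data[:, idx_b]))
        return min(value, 10.0)  # clip the linear kernel, just to do something custom

X = np.random.randn(2, 6)       # 6 toy examples in 2 dimensions
k = ClippedLinearKernel(X)
print(k.kernel_function(0, 1))  # callable from Python and, via directors, from C++
```

In principle, any Shogun machine that accepts a kernel object could then be trained with such a Python-side prototype without touching the C++ core.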
Besides all these improvements, it has been particularly challenging for me as org admin to scale the project. While I could still be deeply involved in each and every part of the project during the last GSoC, this was no longer possible this year. Learning to trust that your mentors are doing the job is something that didn't come easily to me. Having roughly monthly all-hands meetings did help, and so did monitoring the happiness of the students. I am glad that it all worked out nicely this year too. Again, I would like to mention that SHOGUN improved a lot with respect to its code base and code quality. Students gave very constructive feedback about our lack of proper Vector/Matrix/String/Sparse-Matrix types. We now have all of these implemented, doing automagic memory garbage collection behind the scenes. We have started to transition to Eigen3 as our matrix library of choice, which made quite a number of algorithms much easier to implement. We generalized the label framework (CLabels) to be tractable not just for classification and regression but also for multitask and structured output learning. Finally, we have had quite a number of infrastructure improvements. Thanks to GSoC money we have a dedicated server for running the buildbot/buildslaves and the website. The ML Group at TU Berlin sponsors virtual machines for building SHOGUN on Debian and Cygwin. Viktor Gal stepped up to provide buildslaves for Ubuntu and FreeBSD. Gunnar Raetsch's group is supporting Red Hat based build tests. We have Travis CI testing pull requests for breakage even before they are merged. Code quality is now monitored utilizing LLVM's scan-build. Bernard Hernandez appeared and wrote a fancy new website for SHOGUN. A more detailed description of the achievements of each of the students follows:

26 April 2012

Soeren Sonnenburg: GSoC2012 Accepted Students

Shogun has received an outstanding 9 slots in this year's Google Summer of Code. Thanks to Google and the hard work of our mentors, who ranked and discussed with the applicants, we were able to accept 8 talented students. (We had to return one slot back to the pool because we did not have enough mentors for all tasks. We indicated that Octave should get the slot, so let's hope they got it and will spend it well :) Each of the students will work on rather challenging machine learning topics. The accepted topics, and the students attacking them with the help of their mentors, are:

We are excited to have them all in the team! Happy hacking!

9 April 2012

Soeren Sonnenburg: Shogun Student Applications Statistics for Google Summer of Code 2012

A few weeks have passed since SHOGUN was accepted for Google Summer of Code 2012. The student application deadline was Easter Friday (April 6), and shogun received 48 proposals from 38 students. Some more detailed stats can be found in the figure below. This is a drastic drop compared with last year (about 60 students submitted 70 proposals). However, this drop can easily be explained: to even apply, we required a small patch, which is a big hurdle:
  1. One has to get shogun to compile (possibly only easy under Debian; for Cygwin/Mac OS X you have to invest quite some time just to get all of its dependencies).
  2. One has to become familiar with git, github and learn how to issue a pull request.
  3. And finally, one has to understand enough machine learning and enough of shogun's source code to be able to fix a bug or implement some baseline machine learning method.
Nevertheless, about a dozen proposals didn't come with a patch (even though the instructions page states that this is required) - an easy reject. In the end the quality of proposals increased a lot and we have many very strong candidates this year. Now we will have to wait and see how many slots we receive before we can finally start the fun :-)

18 March 2012

Soeren Sonnenburg: Shogun got accepted at Google Summer of Code 2012

SHOGUN has been accepted for Google Summer of Code 2012. SHOGUN is a machine learning toolbox, which is designed for unified large-scale learning for a broad range of feature types and learning settings. It offers a considerable number of machine learning models such as support vector machines for classification and regression, hidden Markov models, multiple kernel learning, linear discriminant analysis, linear programming machines, and perceptrons. Most of the specific algorithms are able to deal with several different data classes, including dense and sparse vectors and sequences using floating point or discrete data types. We have used this toolbox in several applications from computational biology, some of them coming with no less than 10 million training examples and others with 7 billion test examples. With more than a thousand installations worldwide, SHOGUN is already widely adopted in the machine learning community and beyond. SHOGUN is implemented in C++ and interfaces to all important languages like MATLAB, R, Octave, Python, Lua, Java, C#, Ruby and has a stand-alone command line interface. The source code is freely available under the GNU General Public License, Version 3 at http://www.shogun-toolbox.org. During Summer of Code 2012 we are looking to extend the library in three different ways:
  1. Improving accessibility to shogun by improving i/o support (more file formats) and mloss.org/mldata.org integration.
  2. Framework improvements (frameworks for regression, multiclass and structured output problems, quadratic programming solvers).
  3. Integration of existing and new machine learning algorithms.
Check out our ideas list and if you are a talented student, consider applying!

24 October 2011

Soeren Sonnenburg: Google Summer of Code Mentors Summit 2011

The Google mentors summit 2011 is over. It is hard to believe, but we had way more discussions about open science, open data, open source, and open access than at all machine learning open source software workshops combined. And the number of participants at the science sessions was unexpectedly high too. One of the interesting suggestions was the science code manifesto, which suggests among other things that all code written specifically for a paper must be available to the reviewers and readers of the paper. I think that this can nicely be extended to data too and really should receive wide support! Besides such high goals, the summit was a good occasion to get in touch with core developers of git, opencv, llvm/clang, orange, octave, python... and to discuss concrete collaborations, issues, and bugs that the shogun toolbox is triggering. So yes, I already fixed some, and more fixes are to come. Thanks, Google, for sponsoring this - it's been a very nice event.

7 September 2011

Soeren Sonnenburg: Shogun at Google Summer of Code 2011

Google Summer of Code 2011 gave a big boost to the development of the shogun machine learning toolbox. In case you have never heard of shogun or machine learning: machine learning involves algorithms that do "intelligent" and even automatic data processing and is nowadays used everywhere, e.g. to do face detection in your camera, compress the speech in your mobile phone, power the recommendations in your favourite online shop, or predict the solubility of molecules in water and the location of genes in humans, to name just a few examples. Interested? Then you should give it a try. Some very simple examples stemming from a sub-branch of machine learning called supervised learning illustrate how objects represented by two-dimensional vectors can be classified as good or bad by learning a so-called support vector machine. I would suggest installing the python_modular interface of shogun and running the example interactive_svm_demo.py, which is also included in the source tarball. [Figures: training of a support vector machine and the interactive SVM demo]
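For readers who want a taste of what such a supervised-learning example looks like in code, here is a minimal sketch using the modular Python interface. Treat the exact imports as an assumption: module and class names shifted between Shogun releases (for instance, binary labels were plain Labels before they became BinaryLabels), so details may differ from the version you have installed.

```python
import numpy as np
from modshogun import RealFeatures, BinaryLabels, GaussianKernel, LibSVM

# Toy data: two 2-D Gaussian blobs, one example per column, labels in {-1, +1}.
X = np.hstack((np.random.randn(2, 50) - 1.0, np.random.randn(2, 50) + 1.0))
y = np.hstack((-np.ones(50), np.ones(50)))

feats = RealFeatures(X)
labels = BinaryLabels(y)

kernel = GaussianKernel(feats, feats, 1.0)   # Gaussian (RBF) kernel, width 1.0
svm = LibSVM(1.0, kernel, labels)            # regularization constant C = 1.0
svm.train()

predictions = svm.apply(feats)               # classify the training points
print(predictions.get_labels()[:10])         # first few predicted labels (+/-1)
```

The interactive_svm_demo.py mentioned above does essentially this, plus plotting of the resulting decision boundary.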
Now back to Google Summer of Code: Google sponsored 5 talented students who worked hard on various subjects. As a result we now have a new core developer and various new features implemented in shogun: interfaces to new languages like Java, C#, Ruby and Lua written by Baozeng; a model selection framework written by Heiko Strathmann; many dimension reduction techniques written by Sergey Lisitsyn; Gaussian Mixture Model estimation written by Alesis Novik; and a full-fledged online learning framework developed by Shashwat Lal Das. All of this work has already been integrated in the newly released shogun 1.0.0. In case you want to know more about the students' projects, continue reading below, but before going into more detail I would like to summarize my experience with GSoC 2011.

My Experience with Google Summer of Code

We were a first-time organization, i.e. taking part in GSoC for the first time. Having received many, many student applications we were very happy to hear that we at least got 5 very talented students accepted, but we still had to reject about 60 students (only a 7% acceptance rate!). Doing this was an extremely tough decision for us. Each of us ended up scoring students, and even then we had many ties. So in the end we raised the bar by requiring contributions even before the actual GSoC started. This way we already got many improvements, like more complete i/o functions, nicely polished ROC and other evaluation routines, new machine learning algorithms like Gaussian naive Bayes and the averaged perceptron, and many bugfixes. The quality of the contributions and the independence of the students helped us come up with the selection of the final five. I personally played the role of the administrator and (co-)mentor and scheduled regular (usually monthly) IRC meetings with mentors and students. For other org admins or mentors wanting to get into GSoC, here come my lessons learned:

Now please read on to learn about the newly implemented features.

Dimension Reduction Techniques - Sergey Lisitsyn (Mentor: Christian Widmer)

Dimensionality reduction is the process of finding a low-dimensional representation of a high-dimensional one while maintaining the core essence of the data. As one of the most important practical issues of applied machine learning, it is widely used for preprocessing real data. With a strong focus on memory requirements and speed, Sergey implemented the following dimension reduction techniques: [Figures: Isomap on a swiss roll, KLLE, and local tangent space alignment embeddings]

Cross-Validation Framework - Heiko Strathmann (Mentor: Soeren Sonnenburg)

Nearly every learning machine has parameters which have to be determined manually. Before Heiko started his project, one had to implement cross-validation manually using (nested) for-loops. In his highly involved project, Heiko extended shogun's core to register parameters and ultimately made cross-validation possible. He implemented different model selection schemes (train/validation/test split, n-fold cross-validation, stratified cross-validation, etc.) and created some examples for illustration. Note that various performance measures are available to measure how "good" a model is. The figure below shows the area under the receiver operator characteristic curve as an example. [Figure: area under the ROC curve]
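To appreciate what Heiko's framework automates, here is the kind of hand-rolled n-fold loop one previously had to write around every learning machine. This is plain numpy illustrating the idea, not Shogun API; train_and_score is a hypothetical callback standing in for whatever learner and performance measure you use.

```python
import numpy as np

def kfold_indices(n_examples, n_folds, seed=0):
    """Split example indices into n_folds roughly equal, shuffled folds."""
    rng = np.random.RandomState(seed)
    return np.array_split(rng.permutation(n_examples), n_folds)

def cross_validate(train_and_score, n_examples, n_folds=5):
    """Manual n-fold cross-validation: train_and_score(train_idx, test_idx)
    must return a performance value on the held-out fold."""
    folds = kfold_indices(n_examples, n_folds)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.hstack([fold for j, fold in enumerate(folds) if j != i])
        scores.append(train_and_score(train_idx, test_idx))
    return np.mean(scores), np.std(scores)
```

The new framework moves this bookkeeping, plus stratified splitting and the choice of performance measure, into the library itself, so the loops above no longer have to be rewritten for every experiment.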
Interfaces to the Java, C#, Lua and Ruby Programming Languages - Baozeng (Mentors: Mikio Braun and Soeren Sonnenburg)

Baozeng implemented SWIG typemaps that enable the transfer of objects native to the language one wants to interface to. In his project, he added support for Java, Ruby, C# and Lua. His knowledge of SWIG helped us to drastically simplify shogun's typemaps for existing languages like Octave and Python, resolving other corner-case type issues. The addition of these typemaps brings a high-performance and versatile machine learning toolbox to these languages. It should be noted that shogun objects trained in e.g. Python can be serialized to disk and then loaded from any other language, say Lua or Java. We hope this helps users working in multiple-language environments. Note that the syntax is very similar across all languages; compare for yourself - various examples for all languages (python, octave, java, lua, ruby, and csharp) are available.

Largescale Learning Framework and Integration of Vowpal Wabbit - Shashwat Lal Das (Mentors: John Langford and Soeren Sonnenburg)

Shashwat introduced support for 'streaming' features into shogun. That is, instead of shogun's traditional way of requiring all data to be in memory, features can now be streamed from e.g. disk, enabling the use of massively big data sets. He implemented support for dense and sparse vector based input streams as well as strings, and converted existing online learning methods to use this framework. He was particularly careful and even made it possible to emulate streaming from in-memory features. He finally integrated (parts of) Vowpal Wabbit, which is a very fast large-scale online learning algorithm based on SGD.

Expectation Maximization Algorithms for Gaussian Mixture Models - Alesis Novik (Mentor: Vojtech Franc)

The Expectation-Maximization algorithm is well known in the machine learning community. The goal of this project was a robust implementation of the Expectation-Maximization algorithm for Gaussian Mixture Models. Several computational tricks have been applied to address numerical and stability issues. An illustrative example of estimating a one- and two-dimensional Gaussian mixture follows below. [Figures: 1D and 2D GMM estimates]

Final Remarks

All in all, this year's GSoC has given the SHOGUN project a great push forward and we hope that this will translate into an increased user base and numerous external contributions. Also, we hope that by providing bindings for many languages, we can provide neutral ground for machine learning implementations and that way bring together communities centered around different programming languages. All that's left to say is that, given the great experiences from this year, we'd be more than happy to participate in GSoC 2012.

3 September 2011

Soeren Sonnenburg: Hello World

I have finally managed to set up a blog using handcrafted HTML and django. Having had the experience of writing mloss.org, largescale.ml.tu-berlin.de and mldata.org, my homepage here was done rather quickly (one weekend of hacking). If anyone is interested in website design, I can really only recommend using django. Work your way through the awesome tutorial or the book and get inspired by the sources of the many django-based websites out there. Finally, keep an eye on django code snippets, which often contain small but useful functions and solutions to problems you might have.

20 December 2006

Julien Blache: mbpeventd status update

Here’s a quick status update for mbpeventd users; PowerBook users may be interested too ;-) So, here we go: The GTK client is complete but still needs a bit more work: configuration support, and changing the colors of the progress bar (currently the colors of the current GTK+ 2 theme are used). Yves-Alexis could probably use some help on the PowerBook front, mainly for identifying the appropriate i2c device to use, among other things. If you are interested in using mbpeventd on a PowerBook, check out the mbpeventd-ppc branch and give it a try. Plan for upcoming releases: as we’re adding PowerBook support, we’ll be renaming mbpeventd, so watch out for the new name when we merge PowerBook support in :-) Thanks to: Source code, check it out using svn co:
DBus branch: http://svn.technologeek.org/repos/mbpeventd/branches/mbpeventd-dbus
PPC branch: http://svn.technologeek.org/repos/mbpeventd/branches/mbpeventd-ppc