Search Results: "epsilon"

7 July 2021

Dirk Eddelbuettel: Rcpp 1.0.7: More Updates

rcpp logo The Rcpp team is pleased to announce release 1.0.7 of Rcpp which arrived at CRAN earlier today, and will be uploaded to Debian shortly. Windows and macOS builds should appear at CRAN in the next few days. This release continues with the six-months cycle started with release 1.0.5 last July. As a reminder, interim dev or rc releases will alwasys be available in the Rcpp drat repo; this cycle there were seven (!!). These rolling release tend to work just as well, and are also fully tested against all reverse-dependencies. Rcpp has become the most popular way of enhancing R with C or C++ code. As of today, 2323 packages on CRAN depend on Rcpp for making analytical code go faster and further, along with 227 in BioConductor. This release contains a change which Luke Tierney urged us to make a good year ago in #1081) (and which we had looked at earlier in #382). Implementing the change in a regular update proved a little tricky, and my initial branch lay dormant until I aki revived it, and finished the transition (which we then did in two PRs). The change concerns how Rcpp grows internal objects, and the new approach (thanks to the hint by Luke) closer to what R does guaranteeing linear behaviour. It turns out that we overlooked one aspect (of coping with Modules built under earlier Rcpp releases) so the initial upload to CRAN on Saturday revealed that we needed a small adjustment that we made for the final release. This version should now be more performant, and rest on a stable API. Based on the reverse depends checks by both us and CRAN (using the updated version), we expect no issues with existing code. However, it something does act up a fresh compilation of the packages using Rcpp may help. We also made a few other minor changes in the API such as silencing a silly compiler warning, ensuring global Rcout and Rcerr objects, adding support for a new Rcpp::message() call, completing a switch to uint32_t instead of unsigned int and including the cfloat header (which relates to STRICT_R_HEADERS discussed below). Similarly, the Rcpp Attributes feature was enhanced by coping better with packages with a dot in their name and their for per-package include files, along with support for more quiet compilation if desired. As some Rcpp users may have seen, we plan to enable STRICT_R_HEADERS by the next release (expected in January 2022). A long issue tick #1158 is laying the ground work. Maintainers of 81 packages which are affected and need a small change (such as for example switching from PI to M_PI); of these 56 have already responded. We plan to be in touch in the fall. Adding the cfloat header is one help in this transition (as the corresponding C header was pulled in when STRICT_R_HEADERS is not defined) as it ensures DBL_EPSILON and alike are defined. Last but not least this is also the first relase in which we welcome I aki as a new member of the Rcpp Core team. Yay! The NEWS file entries follow summarizing the nine key PRs in this release.

Changes in Rcpp release version 1.0.7 (2021-07-06)
  • Changes in Rcpp API:
    • Refactored Rcpp_PreserveObject and Rcpp_ReleaseObject which are now O(1) (Dirk and I aki in #1133 and #1135 fixing #382 and #1081).
    • A spuriously assigned variable was removed (Dirk in #1138 fixing #1137).
    • Global Rcout and Rcerr objects are supported via a compiler directive (I aki in #1139 fixing #928)
    • Add support for Rcpp::message (Dirk in #1146 fixing #1145).
    • The uint32_t type is used throughout instead of unsigned int (Dirk in #1153 fixing #1152).
    • The cfloat header for floating point limits is now included (Dirk in #1162 fixing #1161).
  • Changes in Rcpp Attributes:
    • Packages with dots in their name can now have per-package include files (Dirk in #1132 fixing #1129).
    • New argument echo to quieten optional evaluation in sourceCpp (Dirk in #1138 fixing #1126).
  • Forthcoming Changes in Rcpp API:
    • Starting with Rcpp 1.0.8 anticipated in January 2022, STRICT_R_HEADERS will be enabled by default, see #1126.

Thanks to my CRANberries, you can also look at a diff to the previous release. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. Bugs reports are welcome at the GitHub issue tracker as well (where one can also search among open or closed issues); questions are also welcome under rcpp tag at StackOverflow which also allows searching among the (currently) 2616 previous questions. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

23 October 2014

Erich Schubert: Clustering 23 mio Tweet locations

To test scalability of ELKI, I've clustered 23 million Tweet locations from the Twitter Statuses Sample API obtained over 8.5 months (due to licensing restrictions by Twitter, I cannot make this data available to you, sorry.
23 million points is a challenge for advanced algorithms. It's quite feasible by k-means; in particular if you choose a small k and limit the number of iterations. But k-means does not make a whole lot of sense on this data set - it is a forced quantization algorithm, but does not discover actual hotspots.
Density-based clustering such as DBSCAN and OPTICS are much more appropriate. DBSCAN is a bit tricky to parameterize - you need to find the right combination of radius and density for the whole world. Given that Twitter adoption and usage is quite different it is very likely that you won't find a single parameter that is appropriate everywhere.
OPTICS is much nicer here. We only need to specify a minimum object count - I chose 1000, as this is a fairly large data set. For performance reasons (and this is where ELKI really shines) I chose a bulk-loaded R*-tree index for acceleration. To benefit from the index, the epsilon radius of OPTICS was set to 5000m. Also, ELKI allows using geodetic distance, so I can specify this value in meters and do not get much artifacts from coordinate projection.
To extract clusters from OPTICS, I used the Xi method, with xi set to 0.01 - a rather low value, also due to the fact of having a large data set.
The results are pretty neat - here is a screenshot (using KDE Marble and OpenStreetMap data, since Google Earth segfaults for me right now):
Screenshot of Clusters in central Europe
Some observations: unsurprisingly, many cities turn up as clusters. Also regional differences are apparent as seen in the screenshot: plenty of Twitter clusters in England, and low acceptance rate in Germany (Germans do seem to have objections about using Twitter; maybe they still prefer texting, which was quite big in Germany - France and Spain uses Twitter a lot more than Germany).
Spam - some of the high usage in Turkey and Indonesia may be due to spammers using a lot of bots there. There also is a spam cluster in the ocean south of Lagos - some spammer uses random coordinates [0;1]; there are 36000 tweets there, so this is a valid cluster...
A benefit of OPTICS and DBSCAN is that they do not cluster every object - low density areas are considered as noise. Also, they support clusters of different shape (which may be lost in this visualiation, which uses convex hulls!) and different size. OPTICS can also produce a hierarchical result.
Note that for these experiments, the actual Tweet text was not used. This has a rough correspondence to Twitter popularity "heatmaps", except that the clustering algorithms will actually provide a formalized data representation of activity hotspots, not only a visualization.
You can also explore the clustering result in your browser - the Google Drive visualization functionality seems to work much better than Google Earth.
If you go to Istanbul or Los Angeles, you will see some artifacts - odd shaped clusters with a clearly visible spike. This is caused by the Xi extraction of clusters, which is far from perfect. At the end of a valley in the OPTICS plot, it is hard to decide whether a point should be included or not. These errors are usually the last element in such a valley, and should be removed via postprocessing. But our OpticsXi implementation is meant to be as close as possible to the published method, so we do not intend to "fix" this.
Certain areas - such as Washington, DC, New York City, and the silicon valley - do not show up as clusters. The reason is probably again the Xi extraction - these region do not exhibit the steep density increase expected by Xi, but are too blurred in their surroundings to be a cluster.
Hierarchical results can be found e.g. in Brasilia and Los Angeles.
Compare the OPTICS results above to k-means results (below) - see why I consider k-means results to be a meaningless quantization?
k-means clusters
Sure, k-means is fast (30 iterations; not converged yet. Took 138 minutes on a single core, with k=1000. The parallel k-means implementation in ELKI took 38 minutes on a single node, Hadoop/Mahout on 8 nodes took 131 minutes, as slow as a single CPU core!). But you can see how sensitive it is to misplaced coordinates (outliers, but mostly spam), how many "clusters" are somewhere in the ocean, and that there is no resolution on the cities? The UK is covered by 4 clusters, with little meaning; and three of these clusters stretch all the way into Bretagne - k-means clusters clearly aren't of high quality here.
If you want to reproduce these results, you need to get the upcoming ELKI version (0.6.5~201410xx - the output of cluster convex hulls was just recently added to the default codebase), and of course data. The settings I used are:
-dbc.in coords.tsv.gz
-db.index tree.spatial.rstarvariants.rstar.RStarTreeFactory
-pagefile.pagesize 500
-spatial.bulkstrategy SortTileRecursiveBulkSplit
-time
-algorithm clustering.optics.OPTICSXi
-opticsxi.xi 0.01
-algorithm.distancefunction geo.LngLatDistanceFunction
-optics.epsilon 5000.0 -optics.minpts 1000
-resulthandler KMLOutputHandler -out /tmp/out.kmz
and the total runtime for 23 million points on a single core was about 29 hours. The indexes helped a lot: less than 10000 distances were computed per point, instead of 23 million - the expected speedup over a non-indexed approach is 2400.
Don't try this with R or Matlab. Your average R clustering algorithm will try to build a full distance matrix, and you probably don't have an exabyte of memory to store this matrix. Maybe start with a smaller data set first, then see how long you can afford to increase the data size.

2 November 2012

Erich Schubert: DBSCAN and OPTICS clustering

DBSCAN [wikipedia] and OPTICS [wikipedia] are two of the most well-known density based clustering algorithms. You can read more on them in the Wikipedia articles linked above.
An interesting property of density based clustering is that these algorithms do not assume clusters to have a particular shape. Furthermore, the algorithms allow "noise" objects that do not belong to any of the clusters. K-means for examples partitions the data space in Voronoi cells (some people claim it produces spherical clusters - that is incorrect). See Wikipedia for the true shape of K-means clusters and an example that canot be clustered by K-means. Internal measures for cluster evaluation also usually assume the clusters to be well-separated spheres (and do not allow noise/outlier objects) - not surprisingly, as we tend to experiment with artificial data generated by a number of Gaussian distributions.
The key parameter to DBSCAN and OPTICS is the "minPts" parameter. It roughly controls the minimum size of a cluster. If you set it too low, everything will become clusters (OPTICS with minPts=2 degenerates to a type of single link clustering). If you set it too high, at some point there won't be any clusters anymore, only noise. However, the parameter usually is not hard to choose. If you for example expect clusters to typically have 100 objects, I'd start with a value of 10 or 20. If your clusters are expected to have 10000 objects, then maybe start experimenting with 500.
The more difficult parameter for DBSCAN is the radius. In some cases, it will be very obvious. Say you are clustering users on a map. Then you might know that a good radius is 1 km. Or 10 km. Whatever makes sense for your particular application. In other cases, the parameter will not be obvious, or you might need multiple values. That is when OPTICS comes into play.
OPTICS is based on a very clever idea: instead of fixing MinPts and the Radius, we only fix minpts, and plot the radius at which an object would be considered dense by DBSCAN. In order to sort the objects on this plot, we process them in a priority heap, so that nearby objects are nearby in the plot. this image on Wikipedia shows an example for such a plot.
OPTICS comes at a cost compared to DBSCAN. Largely because of the priority heap, but also as the nearest neighbor queries are more complicated than the radius queries of DBSCAN. So it will be slower, but you no longer need to set the parameter epsilon. However, OPTICS won't produce a strict partitioning. Primarily it produces this plot, and in many situations you will actually want to visually inspect the plot. There are some methods to extract a hierarchical partitioning out of this plot, based on detecting "steep" areas.
The open source ELKI data mining framework (package "elki" in Debian and Ubuntu) has a very fast and flexible implementation of both algorithms. I've benchmarked this against GNU R ("fpc" package") and Weka, and the difference is enormous. ELKI without index support runs in roughly 11 minutes, with index down to 2 minutes for DBSCAN and 3 minutes for OPTICS. Weka takes 11 hours and GNU R/fpc takes 100 minutes (DBSCAN, no OPTICS available). And the implementation of OPTICS in Weka is not even complete (it does not support proper cluster extraction from the plot). Many of the other OPTICS implementations you can find with Google (e.g. in Python or MATLAB) seem to be based on this Weka version ...
ELKI is open source. So if you want to peek at the code, here are direct links: DBSCAN.java, OPTICS.java.
Some part of the code may be a bit confusing at first. The "Parameterizer" classes serve the purpose of allowing automatic UI generation, for example. So there is quite a bit of meta code involved.
Plus, ELKI is quite extensively optimized. For example, it does not use Java Collections much anymore. Java Iterators, for example, require returning an object on next();. The C++ style iterators used by ELKI can have multiple values, and primitive values.
for(DBIDIter id = relation.iterDBIDs(); id.valid(); id.advance())
is a typical for loop in ELKI, iterating over all objects of a relation, but the whole loop requires creating (and GC'ing) a single object. And actually, this is as literal as a for loop can get.
ModifiableDBIDs processedIDs = DBIDUtil.newHashSet(size);
is another example. Essentially, this is like a HashSet<DBID>. Except that it is a lot faster, because the object IDs do not need to live a Java objects, but can internally be stored more efficiently (the only currently available implementation of the DBID layer uses primitive integers).
Java advocates always accuse you of premature optimization when you avoid creating objects for primitives. Yet, in all my benchmarking, I have seen this continuously to have a major impact how many objects you allocate. At least when it is inside a loop that is heavily used. Java collections with boxed primitives just eat a lot of memory, and the memory management overhead does often make a huge difference. Which is why libraries such as Trove (which ELKI uses a lot) exist. Because memory usage does make a difference.
(Avoiding boxing/unboxing systematically in ELKI yielded approximately a 4x speedup. But obviously, ELKI involves a lot of numerical computations.)

23 May 2011

Gregor Herrmann: RC bugs 2011/11 - 2011/20

long time no post about my RC bug fixing activities. & indeed, I haven't been very active during the last weeks. here's a quick update; it includes two NMUs done for the perl 5.12 transition, the rest is just "normal" maintainance work (interesting to see how much stuff can break behind our backs ...). I still hope to find some more time in the future to look at other RC bugs; after all we want to have a not-too-long freeze for wheezy, don't we?

22 November 2010

Petter Reinholdtsen: Lenny->Squeeze upgrades of the Gnome and KDE desktop, now with apt-get autoremove

Michael Biebl suggested to me on IRC, that I changed my automated upgrade testing of the Lenny Gnome and KDE Desktop to do apt-get autoremove when using apt-get. This seem like a very good idea, so I adjusted by test scripts and can now present the updated result from today: This is for Gnome: Installed using apt-get, missing with aptitude
apache2.2-bin aptdaemon baobab binfmt-support browser-plugin-gnash cheese-common cli-common cups-pk-helper dmz-cursor-theme empathy empathy-common freedesktop-sound-theme freeglut3 gconf-defaults-service gdm-themes gedit-plugins geoclue geoclue-hostip geoclue-localnet geoclue-manual geoclue-yahoo gnash gnash-common gnome gnome-backgrounds gnome-cards-data gnome-codec-install gnome-core gnome-desktop-environment gnome-disk-utility gnome-screenshot gnome-search-tool gnome-session-canberra gnome-system-log gnome-themes-extras gnome-themes-more gnome-user-share gstreamer0.10-fluendo-mp3 gstreamer0.10-tools gtk2-engines gtk2-engines-pixbuf gtk2-engines-smooth hamster-applet libapache2-mod-dnssd libapr1 libaprutil1 libaprutil1-dbd-sqlite3 libaprutil1-ldap libart2.0-cil libboost-date-time1.42.0 libboost-python1.42.0 libboost-thread1.42.0 libchamplain-0.4-0 libchamplain-gtk-0.4-0 libcheese-gtk18 libclutter-gtk-0.10-0 libcryptui0 libdiscid0 libelf1 libepc-1.0-2 libepc-common libepc-ui-1.0-2 libfreerdp-plugins-standard libfreerdp0 libgconf2.0-cil libgdata-common libgdata7 libgdu-gtk0 libgee2 libgeoclue0 libgexiv2-0 libgif4 libglade2.0-cil libglib2.0-cil libgmime2.4-cil libgnome-vfs2.0-cil libgnome2.24-cil libgnomepanel2.24-cil libgpod-common libgpod4 libgtk2.0-cil libgtkglext1 libgtksourceview2.0-common libmono-addins-gui0.2-cil libmono-addins0.2-cil libmono-cairo2.0-cil libmono-corlib2.0-cil libmono-i18n-west2.0-cil libmono-posix2.0-cil libmono-security2.0-cil libmono-sharpzip2.84-cil libmono-system2.0-cil libmtp8 libmusicbrainz3-6 libndesk-dbus-glib1.0-cil libndesk-dbus1.0-cil libopal3.6.8 libpolkit-gtk-1-0 libpt2.6.7 libpython2.6 librpm1 librpmio1 libsdl1.2debian libsrtp0 libssh-4 libtelepathy-farsight0 libtelepathy-glib0 libtidy-0.99-0 media-player-info mesa-utils mono-2.0-gac mono-gac mono-runtime nautilus-sendto nautilus-sendto-empathy p7zip-full pkg-config python-aptdaemon python-aptdaemon-gtk python-axiom python-beautifulsoup python-bugbuddy python-clientform python-coherence python-configobj python-crypto python-cupshelpers python-elementtree python-epsilon python-evolution python-feedparser python-gdata python-gdbm python-gst0.10 python-gtkglext1 python-gtksourceview2 python-httplib2 python-louie python-mako python-markupsafe python-mechanize python-nevow python-notify python-opengl python-openssl python-pam python-pkg-resources python-pyasn1 python-pysqlite2 python-rdflib python-serial python-tagpy python-twisted-bin python-twisted-conch python-twisted-core python-twisted-web python-utidylib python-webkit python-xdg python-zope.interface remmina remmina-plugin-data remmina-plugin-rdp remmina-plugin-vnc rhythmbox-plugin-cdrecorder rhythmbox-plugins rpm-common rpm2cpio seahorse-plugins shotwell software-center system-config-printer-udev telepathy-gabble telepathy-mission-control-5 telepathy-salut tomboy totem totem-coherence totem-mozilla totem-plugins transmission-common xdg-user-dirs xdg-user-dirs-gtk xserver-xephyr
Installed using apt-get, removed with aptitude
cheese ekiga eog epiphany-extensions evolution-exchange fast-user-switch-applet file-roller gcalctool gconf-editor gdm gedit gedit-common gnome-games gnome-games-data gnome-nettool gnome-system-tools gnome-themes gnuchess gucharmap guile-1.8-libs libavahi-ui0 libdmx1 libgalago3 libgtk-vnc-1.0-0 libgtksourceview2.0-0 liblircclient0 libsdl1.2debian-alsa libspeexdsp1 libsvga1 rhythmbox seahorse sound-juicer system-config-printer totem-common transmission-gtk vinagre vino
Installed using aptitude, missing with apt-get
gstreamer0.10-gnomevfs
Installed using aptitude, removed with apt-get
[nothing]
This is for KDE: Installed using apt-get, missing with aptitude
ksmserver
Installed using apt-get, removed with aptitude
kwin network-manager-kde
Installed using aptitude, missing with apt-get
arts dolphin freespacenotifier google-gadgets-gst google-gadgets-xul kappfinder kcalc kcharselect kde-core kde-plasma-desktop kde-standard kde-window-manager kdeartwork kdeartwork-emoticons kdeartwork-style kdeartwork-theme-icon kdebase kdebase-apps kdebase-workspace kdebase-workspace-bin kdebase-workspace-data kdeeject kdelibs kdeplasma-addons kdeutils kdewallpapers kdf kfloppy kgpg khelpcenter4 kinfocenter konq-plugins-l10n konqueror-nsplugins kscreensaver kscreensaver-xsavers ktimer kwrite libgle3 libkde4-ruby1.8 libkonq5 libkonq5-templates libnetpbm10 libplasma-ruby libplasma-ruby1.8 libqt4-ruby1.8 marble-data marble-plugins netpbm nuvola-icon-theme plasma-dataengines-workspace plasma-desktop plasma-desktopthemes-artwork plasma-runners-addons plasma-scriptengine-googlegadgets plasma-scriptengine-python plasma-scriptengine-qedje plasma-scriptengine-ruby plasma-scriptengine-webkit plasma-scriptengines plasma-wallpapers-addons plasma-widget-folderview plasma-widget-networkmanagement ruby sweeper update-notifier-kde xscreensaver-data-extra xscreensaver-gl xscreensaver-gl-extra xscreensaver-screensaver-bsod
Installed using aptitude, removed with apt-get
ark google-gadgets-common google-gadgets-qt htdig kate kdebase-bin kdebase-data kdepasswd kfind klipper konq-plugins konqueror ksysguard ksysguardd libarchive1 libcln6 libeet1 libeina-svn-06 libggadget-1.0-0b libggadget-qt-1.0-0b libgps19 libkdecorations4 libkephal4 libkonq4 libkonqsidebarplugin4a libkscreensaver5 libksgrd4 libksignalplotter4 libkunitconversion4 libkwineffects1a libmarblewidget4 libntrack-qt4-1 libntrack0 libplasma-geolocation-interface4 libplasmaclock4a libplasmagenericshell4 libprocesscore4a libprocessui4a libqalculate5 libqedje0a libqtruby4shared2 libqzion0a libruby1.8 libscim8c2a libsmokekdecore4-3 libsmokekdeui4-3 libsmokekfile3 libsmokekhtml3 libsmokekio3 libsmokeknewstuff2-3 libsmokeknewstuff3-3 libsmokekparts3 libsmokektexteditor3 libsmokekutils3 libsmokenepomuk3 libsmokephonon3 libsmokeplasma3 libsmokeqtcore4-3 libsmokeqtdbus4-3 libsmokeqtgui4-3 libsmokeqtnetwork4-3 libsmokeqtopengl4-3 libsmokeqtscript4-3 libsmokeqtsql4-3 libsmokeqtsvg4-3 libsmokeqttest4-3 libsmokeqtuitools4-3 libsmokeqtwebkit4-3 libsmokeqtxml4-3 libsmokesolid3 libsmokesoprano3 libtaskmanager4a libtidy-0.99-0 libweather-ion4a libxklavier16 libxxf86misc1 okteta oxygencursors plasma-dataengines-addons plasma-scriptengine-superkaramba plasma-widget-lancelot plasma-widgets-addons plasma-widgets-workspace polkit-kde-1 ruby1.8 systemsettings update-notifier-common
Running apt-get autoremove made the results using apt-get and aptitude a bit more similar, but there are still quite a lott of differences. I have no idea what packages should be installed after the upgrade, but hope those that do can have a look.

20 November 2010

Petter Reinholdtsen: Lenny->Squeeze upgrades, apt vs aptitude with the Gnome and KDE desktop

I'm still running upgrade testing of the Lenny Gnome and KDE Desktop, but have not had time to spend on reporting the status. Here is a short update based on a test I ran 20101118. I still do not know what a correct migration should look like, so I report any differences between apt and aptitude and hope someone else can see if anything should be changed. This is for Gnome: Installed using apt-get, missing with aptitude
apache2.2-bin aptdaemon at-spi baobab binfmt-support browser-plugin-gnash cheese-common cli-common cpp-4.3 cups-pk-helper dmz-cursor-theme empathy empathy-common finger freedesktop-sound-theme freeglut3 gconf-defaults-service gdm-themes gedit-plugins geoclue geoclue-hostip geoclue-localnet geoclue-manual geoclue-yahoo gnash gnash-common gnome gnome-backgrounds gnome-cards-data gnome-codec-install gnome-core gnome-desktop-environment gnome-disk-utility gnome-screenshot gnome-search-tool gnome-session-canberra gnome-spell gnome-system-log gnome-themes-extras gnome-themes-more gnome-user-share gs-common gstreamer0.10-fluendo-mp3 gstreamer0.10-tools gtk2-engines gtk2-engines-pixbuf gtk2-engines-smooth hal-info hamster-applet libapache2-mod-dnssd libapr1 libaprutil1 libaprutil1-dbd-sqlite3 libaprutil1-ldap libart2.0-cil libatspi1.0-0 libboost-date-time1.42.0 libboost-python1.42.0 libboost-thread1.42.0 libchamplain-0.4-0 libchamplain-gtk-0.4-0 libcheese-gtk18 libclutter-gtk-0.10-0 libcryptui0 libcupsys2 libdiscid0 libeel2-data libelf1 libepc-1.0-2 libepc-common libepc-ui-1.0-2 libfreerdp-plugins-standard libfreerdp0 libgail-common libgconf2.0-cil libgdata-common libgdata7 libgdl-1-common libgdu-gtk0 libgee2 libgeoclue0 libgexiv2-0 libgif4 libglade2.0-cil libglib2.0-cil libgmime2.4-cil libgnome-vfs2.0-cil libgnome2.24-cil libgnomepanel2.24-cil libgnomeprint2.2-data libgnomeprintui2.2-common libgnomevfs2-bin libgpod-common libgpod4 libgtk2.0-cil libgtkglext1 libgtksourceview-common libgtksourceview2.0-common libmono-addins-gui0.2-cil libmono-addins0.2-cil libmono-cairo2.0-cil libmono-corlib2.0-cil libmono-i18n-west2.0-cil libmono-posix2.0-cil libmono-security2.0-cil libmono-sharpzip2.84-cil libmono-system2.0-cil libmtp8 libmusicbrainz3-6 libndesk-dbus-glib1.0-cil libndesk-dbus1.0-cil libopal3.6.8 libpolkit-gtk-1-0 libpt-1.10.10-plugins-alsa libpt-1.10.10-plugins-v4l libpt2.6.7 libpython2.6 librpm1 librpmio1 libsdl1.2debian libservlet2.4-java libsrtp0 libssh-4 libtelepathy-farsight0 libtelepathy-glib0 libtidy-0.99-0 libxalan2-java libxerces2-java media-player-info mesa-utils mono-2.0-gac mono-gac mono-runtime nautilus-sendto nautilus-sendto-empathy openoffice.org-writer2latex openssl-blacklist p7zip p7zip-full pkg-config python-4suite-xml python-aptdaemon python-aptdaemon-gtk python-axiom python-beautifulsoup python-bugbuddy python-clientform python-coherence python-configobj python-crypto python-cupshelpers python-cupsutils python-eggtrayicon python-elementtree python-epsilon python-evolution python-feedparser python-gdata python-gdbm python-gst0.10 python-gtkglext1 python-gtkmozembed python-gtksourceview2 python-httplib2 python-louie python-mako python-markupsafe python-mechanize python-nevow python-notify python-opengl python-openssl python-pam python-pkg-resources python-pyasn1 python-pysqlite2 python-rdflib python-serial python-tagpy python-twisted-bin python-twisted-conch python-twisted-core python-twisted-web python-utidylib python-webkit python-xdg python-zope.interface remmina remmina-plugin-data remmina-plugin-rdp remmina-plugin-vnc rhythmbox-plugin-cdrecorder rhythmbox-plugins rpm-common rpm2cpio seahorse-plugins shotwell software-center svgalibg1 system-config-printer-udev telepathy-gabble telepathy-mission-control-5 telepathy-salut tomboy totem totem-coherence totem-mozilla totem-plugins transmission-common xdg-user-dirs xdg-user-dirs-gtk xserver-xephyr zip
Installed using apt-get, removed with aptitude
arj bluez-utils cheese dhcdbd djvulibre-desktop ekiga eog epiphany-extensions epiphany-gecko evolution-exchange fast-user-switch-applet file-roller gcalctool gconf-editor gdm gedit gedit-common gnome-app-install gnome-games gnome-games-data gnome-nettool gnome-system-tools gnome-themes gnome-utils gnome-vfs-obexftp gnome-volume-manager gnuchess gucharmap guile-1.8-libs hal libavahi-compat-libdnssd1 libavahi-core5 libavahi-ui0 libbind9-50 libbluetooth2 libcamel1.2-11 libcdio7 libcucul0 libcurl3 libdirectfb-1.0-0 libdmx1 libdvdread3 libedata-cal1.2-6 libedataserver1.2-9 libeel2-2.20 libepc-1.0-1 libepc-ui-1.0-1 libexchange-storage1.2-3 libfaad0 libgadu3 libgalago3 libgd2-noxpm libgda3-3 libgda3-common libggz2 libggzcore9 libggzmod4 libgksu1.2-0 libgksuui1.0-1 libgmyth0 libgnome-desktop-2 libgnome-pilot2 libgnomecups1.0-1 libgnomeprint2.2-0 libgnomeprintui2.2-0 libgpod3 libgraphviz4 libgtk-vnc-1.0-0 libgtkhtml2-0 libgtksourceview1.0-0 libgtksourceview2.0-0 libgucharmap6 libhesiod0 libicu38 libisccc50 libisccfg50 libiw29 libjaxp1.3-java-gcj libkpathsea4 liblircclient0 libltdl3 liblwres50 libmagick++10 libmagick10 libmalaga7 libmozjs1d libmpfr1ldbl libmtp7 libmysqlclient15off libnautilus-burn4 libneon27 libnm-glib0 libnm-util0 libopal-2.2 libosp5 libparted1.8-10 libpisock9 libpisync1 libpoppler-glib3 libpoppler3 libpt-1.10.10 libraw1394-8 libsdl1.2debian-alsa libsensors3 libsexy2 libsmbios2 libsoup2.2-8 libspeexdsp1 libssh2-1 libsuitesparse-3.1.0 libsvga1 libswfdec-0.6-90 libtalloc1 libtotem-plparser10 libtrackerclient0 libvoikko1 libxalan2-java-gcj libxerces2-java-gcj libxklavier12 libxtrap6 libxxf86misc1 libzephyr3 mysql-common rhythmbox seahorse sound-juicer swfdec-gnome system-config-printer totem-common totem-gstreamer transmission-gtk vinagre vino w3c-dtd-xhtml wodim
Installed using aptitude, missing with apt-get
gstreamer0.10-gnomevfs
Installed using aptitude, removed with apt-get
[nothing]
This is for KDE: Installed using apt-get, missing with aptitude
autopoint bomber bovo cantor cantor-backend-kalgebra cpp-4.3 dcoprss edict espeak espeak-data eyesapplet fifteenapplet finger gettext ghostscript-x git gnome-audio gnugo granatier gs-common gstreamer0.10-pulseaudio indi kaddressbook-plugins kalgebra kalzium-data kanjidic kapman kate-plugins kblocks kbreakout kbstate kde-icons-mono kdeaccessibility kdeaddons-kfile-plugins kdeadmin-kfile-plugins kdeartwork-misc kdeartwork-theme-window kdeedu kdeedu-data kdeedu-kvtml-data kdegames kdegames-card-data kdegames-mahjongg-data kdegraphics-kfile-plugins kdelirc kdemultimedia-kfile-plugins kdenetwork-kfile-plugins kdepim-kfile-plugins kdepim-kio-plugins kdessh kdetoys kdewebdev kdiamond kdnssd kfilereplace kfourinline kgeography-data kigo killbots kiriki klettres-data kmoon kmrml knewsticker-scripts kollision kpf krosspython ksirk ksmserver ksquares kstars-data ksudoku kubrick kweather libasound2-plugins libboost-python1.42.0 libcfitsio3 libconvert-binhex-perl libcrypt-ssleay-perl libdb4.6++ libdjvulibre-text libdotconf1.0 liberror-perl libespeak1 libfinance-quote-perl libgail-common libgsl0ldbl libhtml-parser-perl libhtml-tableextract-perl libhtml-tagset-perl libhtml-tree-perl libio-stringy-perl libkdeedu4 libkdegames5 libkiten4 libkpathsea5 libkrossui4 libmailtools-perl libmime-tools-perl libnews-nntpclient-perl libopenbabel3 libportaudio2 libpulse-browse0 libservlet2.4-java libspeechd2 libtiff-tools libtimedate-perl libunistring0 liburi-perl libwww-perl libxalan2-java libxerces2-java lirc luatex marble networkstatus noatun-plugins openoffice.org-writer2latex palapeli palapeli-data parley parley-data poster psutils pulseaudio pulseaudio-esound-compat pulseaudio-module-x11 pulseaudio-utils quanta-data rocs rsync speech-dispatcher step svgalibg1 texlive-binaries texlive-luatex ttf-sazanami-gothic
Installed using apt-get, removed with aptitude
amor artsbuilder atlantik atlantikdesigner blinken bluez-utils cvs dhcdbd djvulibre-desktop imlib-base imlib11 kalzium kanagram kandy kasteroids katomic kbackgammon kbattleship kblackbox kbounce kbruch kcron kdat kdemultimedia-kappfinder-data kdeprint kdict kdvi kedit keduca kenolaba kfax kfaxview kfouleggs kgeography kghostview kgoldrunner khangman khexedit kiconedit kig kimagemapeditor kitchensync kiten kjumpingcube klatin klettres klickety klines klinkstatus kmag kmahjongg kmailcvt kmenuedit kmid kmilo kmines kmousetool kmouth kmplot knetwalk kodo kolf kommander konquest kooka kpager kpat kpdf kpercentage kpilot kpoker kpovmodeler krec kregexpeditor kreversi ksame ksayit kshisen ksig ksim ksirc ksirtet ksmiletris ksnake ksokoban kspaceduel kstars ksvg ksysv kteatime ktip ktnef ktouch ktron kttsd ktuberling kturtle ktux kuickshow kverbos kview kviewshell kvoctrain kwifimanager kwin kwin4 kwordquiz kworldclock kxsldbg libakode2 libarts1-akode libarts1-audiofile libarts1-mpeglib libarts1-xine libavahi-compat-libdnssd1 libavahi-core5 libavc1394-0 libbind9-50 libbluetooth2 libboost-python1.34.1 libcucul0 libcurl3 libcvsservice0 libdirectfb-1.0-0 libdjvulibre21 libdvdread3 libfaad0 libfreebob0 libgd2-noxpm libgraphviz4 libgsmme1c2a libgtkhtml2-0 libicu38 libiec61883-0 libindex0 libisccc50 libisccfg50 libiw29 libjaxp1.3-java-gcj libk3b3 libkcal2b libkcddb1 libkdeedu3 libkdegames1 libkdepim1a libkgantt0 libkleopatra1 libkmime2 libkpathsea4 libkpimexchange1 libkpimidentities1 libkscan1 libksieve0 libktnef1 liblockdev1 libltdl3 liblwres50 libmagick10 libmimelib1c2a libmodplug0c2 libmozjs1d libmpcdec3 libmpfr1ldbl libneon27 libnm-util0 libopensync0 libpisock9 libpoppler-glib3 libpoppler-qt2 libpoppler3 libraw1394-8 librss1 libsensors3 libsmbios2 libssh2-1 libsuitesparse-3.1.0 libswfdec-0.6-90 libtalloc1 libxalan2-java-gcj libxerces2-java-gcj libxtrap6 lskat mpeglib network-manager-kde noatun pmount tex-common texlive-base texlive-common texlive-doc-base texlive-fonts-recommended tidy ttf-dustin ttf-kochi-gothic ttf-sjfonts
Installed using aptitude, missing with apt-get
dolphin kde-core kde-plasma-desktop kde-standard kde-window-manager kdeartwork kdebase kdebase-apps kdebase-workspace kdebase-workspace-bin kdebase-workspace-data kdeutils kscreensaver kscreensaver-xsavers libgle3 libkonq5 libkonq5-templates libnetpbm10 netpbm plasma-widget-folderview plasma-widget-networkmanagement xscreensaver-data-extra xscreensaver-gl xscreensaver-gl-extra xscreensaver-screensaver-bsod
Installed using aptitude, removed with apt-get
kdebase-bin konq-plugins konqueror

13 March 2010

Lucas Nussbaum: RC bugs of the week

I just couldn t resist I joined the game, but did it the other way around. I could only file 51 new FTBFS (Fail To Build From Source) bugs this time. Looks like Squeeze is getting closer! I ve also been doing rebuilds of Ubuntu lucid. There are currently 561 packages that fail to build from source in lucid/amd64, versus 430 in sid/amd64 (I will start rebuilding squeeze instead of sid after the freeze). Surprisingly, only 131 packages fail in both. I would have expected that number to be much higher. The 51 new FTBFS bugs:
#573648: gnome-chemistry-utils: FTBFS: Nonexistent build-dependency: libgoffice-0-8-dev
#573649: /api/package-list is no longer compressed
#573650: /api/package-list is no longer compressed
#573651: virt-top: FTBFS: configure: error: Cannot find required OCaml package extlib
#573652: heartbeat: FTBFS: Nonexistent build-dependency: libcluster-glue-dev
#573653: abiword: FTBFS: Nonexistent build-dependency: libgoffice-0-8-dev
#573654: helium: FTBFS: Makefile: hGetLine: invalid argument (Invalid or incomplete multibyte or wide character)
#573655: mlton-cross: FTBFS: /bin/sh: wget: not found
#573656: pytest-xdist: FTBFS: ImportError: No module named setuptools
#573657: libfile-fu-perl: FTBFS: tests failed
#573658: libphysfs: FTBFS: docs/man/man3/PHYSFS_addToSearchPath.3: No such file or directory at /usr/bin/dh_installman line 127.
#573659: ecl: FTBFS: rm: cannot remove /build/user-ecl_10.2.1-1-amd64-S2bazb/ecl-10.2.1/debian/ecl/usr/share/info/dir : No such file or directory
#573660: /api/package-list is no longer compressed
#573661: libdbix-class-schema-loader-perl: FTBFS: tests failed
#573662: /api/package-list is no longer compressed
#573663: libthai: FTBFS: /usr/bin/install: cannot stat ./../doc/man/man3/th_render_text_tis.3 : No such file or directory
#573664: /api/package-list is no longer compressed
#573665: hunspell-dict-ko: FTBFS: build hangs
#573666: plexus-active-collections: FTBFS: missing junit:junit:jar:debian
#573667: nuapplet: FTBFS: Can t find gnutls library developpement files!
#573668: binutils-z80: FTBFS: /bin/sh: cannot open /build/user-binutils-z80_2.20-3-amd64-MwJBIl/binutils-z80-2.20/binutils-2.20.tar.bz2: No such file
#573669: keynav: FTBFS: keynav.c:799: error: too few arguments to function xdo_mousemove
#573670: moblin-panel-applications: FTBFS: moblin-netbook-launcher.c:1640: undefined reference to mx_scroll_view_get_vscroll_bar
#573671: tetradraw: FTBFS: /bin/bash: line 1: automake-1.7: command not found
#573672: beid: FTBFS: rm: cannot remove _src/eidmw/bin/eidmw_*.qm : No such file or directory
#573673: swfdec-gnome: FTBFS: Nonexistent build-dependency: libswfdec-0.8-dev
#573674: swfdec-mozilla: FTBFS: Nonexistent build-dependency: libswfdec-0.8-dev
#573675: jasmin-sable: FTBFS: Error: JAVA_HOME is not defined correctly.
#573676: corosync: FTBFS: Depends field, reference to libcorosync4 : error in version: version string is empty
#573677: banshee-extension-mirage: FTBFS: ./PlaylistGeneratorSource.cs(469,39): error CS0539: Banshee.PlaybackController.IBasicPlaybackController.Next in explicit interface declaration is not a member of interface
#573678: gnucash: FTBFS: Nonexistent build-dependency: libgoffice-0-8-dev
#573679: libwx-perl: FTBFS: xvfb-run: error: Xvfb failed to start
#573680: /api/package-list is no longer compressed
#573681: fso-usaged: FTBFS: fsobasics-2.0.vapi:110.2-110.84: error: FsoFramework already contains a definition for AsyncWorkerQueue
#573682: libiscwt-java: FTBFS: Nonexistent build-dependency: libswt-gtk-3.4-java
#573683: nordugrid-arc-nox: FTBFS: ld: cannot find -larccrypto
#573684: cssc: FTBFS: rm: cannot remove /build/user-cssc_1.2.0-1-amd64-XCK7aQ/cssc-1.2.0/debian/cssc/usr/share/info/dir* : No such file or directory
#573685: django-threaded-multihost: FTBFS: distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse( setuptools-hg )
#573686: /api/package-list is no longer compressed
#573687: davical: FTBFS: /bin/sh: phpdoc: not found
#573688: gauche-gtk: FTBFS: gauche-gtk.c:450: error: too few arguments to function Scm_Apply
#573689: quilt: FTBFS: tests failed
#573690: pyabiword: FTBFS: Nonexistent build-dependency: libgoffice-0-8-dev
#573691: flumotion: FTBFS: configure: error: You need at least version 2.0.1 of Twisted
#573692: libnet-dns-zone-parser-perl: FTBFS: tests failed
#573693: nip2: FTBFS: Nonexistent build-dependency: libgoffice-0-8-dev
#573694: hedgewars: FTBFS: Error: Illegal parameter: -Nu
#573695: epsilon: FTBFS: FAILED (skips=5, expectedFailures=1, errors=7, successes=229)
#573696: python-glpk: FTBFS: Unsatisfiable build-dependency: libglpk-dev(inst 4.43-1 ! <= wanted 4.38.999)
#573697: libnanoxml2-java: FTBFS: cp: cannot stat /usr/share/doc/default-jdk-doc/api/package-list.gz : No such file or directory
#573698: doxia-maven-plugin: FTBFS: Reason: Cannot find parent: org.apache.maven.doxia:doxia

3 December 2008

David Bremner: Using GLPK from C++

Recently I suggested to some students that they could use the Gnu Linear Programming Toolkit from C++. Shortly afterwards I thought I had better verify that I had not just sent people on a hopeless mission. To test things out, I decided to try using GLPK as part of an ongoing project with Lars Schewe The basic idea of this example is to use glpk to solve an integer program with row generation. The main hurdle (assuming you want to actually write object oriented c++) is how to make the glpk callback work in an object oriented way. Luckily glpk provides a pointer "info" that can be passed to the solver, and which is passed back to the callback routine. This can be used to keep track of what object is involved. Here is the class header
#ifndef GLPSOL_HH
#define GLPSOL_HH
#include "LP.hh"
#include "Vektor.hh"
#include "glpk.h"
#include "combinat.hh"
namespace mpc  
  class  GLPSol : public LP  
  private:
    glp_iocp parm;
    static Vektor<double> get_primal_sol(glp_prob *prob);
    static void callback(glp_tree *tree, void *info);
    static int output_handler(void *info, const char *s);
  protected:
    glp_prob *root;
  public:
    GLPSol(int columns);
    ~GLPSol()  ;
    virtual void rowgen(const Vektor<double> &candidate)  ;
    bool solve();
    bool add(const LinearConstraint &cnst);
   ;
 
#endif
The class LP is just an abstract base class (like an interface for java-heads) defining the add method. The method rowgen is virtual because it is intended to be overridden by a subclass if row generation is actually required. By default it does nothing. Notice that the callback method here is static; that means it is essentially a C function with a funny name. This will be the function that glpk calls when it wants from help.
#include <assert.h>
#include "GLPSol.hh"
#include "debug.hh"
namespace mpc 
  GLPSol::GLPSol(int columns)  
    // redirect logging to my handler
    glp_term_hook(output_handler,NULL);
    // make an LP problem
    root=glp_create_prob();
    glp_add_cols(root,columns);
    // all of my variables are binary, my objective function is always the same
    //  your milage may vary
    for (int j=1; j<=columns; j++) 
      glp_set_obj_coef(root,j,1.0);
      glp_set_col_kind(root,j,GLP_BV);
     
    glp_init_iocp(&parm);
    // here is the interesting bit; we pass the address of the current object
    // into glpk along with the callback function
    parm.cb_func=GLPSol::callback;
    parm.cb_info=this;
   
  int GLPSol::output_handler(void *info, const char *s) 
    DEBUG(1) << s;
    return 1;
   
  Vektor<double> GLPSol::get_primal_sol(glp_prob *prob) 
    Vektor<double> sol;
    assert(prob);
    for (int i=1; i<=glp_get_num_cols(prob); i++) 
      sol[i]=glp_get_col_prim(prob,i);
     
    return sol;
   
  // the callback function just figures out what object called glpk and forwards
  // the call. I happen to decode the solution into a more convenient form, but 
  // you can do what you like
  void GLPSol::callback(glp_tree *tree, void *info) 
    GLPSol *obj=(GLPSol *)info;
    assert(obj);
    switch(glp_ios_reason(tree)) 
    case GLP_IROWGEN:
      obj->rowgen(get_primal_sol(glp_ios_get_prob(tree)));
      break;
    default:
      break;
     
   
  bool GLPSol::solve(void)    
    int ret=glp_simplex(root,NULL);
    if (ret==0) 
      ret=glp_intopt(root,&parm);
    if (ret==0)
      return (glp_mip_status(root)==GLP_OPT);
    else
      return false;
   
  bool GLPSol::add(const LinearConstraint&cnst) 
    int next_row=glp_add_rows(root,1);
    // for mysterious reasons, glpk wants to index from 1
    int indices[cnst.size()+1];
    double coeff[cnst.size()+1];
    DEBUG(3) << "adding " << cnst << std::endl;
    int j=1;
    for (LinearConstraint::const_iterator p=cnst.begin();
         p!=cnst.end(); p++) 
      indices[j]=p->first;
      coeff[j]=(double)p->second;
      j++;
     
    int gtype=0;
    switch(cnst.type()) 
    case LIN_LEQ:
      gtype=GLP_UP;
      break;
    case LIN_GEQ:
      gtype=GLP_LO;
      break;
    default:
      gtype=GLP_FX;
     
    glp_set_row_bnds(root,next_row,gtype,       
                       (double)cnst.rhs(),(double)cnst.rhs());
    glp_set_mat_row(root,
                    next_row,
                    cnst.size(),
                    indices,
                    coeff);
    return true;
   
 
All this is a big waste of effort unless we actually do some row generation. I'm not especially proud of the crude rounding I do here, but it shows how to do it, and it does, eventually solve problems.
#include "OMGLPSol.hh"
#include "DualGraph.hh"
#include "CutIterator.hh"
#include "IntSet.hh"
namespace mpc 
  void OMGLPSol::rowgen(const Vektor<double>&candidate) 
    if (diameter<=0) 
      DEBUG(1) << "no path constraints to generate" << std::endl;
      return;
     
    DEBUG(3) << "Generating paths for " << candidate << std::endl;
  // this looks like a crude hack, which it is, but motivated by the
  // following: the boundary complex is determined only by the signs
  // of the bases, which we here represent as 0 for - and 1 for +
    Chirotope chi(*this);
    for (Vektor<double>::const_iterator p=candidate.begin();
         p!=candidate.end(); p++) 
      if (p->second > 0.5)  
        chi[p->first]=SIGN_POS;
        else  
        chi[p->first]=SIGN_NEG;
       
     
    BoundaryComplex bc(chi);
    DEBUG(3) << chi;
    DualGraph dg(bc);
    CutIterator pathins(*this,candidate);
    int paths_found=
      dg.all_paths(pathins,
                   IntSet::lex_set(elements(),rank()-1,source_facet),
                   IntSet::lex_set(elements(),rank()-1,sink_facet),
                   diameter-1);
    DEBUG(1) << "row generation found " << paths_found << " realized paths\n";
    DEBUG(1) << "effective cuts: " << pathins.effective() << std::endl;
   
  void OMGLPSol::get_solution(Chirotope &chi)  
    int nv=glp_get_num_cols(root);
    for(int i=1;i<=nv;++i)  
      int val=glp_mip_col_val(root,i);
      chi[i]=(val==0 ? SIGN_NEG : SIGN_POS);
     
   
 
So ignore the problem specific way I generate constraints, the key remaining piece of code is CutIterator which filters the generated constraints to make sure they actually cut off the candidate solution. This is crucial, because row generation must not add constraints in the case that it cannot improve the solution, because glpk assumes that if the user is generating cuts, the solver doesn't have to.
#ifndef PATH_CONSTRAINT_ITERATOR_HH
#define PATH_CONSTRAINT_ITERATOR_HH
#include "PathConstraint.hh"
#include "CNF.hh"
namespace mpc  
  class CutIterator : public std::iterator<std::output_iterator_tag,
                                                      void,
                                                      void,
                                                      void,
                                                      void> 
  private:
    LP& _list;
    Vektor<double> _sol;
    std::size_t _pcount;
    std::size_t _ccount;
  public:
    CutIterator (LP& list, const Vektor<double>& sol) : _list(list),_sol(sol), _pcount(0), _ccount(0)  
    CutIterator& operator=(const Path& p)  
      PathConstraint pc(p);
      _ccount+=pc.appendTo(_list,&_sol);
      _pcount++;
      if (_pcount %10000==0)  
        DEBUG(1) << _pcount << " paths generated" << std::endl;
       
      return *this;
     
    CutIterator& operator*()  return *this; 
    CutIterator& operator++()  return *this; 
    CutIterator& operator++(int)  return *this; 
    int effective()   return _ccount;  ;
   ;
 
#endif
Oh heck, another level of detail; the actual filtering actually happens in the appendTo method the PathConstraint class. This is just computing the dot product of two vectors. I would leave it as an exercise to the readier, but remember some fuzz is neccesary to to these kinds of comparisons with floating point numbers. Eventually, the decision is made by the following feasible method of the LinearConstraint class.
 bool feasible(const
Vektor<double> & x)  double sum=0; for (const_iterator
p=begin();p!=end(); p++)  sum+= p->second*x.at(p->first);  
      switch (type()) 
      case LIN_LEQ:
        return (sum <= _rhs+epsilon);
      case LIN_GEQ:
        return (sum >= _rhs-epsilon);
    default:
      return (sum <= _rhs+epsilon) &&
        (sum >= _rhs-epsilon);