I've been spending some time on the #bittorrent channel on Freenode,
to see if I could get any good information from other bittorrent
client developers. After a rough initial reception, I did get some
good suggestions, as well as some tidbits of information I wasn't
aware of.
One of the tidbits was that most of the bittorrent client developers
frowned on the use of selective downloading for torrents, to the
point where some refuse to implement it in their clients (apparently
the latest
mainline client doesn't even have it, though I
haven't checked due to license issues).
This concerned me, as the
DebTorrent client I'm working on
will rely heavily on selective downloading, so that users only
download the packages they want to install. It's not clear to me
that this will be a real issue, though. I'm sure it will make it
more difficult to find a rare package in a large swarm of peers,
but that's to be expected, and is nicely solved by falling back to an
HTTP download from a backup
mirror. This backup downloading could become extreme,
though, making the bittorrent-like download (peer-to-peer) mostly
just an HTTP download (client-server). To see whether it will be a
problem, I ran a simulation of different-sized swarms to
determine how much unnecessary HTTP downloading would
occur.
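To make the fallback concrete, here's a minimal Python sketch of the
logic I have in mind; the mirror URL, the pool_path attribute, and
the peer methods are illustrative stand-ins, not the actual
DebTorrent interfaces:

    import urllib.request

    MIRROR_URL = "http://ftp.debian.org/debian"   # hypothetical backup mirror

    def download_package(package, connected_peers):
        """Try the swarm first; fall back to a plain HTTP fetch from a mirror."""
        for peer in connected_peers:
            if peer.has(package):            # peer advertises the pieces we need
                return peer.fetch(package)   # normal peer-to-peer transfer
        # No connected peer has this (possibly rare) package: use the mirror.
        return urllib.request.urlopen(f"{MIRROR_URL}/{package.pool_path}").read()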
First, some assumptions to make the simulation easier:
- peers join and download one at a time, in order
- peers never leave
- peers download all the packages for their system at once (i.e.
all fresh installs, no upgrades)
- peers are all interested in the same version and architecture of
packages
- there are N total peers, and each can make C connections to other
peers
With these assumptions, the simulation becomes quite simple. I used
the
popcon data to assign packages appropriately to the N
peers. Each peer downloads its assigned packages from the C
previously joined peers when possible, and otherwise by HTTP. I ran
this multiple times, varying N and C, each time calculating how much
was downloaded through the debtorrent protocol and how much through
HTTP.
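In outline, the simulation looks something like this (a simplified
Python sketch, not the actual code I ran: assignments is a list of N
per-peer package sets drawn from the popcon data, size maps each
package to its size in bytes, and I assume each new peer connects to
a random C of the earlier peers):

    import random

    def simulate(assignments, C, size):
        """Peers join one at a time; each connects to up to C earlier peers
        and downloads its packages from them when possible, else by HTTP."""
        joined = []        # the package sets held by peers already in the swarm
        p2p = http = 0
        for wanted in assignments:
            neighbours = random.sample(joined, min(C, len(joined)))
            for pkg in wanted:
                if any(pkg in peer for peer in neighbours):
                    p2p += size[pkg]       # found among our connections
                else:
                    http += size[pkg]      # fall back to the mirror
            joined.append(wanted)
        return http / (http + p2p)         # HTTP share of the total download

Running this over a grid of N (the length of assignments) and C
values gives the curves below.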
This graph shows the percentage of the total download that used HTTP
from a mirror. The optimal line shows the minimum possible (i.e. when
all peers are connected to each other), which corresponds to only one
copy of every package (the first) being downloaded using HTTP.
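The optimal line itself is easy to compute directly from the same
inputs: when everyone is connected to everyone, only the first copy
of each distinct package comes over HTTP (same hypothetical names as
in the sketch above):

    def optimal_http_fraction(assignments, size):
        """Minimum possible HTTP share: only the first copy of each
        distinct package is ever fetched from the mirror."""
        distinct = set().union(*assignments)
        http = sum(size[p] for p in distinct)
        total = sum(size[p] for wanted in assignments for p in wanted)
        return http / total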
This confirms my previous thinking: fracturing the download
population into multiple small swarms results in a lot of
inefficiency, and a lot of HTTP downloading. Every effort will need
to be made to keep the downloading population together. (This is
especially difficult for unstable, where a new swarm could be created
twice a day due to archive updates.) Swarms of 1,000 peers or more
seem to be sufficient to minimize the HTTP downloading.
This graph shows the amount of unnecessary HTTP downloading that
occurred, which is just the difference between each line and the
optimal one in the previous plot. This shows the danger of selective
downloading: all of this unnecessary HTTP downloading occurs because
the swarm is too big for a peer's limited connections to find the
peers that have the desired package (such peers do exist, they're
just not among the ones you're connected to). Fortunately, this
unnecessary downloading seems to approach a maximum as the swarm
size increases.
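In terms of the sketches above, the quantity plotted here is just the
gap between a simulated run and the optimum for the same package
assignments:

    unnecessary = simulate(assignments, C, size) - optimal_http_fraction(assignments, size)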
For the large swarm sizes needed to make the downloading efficient,
and assuming a reasonable number of connections of 100 (bittorrent
defaults to maintaining at least 40, and large swarms usually mean
a large number of connections), we can expect to be using HTTP for
about 4% of the download. This number is surprisingly low due to the
popularity distribution of packages (only rare ones are hard to
find, but they aren't downloaded very often). I think 4% is
manageable in our situation, given that there is a backup method for
finding the rare packages, though it does show the need to have that
backup method available. This is probably more of a problem for
regular bittorrent clients, which have no backup method.