Search Results: "willy"

9 February 2021

Kees Cook: security things in Linux v5.8

Previously: v5.7

Linux v5.8 was released in August 2020. Here's my summary of various security things that caught my attention:

arm64 Branch Target Identification
Dave Martin added support for ARMv8.5's Branch Target Instructions (BTI), which are enabled in userspace at execve() time, and all the time in the kernel (which required manually marking up a lot of non-C code, like assembly and JIT code). With this in place, Jump-Oriented Programming (JOP, where code gadgets are chained together with jumps and calls) is no longer available to the attacker. An attacker's code must make direct function calls. This basically reduces the usable code available to an attacker from every word in the kernel text to only function entries (or jump targets). This is a low granularity forward-edge Control Flow Integrity (CFI) feature, which is important (since it greatly reduces the potential targets that can be used in an attack) and cheap (implemented in hardware). It's a good first step to strong CFI, but (as we've seen with things like CFG) it isn't usually strong enough to stop a motivated attacker. High granularity CFI (which uses a more specific branch-target characteristic, like function prototypes, to track expected call sites) is not yet a hardware supported feature, but the software version will be coming in the future by way of Clang's CFI implementation.

arm64 Shadow Call Stack
Sami Tolvanen landed the kernel implementation of Clang's Shadow Call Stack (SCS), which protects the kernel against Return-Oriented Programming (ROP) attacks (where code gadgets are chained together with returns). This backward-edge CFI protection is implemented by keeping a second dedicated stack pointer register (x18) and keeping a copy of the return addresses stored in a separate "shadow stack". In this way, manipulating the regular stack's return addresses will have no effect. (And since a copy of the return address continues to live in the regular stack, no changes are needed for back trace dumps, etc.)

It's worth noting that unlike BTI (which is hardware based), this is a software defense that relies on the location of the Shadow Stack (i.e. the value of x18) staying secret, since the memory could be written to directly. Intel's hardware ROP defense (CET) uses a hardware shadow stack that isn't directly writable. ARM's hardware defense against ROP is PAC (which is actually designed as an arbitrary CFI defense; it can be used for forward-edge too), but that depends on having ARMv8.3 hardware. The expectation is that SCS will be used until PAC is available.

Kernel Concurrency Sanitizer infrastructure added
Marco Elver landed support for the Kernel Concurrency Sanitizer, which is a new debugging infrastructure to find data races in the kernel, via CONFIG_KCSAN. This immediately found real bugs, with some fixes having already landed too. For more details, see the KCSAN documentation.

new capabilities
Alexey Budankov added CAP_PERFMON, which is designed to allow access to perf(). The idea is that this capability gives a process access to only read aspects of the running kernel and system. No longer will access be needed through the much more powerful abilities of CAP_SYS_ADMIN, which has many ways to change kernel internals. This allows for a split between controls over the confidentiality (read access via CAP_PERFMON) of the kernel vs control over integrity (write access via CAP_SYS_ADMIN).

Alexei Starovoitov added CAP_BPF, which is designed to separate BPF access from the all-powerful CAP_SYS_ADMIN. It is designed to be used in combination with CAP_PERFMON for tracing-like activities and CAP_NET_ADMIN for networking-related activities. For things that could change kernel integrity (i.e. write access), CAP_SYS_ADMIN is still required.

network random number generator improvements
Willy Tarreau made the network code's random number generator less predictable. This will further frustrate any attacker's attempts to recover the state of the RNG externally, which might lead to the ability to hijack network sessions (by correctly guessing packet states).

fix various kernel address exposures to non-CAP_SYSLOG
I fixed several situations where kernel addresses were still being exposed to unprivileged (i.e. non-CAP_SYSLOG) users, though usually only through odd corner cases. After refactoring how capabilities were being checked for files in /sys and /proc, the kernel modules sections, kprobes, and BPF exposures got fixed. (Though in doing so, I briefly made things much worse before getting it properly fixed. Yikes!)

RISCV W^X detection
Following up on his recent work to enable strict kernel memory protections on RISCV, Zong Li has now added support for CONFIG_DEBUG_WX as seen for other architectures. Any writable and executable memory regions in the kernel (which are lovely targets for attackers) will be loudly noted at boot so they can get corrected.

execve() refactoring continues
Eric W. Biederman continued working on execve() refactoring, including getting rid of the frequently problematic recursion used to locate binary handlers. I used the opportunity to dust off some old binfmt_script regression tests and get them into the kernel selftests.

multiple /proc instances
Alexey Gladkov modernized /proc internals and provided a way to have multiple /proc instances mounted in the same PID namespace. This allows for having multiple views of /proc, with different features enabled. (Including the newly added hidepid=4 and subset=pid mount options.)

set_fs() removal continues
Christoph Hellwig, with Eric W. Biederman, Arnd Bergmann, and others, have been diligently working to entirely remove the kernel's set_fs() interface, which has long been a source of security flaws due to weird confusions about which address space the kernel thought it should be accessing. Beyond things like the lower-level per-architecture signal handling code, this has needed to touch various parts of the ELF loader, and networking code too.

READ_IMPLIES_EXEC is no more for native 64-bit
The READ_IMPLIES_EXEC flag was a work-around for dealing with the addition of non-executable (NX) memory when x86_64 was introduced. It was designed as a way to mark a memory region as "well, since we don't know if this memory region was expected to be executable, we must assume that if we need to read it, we need to be allowed to execute it too". It was designed mostly for stack memory (where trampoline code might live), but it would carry over into all mmap() allocations, which would mean sometimes exposing a large attack surface to an attacker looking to find executable memory. While normally this didn't cause problems on modern systems that correctly marked their ELF sections as NX, there were still some awkward corner-cases. I fixed this by splitting READ_IMPLIES_EXEC from the ELF PT_GNU_STACK marking on x86 and arm/arm64, and declaring that a native 64-bit process would never gain READ_IMPLIES_EXEC on x86_64 and arm64, which matches the behavior of other native 64-bit architectures that correctly didn't ever implement READ_IMPLIES_EXEC in the first place.

array index bounds checking continues
As part of the ongoing work to use modern flexible arrays in the kernel, Gustavo A. R. Silva added the flex_array_size() helper (as a cousin to struct_size()). The zero/one-member into flex array conversions continue with over a hundred commits as we slowly get closer to being able to build with -Warray-bounds.

scnprintf() replacement continues
Chen Zhou joined Takashi Iwai in continuing to replace potentially unsafe uses of sprintf() with scnprintf(). Fixing all of these will make sure the kernel avoids nasty buffer concatenation surprises.

That's it for now! Let me know if there is anything else you think I should mention here. Next up: Linux v5.9.
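As an illustrative aside on the scnprintf() item: the "concatenation surprise" comes from return values. sprintf() has no bound at all, and snprintf() returns the length the output would have had, so an accumulated offset can run past the end of the buffer. scnprintf() returns only what was actually stored. The following is a minimal userspace sketch of that contract, not the kernel's code; demo_scnprintf() and demo_concat() are hypothetical names invented for this example.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the kernel's scnprintf(): format into
 * buf and return the number of characters actually stored (excluding the
 * trailing NUL), never the length the output "would have had".
 */
static int demo_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    if (size == 0)
        return 0;

    va_start(args, fmt);
    n = vsnprintf(buf, size, fmt, args);
    va_end(args);

    if (n < 0)
        return 0;               /* formatting error: report nothing stored */
    if ((size_t)n >= size)
        return (int)(size - 1); /* output was truncated to fit */
    return n;
}

/* Append two strings into buf and return the final offset. */
static size_t demo_concat(char *buf, size_t size, const char *a, const char *b)
{
    size_t len = 0;

    /* Each call advances len only by what was really written. */
    len += demo_scnprintf(buf + len, size - len, "%s", a);
    len += demo_scnprintf(buf + len, size - len, "%s", b);
    return len;                 /* stays below size whenever size > 0 */
}
```

With an 8-byte buffer, appending "kernel" and then "space" through demo_concat() leaves a truncated but NUL-terminated string and an offset that still points inside the buffer. The same loop written with snprintf() would push the offset beyond the buffer size, and the next `size - len` would wrap around.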

© 2021, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.

30 March 2017

Shirish Agarwal: The tale of the dancing girl #nsfw

Demonstration of a Lapdance - Wikipedia


This post is adult/mature in nature, so those below 18 please excuse. It is about an anecdote from almost 20 years ago. The reason it is being posted is that I had dinner with a friend with whom I shared it, and he thought it would be nice if I shared it more widely, hence this post. The conversation was about being young and foolish, and that is where I shared the anecdote. The blog post was supposed to be about Aadhar, which shocked me both in the way no political discourse happened and in the way the public as well as public policy was gamed, but that will have to wait for another day.

History

I left college in 1995. The incident probably happened a couple of years earlier, so probably 1992-1993. At that time I was in my teens and, as a typical teenager, I made a few friends. One of those friends will remain nameless: we have since drifted apart and I have not taken permission from him, so naming him would not be a good idea. Let's call this gentleman Mr. X. A couple of months before, he had bought an open jeep, similar to but very different from the jeep shown below. The open jeep had become a fashion statement a few months earlier (in those days), as a movie starring Salman Khan featured one, and anybody who had money wanted one just like it.
Illustration of an open jeep, sadly its a military one - wikipedia


In those days we didn't have cell phones, and I had given my land-line number to very few friends, as land-lines then were finicky instruments. One fine morning I got a call from my friend saying he was going to come near my place, that I should meet him at some xyz place, and that we would go for a picnic for the whole day and might possibly return the next day. As it was the holidays, and only a fool would throw away a chance to ride in an open-air jeep, I immediately agreed. I shared that my friends had organized a picnic and, giving another friend's number (who didn't know anything), got permission and went to meet Mr. X. This was very early in the morning, around 0600 hrs. After we met, he told me that we would be going to Mumbai, pick up some more friends there and then move on. In those days, a railway ticket from Shivaji Nagar to V.T. (now C.S.T.) cost INR 30/-. I had been to Mumbai a few times before for various technical conferences and knew a few cheap places to eat, so I knew that going by train we could go and come back spending at most INR 150/- and still have some change left over (today a meal at a roadside/street vendor easily passes that mark).

The Journey

I told him that the trip would be costly and that I didn't have any money to cover the fuel expenses; he said he would shoulder the expenses, he just wanted my company for the road. In those days it was the scenic Old Mumbai-Pune highway, and we took plenty of stops to admire the ghats (hills and valleys together). That journey must have taken around 7-8 hours, while today by the new Expressway you could do the same in 2.5-3 hours. Anyhow, we reached a swanky hotel in South Mumbai. South Mumbai was not the financial powerhouse that it is today; there was a mix of very old buildings and new ones like the swanky hotel that we had checked in to.
I have no memory or idea whether it was 1 class, 3 class or 5 class, and couldn't have cared less, as I was tired from the journey. We checked in, I had a long warm-water bath and then slept in the king-size bed with the curtains drawn. Evening came, and we took the jeep and picked up 2-3 of his friends, who were my age or a year or two older, and went to Nariman Point. Seeing the Queen's Necklace from Nariman Point at night is a sight in itself. In my innocence, I was under the impression that we had arrived at our destination; at this our host, Mr. X, and his Bombaiya friends had a quiet laugh, saying the night was still young. We must have whiled away a couple of hours, having chai and throwing rocks into the sea.

The Meeting

After a while, Mr. X took us to another swanky place. My eyes were out of their sockets, as this seemed to be as elitist a place as could be. I saw many white European women in various stages of undress, pole-dancing and lap-dancing. I had recently (in those days) come to know the term but was under the impression that it was something that happened only in Europe and the States. I had no idea that lap-dancing was older than I am, according to Wikipedia. So looking back now, I am not surprised that in two decades the concept crossed the oceans. Again Mr. X, being the host, agreed to bear all the costs, and all of us had food, drink and a lap-dance from any of the dancers on the floor. As I was young and probably shy (still am), I asked for Mr. X's help to pick a girl/woman for me. The woman he picked was auburn-haired and either my age or a year or two older or younger. What followed was about 20-30 minutes of a totally sexualized erotic experience. While he and all his friends picked girls to go all the way, I was hesitant to let loose. Maybe it was due to my lack of courage or inexperience, maybe it was because I was not in my own city and so couldn't predict the outcome, maybe I was just afraid that reality might mar fantasy; I don't know to this date.
We did kiss and neck a lot, though; I guess that should count for something.

The conversation

After all my friends had gone to the various rooms, I excused myself some time later, went to the loo, peed a bit, splashed cold water on myself, came out, had a couple of glasses of water and came back to my seat. The lady came back, and I shared that I was not interested in going further and that, while she was beautiful, I just didn't have the guts. I did ask her if she would keep me company for some time, though, as I didn't know anyone else at that place. Our conversation was more about her than me, as I had had a more or less average life up to that moment. There were only three unorthodox things that I had done before meeting her: I had drunk wines of different types, smoked weed, and had a Magic Mushrooms experience the year before with another group of friends I had made there. Goa in those days was simply magical, but that would probably need its own story/blog post.

When I enquired about her, she shared that she was from Russia, rattled off more than half a dozen places around the world where she had been, and said that this was her second or third stint in Mumbai and that she wasn't at all unhappy about her lifestyle and the choices she was making. I had no answer for her as a young, penniless, college-going student. Her self-confidence and the way she carried herself were impressive, with or without clothes. During the course of the conversation she shared a couple of contacts from whom I could get better weed at a slightly higher price if I were in Goa. A few months later, those contacts turned out to be genuine.
After some time, we took all the women and ourselves, around 8-9 people, in his jeep (how he negotiated that is beyond me) to a hygienic Pani puri and Bhel (puffed rice mixed with a variety of spices, typically with tomato, potato, coriander chutney as well as tamarind chutney, among other things) place and moved them to tears (the spices in the bhel and Pani puri did it for them), and this was when we had explicitly asked the bhel-wala guy to make it extremely mild with just a hint of spice. Anyway, some time later we dropped them at the same place, dropped his friends, came back to the hotel we had booked and got drunk again.

After-effects

A few years later, it came out in the newspapers/media that while India had broken out of financial isolation just a few years back (1991) and was profiting from it, many countries of the former USSR were going the other way, and hence a huge amount of human trafficking and immigration had taken place. This was in line with what the lady/woman/Miss X had shared with me.

The latest trigger

The latest trigger happened a couple of months back, when I learnt of a hero flight attendant saving a girl from human trafficking. To this date I am unsure whether the lady was doing it willingly or putting on a brave smile in front of me, because even if she had confided in me in any way, I would probably have been too powerless to help her. I just don't know.

Foolishness thy name

While my friend took advantage of my innocence and introduced me to a world which I otherwise would probably not know exists, it could easily have gone some other way as well. While I'm still unsure of the choices I made, I was and am happy that I was able to strike up a conversation with her and attempt to reach the person therein. Was it the truth or an elaborate fabricated lie to protect myself and herself? This I will never know.
Oppression

I understand that, as a customer or as somebody taking part in either of those performances or experiences, it isn't easy in any way to know or say whether the performer is doing it willingly or not, as the experiences happen in tightly controlled settings.
Filed under: Miscellenous Tagged: #anecdote, #confusion, #elitist, #growing up, #lap dance, #NSFW, #Open Jeep, Mumbai

13 January 2016

Norbert Preining: Ian Buruma: Wages of Guilt

Since moving to Japan, I have become more and more interested in history, especially the recent history of the 20th century. The book I just finished, Ian Buruma's (Wiki, home page) Wages of Guilt: Memories of War in Germany and Japan (Independent, NYRB), has been a revelation for me. As an Austrian living in Japan, I experience the discrepancy between these two countries with respect to their treatment of the war legacy practically daily, and many of my blog entries revolve around the topic of Japanese non-reconciliation.
Willy Brandt went down on his knees in the Warsaw ghetto, after a functioning democracy had been established in the Federal Republic of Germany, not before. But Japan, shielded from the evil world, has grown into an Oskar Matzerath: opportunistic, stunted, and haunted by demons, which it tries to ignore by burying them in the sand, like Oskar's drum.
Ian Buruma, Wages of Guilt, Clearing Up the Ruins
The comparison of Germany and Japan with respect to their recent history, as laid out in Buruma's book, throws a spotlight on various aspects of the psychology of the German and Japanese populations, while at the same time not falling into the easy trap of explaining everything with differences in guilt culture. It is a book of great depth and broad insights that everyone with even the slightest interest in these topics should read.
This difference between (West) German and Japanese textbooks is not just a matter of detail; it shows a gap in perception.
Ian Buruma, Wages of Guilt, Romance of the Ruins
Even thinking about giving a halfway complete account of this book is impossible for me. The sheer amount of information, on both the German and the Japanese side, is impressive. His incredible background (studies of Chinese literature and Japanese movies!) and long years as a journalist, editor, etc., enrich the book with facets not normally available. In particular, his knowledge of both German and Japanese movie history, and the reflection of history in movies, were completely new aspects for me (see my recent post (in Japanese)).

The book comprises four parts: the first with the chapters War Against the West and Romance of the Ruins; the second with the chapters Auschwitz, Hiroshima, and Nanking; the third with History on Trial, Textbook Resistance, and Memorials, Museums, and Monuments; and the last part with A Normal Country, Two Normal Towns, and Clearing Up the Ruins. Let us look at the chapters in turn:

The book somehow left me with a bleak impression of Japanese post-war times as well as of the Japanese future. Having read other books about political ignorance in Japan (Norma Field's In the Realm of a Dying Emperor, or the Chibana history), I find Buruma's characterization of Japanese politics striking. He couldn't foresee the recent changes in legislation pushed through by the Abe government, actually breaking the constitution, or the rewriting of history currently going on with respect to comfort women and Nanking. But reading his statement about Article Nine of the constitution and looking at the changes in political attitude, I am scared about where Japan is heading:
The Nanking Massacre, for leftists and many liberals too, is the main symbol of Japanese militarism, supported by the imperial (and imperialist) cult. Which is why it is a keystone of postwar pacifism. Article Nine of the constitution is necessary to avoid another Nanking Massacre. The nationalist right takes the opposite view. To restore the true identity of Japan, the emperor must be reinstated as a religious head of state, and Article Nine must be revised to make Japan a legitimate military power again. For this reason, the Nanking Massacre, or any other example of extreme Japanese aggression, has to be ignored, softened, or denied.
Ian Buruma, Wages of Guilt, Nanking
While there are signs of resistance in the streets of Japan (Okinawa and Henoko Bay, the demonstrations against the secrecy law and the revision of the constitution), we have yet to see a change influenced by the people in a country ruled and distributed by oligarchs. I don't think there will be another Nanking Massacre in the near future, but Buruma's book shows that we are heading back to a nationalistic regime similar to pre-war times, just covered with a democratic veil to distract critics.
I close with several other quotes from the book that caught my attention: In the preface and introduction:
[…] mainstream conservatives made a deliberate attempt to distract people's attention from war and politics by concentrating on economic growth.
The curious thing was that much of what attracted Japanese to Germany before the war – Prussian authoritarianism, romantic nationalism, pseudo-scientific racialism – had lingered in Japan while becoming distinctly unfashionable in Germany.
In Romance of the Ruins:
The point of all this is that Ikeda's promise of riches was the final stage of what came to be known as the "reverse course," the turn away from a leftist, pacifist, neutral Japan – a Japan that would never again be involved in any wars, that would resist any form of imperialism, that had, in short, turned its back for good on its bloody past. The Double Your Incomes policy was a deliberate ploy to draw public attention away from constitutional issues.
In Hiroshima:
The citizens of Hiroshima were indeed victims, primarily of their own military rulers. But when a local group of peace activists petitioned the city of Hiroshima in 1987 to incorporate the history of Japanese aggression into the Peace Memorial Museum, the request was turned down. The petition for an Aggressors Corner was prompted by junior high school students from Osaka, who had embarrassed Peace Museum officials by asking for an explanation about Japanese responsibility for the war.
The history of the war, or indeed any history, is indeed not what the Hiroshima spirit is about. This is why Auschwitz is the only comparison that is officially condoned. Anything else is too controversial, too much part of the "flow of history".
In Nanking, by the governmental pseudo-historian Tanaka:
Unlike in Europe or China, writes Tanaka, you won't find one instance of planned, systematic murder in the entire history of Japan. This is because the Japanese have a different sense of values from the Chinese or the Westerners.
In History on Trial:
In 1950, Becker wrote that few things have done more to hinder true historical self-knowledge in Germany than the war crimes trials. He stuck to this belief. Becker must be taken seriously, for he is not a right-wing apologist for the Nazi past, but an eminent liberal.
There never were any Japanese war crimes trials, nor is there a Japanese Ludwigsburg. This is partly because there was no exact equivalent of the Holocaust. Even though the behavior of Japanese troops was often barbarous, and the psychological consequences of State Shinto and emperor worship were frequently as hysterical as Nazism, Japanese atrocities were part of a military campaign, not a planned genocide of a people that included the country's own citizens. And besides, those aspects of the war that were most revolting and furthest removed from actual combat, such as the medical experiments on human guinea pigs (known as "logs") carried out by Unit 731 in Manchuria, were passed over during the Tokyo trial. The knowledge compiled by the doctors of Unit 731 – of freezing experiments, injection of deadly diseases, vivisections, among other things – was considered so valuable by the Americans in 1945 that the doctors responsible were allowed to go free in exchange for their data.
Some Japanese have suggested that they should have conducted their own war crimes trials. The historian Hata Ikuhiko thought the Japanese leaders should have been tried according to existing Japanese laws, either in military or in civil courts. The Japanese judges, he believed, might well have been more severe than the Allied tribunal in Tokyo. And the consequences would have been healthier. If found guilty, the spirits of the defendants would not have ended up being enshrined at Yasukuni. The Tokyo trial, he said, purified the crimes of the accused and turned them into martyrs. If they had been tried in domestic courts, there is a good chance the real criminals would have been flushed out.
After it was over, the Nippon Times pointed out the flaws of the trial, but added that the Japanese people must ponder over why it is that there has been such a discrepancy between what they thought and what the rest of the world accepted almost as common knowledge. This is at the root of the tragedy which Japan brought upon herself.
Emperor Hirohito was not Hitler; Hitler was no mere Shrine. But the lethal consequences of the emperor-worshipping system of irresponsibilities did emerge during the Tokyo trial. The savagery of Japanese troops was legitimized, if not driven, by an ideology that did not include a Final Solution but was as racialist as Hitler's National Socialism. The Japanese were the Asian Herrenvolk, descended from the gods.
Emperor Hirohito, the shadowy figure who changed after the war from navy uniforms to gray suits, was not personally comparable to Hitler, but his psychological role was remarkably similar.
In fact, MacArthur behaved like a traditional Japanese strongman (and was admired for doing so by many Japanese), using the imperial symbol to enhance his own power. As a result, he hurt the chances of a working Japanese democracy and seriously distorted history. For to keep the emperor in place (he could at least have been made to resign), Hirohito's past had to be freed from any blemish; the symbol had to be, so to speak, cleansed from what had been done in its name.
In Memorials, Museums, and Monuments:
If one disregards, for a moment, the differences in style between Shinto and Christianity, the Yasukuni Shrine, with its relics, its sacred ground, its bronze paeans to noble sacrifice, is not so very different from many European memorials after World War I. By and large, World War II memorials in Europe and the United States (though not the Soviet Union) no longer glorify the sacrifice of the fallen soldier. The sacrificial cult and the romantic elevation of war to a higher spiritual plane no longer seemed appropriate after Auschwitz. The Christian knight, bearing the cross of king and country, was not resurrected. But in Japan, where the war was still truly a war (not a Holocaust), and the symbolism still redolent of religious exultation, such shrines as Yasukuni still carry the torch of nineteenth-century nationalism. Hence the image of the nation owing its restoration to the sacrifice of fallen soldiers.
In A Normal Country:
The mayor received a letter from a Shinto priest in which the priest pointed out that it was un-Japanese to demand any more moral responsibility from the emperor than he had already taken. Had the emperor not demonstrated his deep sorrow every year, on the anniversary of Japan's surrender? Besides, he wrote, it was wrong to have spoken about the emperor in such a manner, even as the entire nation was deeply worried about his health. Then he came to the main point: It is a common error among Christians and people with Western inclinations, including so-called intellectuals, to fail to grasp that Western societies and Japanese society are based on fundamentally different religious concepts . . . Forgetting this premise, they attempt to place a Western structure on a Japanese foundation. I think this kind of mistake explains the demand for the emperor to bear full responsibility.
In Two Normal Towns:
The bust of the man caught my attention, but not because it was in any way unusual; such busts of prominent local figures can be seen everywhere in Japan. This one, however, was particularly grandiose. Smiling across the yard, with a look of deep satisfaction over his many achievements, was Hatazawa Kyoichi. His various functions and titles were inscribed below his bust. He had been an important provincial bureaucrat, a pillar of the sumo wrestling establishment, a member of various Olympic committees, and the recipient of some of the highest honors in Japan. The song engraved on the smooth stone was composed in praise of his rich life. There was just one small gap in Hatazawa's life story as related on his monument: the years from 1941 to 1945 were missing. Yet he had not been idle then, for he was the man in charge of labor at the Hanaoka mines.
In Clearing Up the Ruins:
But the question in American minds was understandable: could one trust a nation whose official spokesmen still refused to admit that their country had been responsible for starting a war? In these Japanese evasions there was something of the petulant child, stamping its foot, shouting that it had done nothing wrong, because everybody did it.
Japan seems at times not so much a nation of twelve-year-olds, to repeat General MacArthur's phrase, as a nation of people longing to be twelve-year-olds, or even younger, to be at that golden age when everything was secure and responsibility and conformity were not yet required.
For General MacArthur was right: in 1945, the Japanese people were political children. Until then, they had been forced into a position of complete submission to a state run by authoritarian bureaucrats and military men, and to a religious cult whose high priest was also formally chief of the armed forces and supreme monarch of the empire.
I saw Jew Süss that same year, at a screening for students of the film academy in Berlin. This showing, too, was followed by a discussion. The students, mostly from western Germany, but some from the east, were in their early twenties. They were dressed in the international uniform of jeans, anoraks, and work shirts. The professor was a man in his forties, a '68er named Karsten Witte. He began the discussion by saying that he wanted the students to concentrate on the aesthetics of the film more than the story. To describe the propaganda, he said, would simply be banal: We all know the what, so let's talk about the how. I thought of my fellow students at the film school in Tokyo more than fifteen years before. How many of them knew the what of the Japanese war in Asia.

11 January 2016

Ben Hutchings: Debian LTS work, December 2015

In December I carried over 15 hours from October/November and was assigned another 15 hours of work by Freexian's Debian LTS initiative. I worked a total of 20 hours despite the holidays. I uploaded a security and bug fix update to linux-2.6 early in December, and sent DLA-360-1. I also backported several more security fixes, released in the new year. I sent several of the fixes to Willy Tarreau for inclusion in Linux 2.6.32-longterm. I prepared an update to sudo to fix CVE-2015-5602. This turned out not to have been properly fixed upstream, so I finished the job and am now in the process of backporting and uploading fixes for all suites. I reviewed the packages affected by CVE-2015-8614 and the upstream fix in claws-mail, and found that that was also incomplete. This resulted in another CVE ID being issued. I had another week in the front desk role, over the new year, and triaged about 20 new issues. About half of them affected packages supported in squeeze-lts. Updated: I also found a bug in the contact-maintainers script used by the LTS front desk. It used apt-cache show to find out the maintainers of a source package, which may result in outdated information particularly if you configure APT to fetch squeeze sources in order to work on LTS! I modified the script to grab maintainer information out of the RDF description provided by packages.qa.debian.org (not yet implemented on tracker.debian.org). I feel there ought to be an easier way to do this, but at least I learned something about RDF.

11 December 2015

Ben Hutchings: Debian LTS work, November 2015

I've now been working on Debian LTS for a full year, so I'm going to stop counting months. In November, I carried over 5 hours from October and was assigned another 15 hours of work by Freexian's Debian LTS initiative. However, I spent much of the month on sick leave, so I only worked 5 billable hours on Debian LTS plus some unbilled time while on leave. I had another week in the front desk role, and triaged about 20 new issues. Less than half actually affected packages supported in squeeze-lts, and only about 5 were important. CVE-2015-5309 in putty had a patch that was fairly easy to backport, so I did that, uploaded and sent DLA 347-1. I backported several security fixes to linux-2.6 and sent some of those we had already released to Willy Tarreau for inclusion in Linux 2.6.32-longterm. At the end of the month, I reviewed Linux 2.6.32.69-rc1 and found a couple of bugs, leading to an -rc2. I applied that to the linux-2.6 packaging branch for squeeze-lts and spent a little time testing it, thankfully not hitting any regressions.

9 November 2015

Ben Hutchings: Debian LTS work, October 2015

For my 11th month working on Debian LTS, I carried over 5.5 hours from September and was assigned another 13.5 hours of work by Freexian's Debian LTS initiative. I worked 14 of a possible 19 hours. As I mentioned in the report for September, I uploaded binutils and issued DLA-324-1 early in October. I fixed a few security issues in the kernel, uploaded and issued DLA-325-1. I had some email discussions about long-term support of the Linux kernel with Willy Tarreau (Linux 2.6.32 maintainer) and Greg Kroah-Hartman (overall stable maintainer). Greg normally selects one version per year to maintain as a 'longterm' branch for 2 years, after which he may hand it over to another maintainer. A few upstream versions have received long-term support entirely from another developer. We were in agreement that it's desirable to have fewer of these branches with more developers and distributions contributing to each, but didn't come to a conclusion about how to coordinate this. The topic came up again at the Kernel Summit, and Greg then agreed to select the first kernel version of each calendar year, starting with Linux 4.4 (expected in January). This doesn't fit well with Debian's current release schedule, but I mean to discuss with the release team whether the freeze date can be set to allow inclusion of 2017's LTS kernel. I spent another week in the 'front desk' role, where I triaged new security issues for squeeze. There was a mixture of serious and trivial, old and new (not affecting squeeze) issues. The many ntp issues announced in October were fixed in unstable by a new upstream release. I spent a long time digging out the specific commits that fixed them and comparing with the older version in squeeze. Several of the issues had been introduced in ntp 4.2.7 or 4.2.8 and therefore didn't affect squeeze (or the newer stable releases). Of the fixes that were needed, most applied with minimal changes. 
Having prepared an update, I asked the ntp maintainer, Kurt Roeckx, to review my work. (The package has a limited test suite and none of the fixes added new tests.) Following this review he added a few more patches, uploaded and issued DLA-335-1. MySQL 5.1, as shipped in squeeze, no longer receives security support from upstream, and the security fixes they do issue are mixed with other changes that make it impractical to backport them. The LTS team is planning to backport the mysql-5.5 package to squeeze while avoiding conflicts with the binaries built from mysql-5.1. Santiago Ruano Rincón has prepared a backport and I spent some time reviewing this, but haven't yet sent my review comments.

9 October 2015

Miriam Ruiz: Thick Skin (within Free/Open Source communities)

The definition of "thick-skinned" in different dictionaries ranges from "not easily offended" to "largely unaffected by the needs and feelings of other people; insensitive", going through "able to ignore personal criticism", "ability to withstand criticism and show no signs of any criticism you may receive getting to you", "an insensitive nature" or "impervious to criticism". It essentially describes an emotionally detached attitude regarding one's social environment, the capacity of ignoring or minimizing the effects of others' criticism and the prioritization of the protection of one's current state over the capacity of empathizing and taking into account what others may say that doesn't conform to one's current way of thinking. It is essentially setting up barriers against whatever others may do that might provoke any kind of crisis or change in you. There are a few underlying assumptions in the use of this term as something good to have, when it comes to interactions with your own community: In the first place, it assumes that your own community is essentially hostile to you, and you will have to be constantly on guard against them. It assumes that it is better to set up barriers against the influence of others within your own community, because in fact your own peers are out there essentially to hurt you. Or, at least, they do not care a damn about you. In second place, it assumes that changes are wrong, that personal evolution is wrong, and that the more insensitive you are to your peers' opinions, the better, because they really have nothing to contribute to help you grow as a person. "I'm smart; you're dumb. I'm big; you're little. I'm right; you're wrong. And there's nothing you can do about it." (from the film Matilda). Matilda's dad is in fact the first reference that comes to my mind when we're talking about really thick skin.
[Image: scene of the film Matilda: "I'm smart, You're dumb. I'm big, You're little. I'm right, You're wrong"]
When the main recommendation in response to the level of aggressiveness within a community is that someone has to make their skin thicker, those making it are assuming that a bullying environment will help the results. This is nothing new. It's the same theoretical base that you can see in hazing and in other activities involving harassment, abuse or humiliation in college, when initiating a person into some groups. It's supposed to build character, to make someone closer to the alpha male stereotype and, in essence, "make us better men" (yes, I am using the word men on purpose, because insensitiveness is not usually seen as a positive trait in females). The assumption is that a community with a hard environment and individuals prepared for war is more effective than a more civilized one. Luckily, that's not the point of view of most members of the Debian Community, and many other Free/Open Source projects. The Code of Conduct is very explicit when it says that "a community in which people feel threatened is not a healthy community", and that is good. "The Debian Project welcomes and encourages participation by everyone" (Diversity Statement), including those with a thin skin, and I'm happy about that. There are still a lot of things to improve, of course, but I have the feeling that, despite the occasional complaints that having to be respectful to others takes the fun away, we're moving in the right direction. "The best tip I can give you on thickening your skin = don't. That is, don't thicken your skin. Having a thin skin means you're letting the world in, you're letting what's out there affect what's in you. It means you're connected. You're open. You're considerate and you'll consider it, whatever it might be. Having a thin skin may be dangerous, sure, because you might take in so much that you pop, like that blueberry girl from Willy Wonka. But life is dangerous. A thick skin protects you from everything, but it also protects you from everything: from the gentle touches of life, from the subtle emotions of others, the deep connections, the meaningful interactions." (Top Ten Tips on how to Thicken your Skin).

7 August 2015

Ben Hutchings: Debian LTS work, July 2015

This was my eighth month working on Debian LTS. I was assigned 14.75 hours of work by Freexian's Debian LTS initiative.

linux-2.6
I didn't upload any new version of the kernel this month, but I did send all the recent security fixes to Willy Tarreau who maintains the 2.6.32.y branch at kernel.org. I also spent more time working on a fix for bug #770492 aka CVE-2015-1350, which is not yet fixed upstream. I now have a candidate patch for 2.6.32.y/squeeze, and automated tests covering many of the affected filesystems.

Front desk
The LTS 'front desk' role is now assigned on a rota, and I had my first turn in the third week of July. I investigated which new CVEs affected LTS-supported packages in squeeze, recorded this in the secure-testing repository, and mailed the package maintainers to give them a chance to handle the updates.

groovy
Groovy had a single issue (CVE-2015-3253) with a simple fix that I could apply to the version in squeeze. Unfortunately the previous version in squeeze had not been properly updated during the squeeze release cycle and could no longer be built from source. I eventually worked out what the build-dependencies should be, uploaded the fix and issued DLA-274-1.

ruby1.9.1
Ruby 1.9.1 also had a single issue (CVE-2014-6438), though the fixes were more complicated and hard to find. (The original bug report for this is still not public in the upstream bug tracker.) I also had to find an earlier upstream change that they depended on. As I've mentioned before, Ruby has an extensive test suite so I could be quite confident in my backported changes. I uploaded and issued DLA-275-1.

libidn
The GNU library for Internationalized Domain Names, libidn, required applications to pass only valid UTF-8 strings as domain names. The Jabber instant messaging server turned out not to be validating untrusted domain names, leading to a security issue there (CVE-2015-2059). As there are likely to be other applications with similar bugs, this was resolved by adding UTF-8 validation to libidn. The fix for this involved importing additional functions from the GNU portability library, gnulib, and there my difficulties began. Confusingly, libidn has two separate sets of functions imported from gnulib, and due to interdependencies it turned out that I would have to update both of these wholesale rather than just importing the new functions that were wanted. This resulted in a 35,000 line patch. Following that I needed to autoreconf the package (and debug that process when it failed), ending up with another 26,000 line patch. Finally, it turned out that the new gnulib code needed gperf to build a header file for Solaris (when building for Linux? huh?). I ended up adding that with another patch instead. libidn has a decent test suite, so I could at least be confident in the result of my changes. I uploaded and issued DLA-277-1. Dear upstream developers, please use sane libraries instead of gnulib.

5 January 2015

Ben Hutchings: Debian LTS work, December 2014

This was my first month working on Debian LTS. My first project at Codethink was winding down, so Freexian's Debian LTS initiative was able to hire me via Codethink. I spent all of the assigned 11.5 hours working on an update to the kernel package (linux-2.6, version 2.6.32-48squeeze9). We had stopped following the upstream stable branch maintained by Willy Tarreau after 2.6.32.60 (released October 2012). Since then, we have only applied specific security fixes and other critical fixes. Raphaël Hertzog and Holger Levsen started to rebase our package on 2.6.32.64 (released November 2014), bringing in a few security fixes we didn't yet have and a larger number of fixes for functional and performance issues. I spent most of my time reviewing the several hundred changes from the upstream stable branch. I found a number of mistakes that would have caused regressions. Those should all be fixed in the update to linux-2.6, though I did not have nearly enough time for a thorough regression test. I sent my fixes to Willy for inclusion in 2.6.32.65. I also reviewed and applied fixes for several security flaws in the kernel entry and exit paths. Andy Lutomirski identified and fixed a number of problems upstream, the most serious of which was CVE-2014-9322 (though this is not listed in the changelog because the details weren't yet public). Willy found and backported the upstream fixes for inclusion in 2.6.32.65. I checked that these make sense (so far as I understand this code) and verified that Andy's test cases now have the expected results when run on the new kernel version. Updated: Added references to Codethink and Freexian.

24 February 2014

Vincent Bernat: Coping with the TCP TIME-WAIT state on busy Linux servers

TL;DR: Do not enable net.ipv4.tcp_tw_recycle. The Linux kernel documentation is not very helpful about what net.ipv4.tcp_tw_recycle does:
Enable fast recycling TIME-WAIT sockets. Default value is 0. It should not be changed without advice/request of technical experts.
Its sibling, net.ipv4.tcp_tw_reuse is a little bit more documented but the language is about the same:
Allow to reuse TIME-WAIT sockets for new connections when it is safe from protocol viewpoint. Default value is 0. It should not be changed without advice/request of technical experts.
The net result of this lack of documentation is that we find numerous tuning guides advising to set both these settings to 1 to reduce the number of entries in the TIME-WAIT state. However, as stated by the tcp(7) manual page, the net.ipv4.tcp_tw_recycle option is quite problematic for public-facing servers as it won't handle connections from two different computers behind the same NAT device, which is a problem hard to detect and waiting to bite you:
Enable fast recycling of TIME-WAIT sockets. Enabling this option is not recommended since this causes problems when working with NAT (Network Address Translation).
I will provide here a more detailed explanation, in the hope of teaching people who are wrong on the Internet.
[Image: xkcd illustration]
As a sidenote, despite the use of ipv4 in its name, the net.ipv4.tcp_tw_recycle control also applies to IPv6. Also, keep in mind we are looking at the TCP stack of Linux. This is completely unrelated to Netfilter connection tracking which may be tweaked in other ways[1].

About TIME-WAIT state
Let's rewind a bit and have a close look at this TIME-WAIT state. What is it? See the TCP state diagram below[2]:
[Figure: TCP state diagram]
Only the end closing the connection first will reach the TIME-WAIT state. The other end will follow a path which usually lets it get rid of the connection quickly. You can have a look at the current state of connections with ss -tan:
$ ss -tan | head -5
LISTEN     0  511             *:80              *:*     
SYN-RECV   0  0     192.0.2.145:80    203.0.113.5:35449
SYN-RECV   0  0     192.0.2.145:80   203.0.113.27:53599
ESTAB      0  0     192.0.2.145:80   203.0.113.27:33605
TIME-WAIT  0  0     192.0.2.145:80   203.0.113.47:50685

Purpose
There are two purposes for the TIME-WAIT state:
  • The best-known one is to prevent delayed segments from one connection being accepted by a later connection relying on the same quadruplet (source address, source port, destination address, destination port). The sequence number also needs to be in a certain range to be accepted. This narrows the problem a bit but it still exists, especially on fast connections with large receive windows. RFC 1337 explains in detail what happens when the TIME-WAIT state is deficient[3]. Here is an example of what could be avoided if the TIME-WAIT state wasn't shortened:
[Figure: duplicate segments accepted in another connection]
  • The other purpose is to ensure the remote end has closed the connection. When the last ACK is lost, the remote end stays in the LAST-ACK state[4]. Without the TIME-WAIT state, a connection could be reopened while the remote end still thinks the previous connection is valid. When it receives a SYN segment (and the sequence number matches), it will answer with a RST as it is not expecting such a segment. The new connection will be aborted with an error:
[Figure: last ACK lost]
RFC 793 requires the TIME-WAIT state to last twice the time of the MSL. On Linux, this duration is not tunable and is defined in include/net/tcp.h as one minute:
#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
                                  * state, about 60 seconds     */
There have been proposals to turn this into a tunable value but they have been refused on the grounds that the TIME-WAIT state is a good thing.

Problems
Now, let's see why this state can be annoying on a server handling a lot of connections. There are three aspects of the problem:
  • the slot taken in the connection table preventing new connections of the same kind,
  • the memory occupied by the socket structure in the kernel, and
  • the additional CPU usage.
The result of ss -tan state time-wait | wc -l is not a problem per se!

Connection table slot
A connection in the TIME-WAIT state is kept for one minute in the connection table. This means another connection with the same quadruplet (source address, source port, destination address, destination port) cannot exist. For a web server, the destination address and the destination port are likely to be constant. If your web server is behind an L7 load-balancer, the source address will also be constant. On Linux, the client port is by default allocated in a port range of about 30,000 ports (this can be changed by tuning net.ipv4.ip_local_port_range). This means that only 30,000 connections can be established between the web server and the load-balancer every minute, so about 500 connections per second. If the TIME-WAIT sockets are on the client side, such a situation is easy to detect. The call to connect() will return EADDRNOTAVAIL and the application will log some error message about that. On the server side, this is more complex as there is no log and no counter to rely on. When in doubt, you should just try to come up with something sensible to list the number of used quadruplets:
$ ss -tan 'sport = :80' | awk '{print $(NF)" "$(NF-1)}' | \
>     sed 's/:[^ ]*//g' | sort | uniq -c
    696 10.24.2.30 10.33.1.64
   1881 10.24.2.30 10.33.1.65
   5314 10.24.2.30 10.33.1.66
   5293 10.24.2.30 10.33.1.67
   3387 10.24.2.30 10.33.1.68
   2663 10.24.2.30 10.33.1.69
   1129 10.24.2.30 10.33.1.70
  10536 10.24.2.30 10.33.1.73
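
The same tally can be sketched in Python for readers who prefer to post-process the output of ss from a script. This is only an illustration: the helper names and the sample lines are hypothetical, and it assumes the last two columns of each line are the local and peer address:port fields, as in the ss -tan output shown earlier.

```python
from collections import Counter

def strip_port(endpoint):
    """Drop the trailing ':port' from an 'address:port' field."""
    return endpoint.rsplit(":", 1)[0]

def count_pairs(ss_lines):
    """Count connections per (peer address, local address) pair."""
    counts = Counter()
    for line in ss_lines:
        fields = line.split()
        # Skip blank lines and the header row printed by ss.
        if len(fields) < 5 or fields[0] == "State":
            continue
        local, peer = fields[-2], fields[-1]
        counts[(strip_port(peer), strip_port(local))] += 1
    return counts

# Hypothetical sample in the column layout of `ss -tan`.
sample = [
    "State      Recv-Q Send-Q Local Address:Port Peer Address:Port",
    "TIME-WAIT  0  0  10.33.1.64:80  10.24.2.30:35449",
    "TIME-WAIT  0  0  10.33.1.64:80  10.24.2.30:53599",
    "ESTAB      0  0  10.33.1.65:80  10.24.2.30:33605",
]
pairs = count_pairs(sample)
for (peer, local), n in sorted(pairs.items()):
    print(n, peer, local)
```

A pair whose count approaches the size of the local port range is the one at risk of running out of quadruplets.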
The solution is more quadruplets[5]. This can be done in several ways (in order of difficulty to set up):
  • use more client ports by setting net.ipv4.ip_local_port_range to a wider range,
  • use more server ports by asking the web server to listen to several additional ports (81, 82, 83, …),
  • use more client IP addresses by configuring additional IP addresses on the load balancer and using them in a round-robin fashion,
  • use more server IP addresses by configuring additional IP addresses on the web server[6].
Of course, a last solution is to tweak net.ipv4.tcp_tw_reuse and net.ipv4.tcp_tw_recycle. Don't do that yet, we will cover those settings later.
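
To make the arithmetic concrete, here is a small Python sketch of the ceiling imposed by the connection table. It is a back-of-the-envelope model, not something taken from the kernel: the function name and parameters are made up for the example, and it assumes every quadruplet stays blocked for the full 60 seconds.

```python
TIME_WAIT_SECONDS = 60  # duration of the TIME-WAIT state on Linux, per the text

def max_new_connections_per_second(client_ports, server_ports=1,
                                   client_ips=1, server_ips=1):
    """Upper bound on new connections per second between two tiers when
    every closed connection blocks its quadruplet for TIME_WAIT_SECONDS."""
    quadruplets = client_ports * server_ports * client_ips * server_ips
    return quadruplets / TIME_WAIT_SECONDS

# About 30,000 client ports, one server port, one address on each side.
print(max_new_connections_per_second(30_000))                   # 500.0
# Listening on four server ports instead of one.
print(max_new_connections_per_second(30_000, server_ports=4))   # 2000.0
```

Each of the four options in the list multiplies the number of quadruplets, which is why any of them raises the ceiling.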

Memory
With many connections to handle, leaving a socket open for one additional minute may cost your server some memory. For example, if you want to handle about 10,000 new connections per second, you will have about 600,000 sockets in the TIME-WAIT state. How much memory does it represent? Not that much! First, from the application point of view, a TIME-WAIT socket does not consume any memory: the socket has been closed. In the kernel, a TIME-WAIT socket is present in three structures (for three different purposes):
  1. A hash table of connections, named the TCP established hash table (despite containing connections in other states) is used to locate an existing connection, for example when receiving a new segment. Each bucket of this hash table contains both a list of connections in the TIME-WAIT state and a list of regular active connections. The size of the hash table depends on the system memory and is printed at boot:
    $ dmesg | grep "TCP established hash table"
    [    0.169348] TCP established hash table entries: 65536 (order: 8, 1048576 bytes)
    
    It is possible to override it by specifying the number of entries on the kernel command line with the thash_entries parameter. Each element of the list of connections in the TIME-WAIT state is a struct tcp_timewait_sock, while the type for other states is struct tcp_sock[7]:
    struct tcp_timewait_sock {
        struct inet_timewait_sock tw_sk;
        u32    tw_rcv_nxt;
        u32    tw_snd_nxt;
        u32    tw_rcv_wnd;
        u32    tw_ts_offset;
        u32    tw_ts_recent;
        long   tw_ts_recent_stamp;
    };
    struct inet_timewait_sock {
        struct sock_common  __tw_common;
        int                     tw_timeout;
        volatile unsigned char  tw_substate;
        unsigned char           tw_rcv_wscale;
        __be16 tw_sport;
        unsigned int tw_ipv6only     : 1,
                     tw_transparent  : 1,
                     tw_pad          : 6,
                     tw_tos          : 8,
                     tw_ipv6_offset  : 16;
        unsigned long            tw_ttd;
        struct inet_bind_bucket *tw_tb;
        struct hlist_node        tw_death_node;
    };
    
  2. A set of lists of connections, called the "death row", is used to expire the connections in the TIME-WAIT state. They are ordered by how much time is left before expiration. It uses the same memory space as for the entries in the hash table of connections. This is the struct hlist_node tw_death_node member of struct inet_timewait_sock.
  3. A hash table of bound ports, holding the locally bound ports and the associated parameters, is used to determine if it is safe to listen to a given port or to find a free port in the case of dynamic bind. The size of this hash table is the same as the size of the hash table of connections:
    $ dmesg | grep "TCP bind hash table"
    [    0.169962] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
    
    Each element is a struct inet_bind_bucket. There is one element for each locally bound port. A TIME-WAIT connection to a web server is locally bound to the port 80 and shares the same entry as its sibling TIME-WAIT connections. On the other hand, a connection to a remote service is locally bound to some random port and does not share its entry.
So, we are only concerned by the space occupied by struct tcp_timewait_sock and struct inet_bind_bucket. There is one struct tcp_timewait_sock for each connection in the TIME-WAIT state, inbound or outbound. There is one dedicated struct inet_bind_bucket for each outbound connection and none for an inbound connection. A struct tcp_timewait_sock is only 168 bytes while a struct inet_bind_bucket is 48 bytes:
$ sudo apt-get install linux-image-$(uname -r)-dbg
[...]
$ gdb /usr/lib/debug/boot/vmlinux-$(uname -r)
(gdb) print sizeof(struct tcp_timewait_sock)
 $1 = 168
(gdb) print sizeof(struct tcp_sock)
 $2 = 1776
(gdb) print sizeof(struct inet_bind_bucket)
 $3 = 48
So, if you have about 40,000 inbound connections in the TIME-WAIT state, it should eat less than 10MB of memory. If you have about 40,000 outbound connections in the TIME-WAIT state, you need to account for 2.5MB of additional memory. Let's check that by looking at the output of slabtop. Here is the result on a server with about 50,000 connections in the TIME-WAIT state, 45,000 of which are outbound connections:
$ sudo slabtop -o | grep -E '(^  OBJS|tw_sock_TCP|tcp_bind_bucket)'
  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
 50955  49725  97%    0.25K   3397       15     13588K tw_sock_TCP            
 44840  36556  81%    0.06K    760       59      3040K tcp_bind_bucket
There is nothing to change here: the memory used by TIME-WAIT connections is really small. If your server needs to handle thousands of new connections per second, you need far more memory to be able to efficiently push data to clients. The overhead of TIME-WAIT connections is negligible.
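
The estimate can be reproduced with a few lines of Python. This is a rough sketch: the function name is made up, the 168 and 48 byte figures are the structure sizes reported by gdb above, and the 256 and 64 byte figures are my reading of the OBJ SIZE column of slabtop (0.25K and 0.06K), that is, the space each object actually takes in its slab cache.

```python
def time_wait_memory(inbound, outbound, tw_sock_bytes=168, bind_bucket_bytes=48):
    """Bytes used by TIME-WAIT connections: one tcp_timewait_sock per
    connection, plus one inet_bind_bucket per outbound connection."""
    sockets = (inbound + outbound) * tw_sock_bytes
    buckets = outbound * bind_bucket_bytes
    return sockets + buckets

# Raw structure sizes, 40,000 inbound connections.
print(time_wait_memory(40_000, 0))              # 6720000 bytes
# Slab object sizes, same 40,000 inbound connections.
print(time_wait_memory(40_000, 0, 256, 64))     # 10240000 bytes
# Extra cost of the same number of outbound connections (bind buckets).
print(time_wait_memory(0, 40_000, 256, 64)
      - time_wait_memory(40_000, 0, 256, 64))   # 2560000 bytes
```

With slab object sizes the inbound case comes to about 9.8 MiB and the bind buckets add about 2.5 MB, which is in line with the figures quoted in the text.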

CPU
On the CPU side, searching for a free local port can be a bit expensive. The work is done by the inet_csk_get_port() function which uses a lock and iterates on locally bound ports until a free port is found. A large number of entries in this hash table is usually not a problem if you have a lot of outbound connections in the TIME-WAIT state (like ephemeral connections to a memcached server): the connections usually share the same profile, so the function will quickly find a free port as it iterates on them sequentially.

Other solutions
If you still think you have a problem with TIME-WAIT connections after reading the previous section, there are three additional solutions:
  • disable socket lingering,
  • net.ipv4.tcp_tw_reuse, and
  • net.ipv4.tcp_tw_recycle.

Socket lingering
When close() is called, any remaining data in the kernel buffers will be sent in the background and the socket will eventually transition to the TIME-WAIT state. The application can continue to work immediately and assume that all data will eventually be safely delivered. However, an application can choose to disable this behaviour, known as socket lingering. There are two flavors:
  1. In the first one, any remaining data will be discarded and instead of closing the connection with the normal four-packet connection termination sequence, the connection will be closed with a RST (and therefore, the peer will detect an error) and will be immediately destroyed. No TIME-WAIT state in this case.
  2. With the second flavor, if there is any data still remaining in the socket send buffer, the process will sleep when calling close() until either all the data is sent and acknowledged by the peer or the configured linger timer expires. It is possible for a process to not sleep by setting the socket as non-blocking. In this case, the same process happens in the background. It permits the remaining data to be sent during a configured timeout but if the data is successfully sent, the normal close sequence is run and you get a TIME-WAIT state. Otherwise, the connection is closed with a RST and the remaining data is discarded.
In both cases, disabling socket lingering is not a one-size-fits-all solution. It may be used by some applications like HAProxy or Nginx when it is safe to use from the upper protocol point of view. There are good reasons to not disable it unconditionally.
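
For illustration, here is how the two flavors could be selected from Python through the SO_LINGER socket option. This is a minimal sketch, assuming Linux, where struct linger is a pair of C ints; the helper names are made up, and mapping the first flavor to a zero timeout and the second to a non-zero one is my reading of the description above.

```python
import socket
import struct

LINGER_FORMAT = "ii"  # struct linger { int l_onoff; int l_linger; } on Linux

def set_linger(sock, enabled, timeout_seconds):
    """Set SO_LINGER on a socket."""
    value = struct.pack(LINGER_FORMAT, int(enabled), timeout_seconds)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, value)

def get_linger(sock):
    """Return the (l_onoff, l_linger) pair currently set on a socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FORMAT))
    return struct.unpack(LINGER_FORMAT, raw)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# First flavor: zero timeout, close() discards pending data and sends a RST.
set_linger(sock, True, 0)
print(get_linger(sock))

# Second flavor: close() waits up to 5 seconds for pending data to be sent.
set_linger(sock, True, 5)
print(get_linger(sock))

sock.close()
```

As the text says, neither setting should be applied blindly: the first one trades the TIME-WAIT state for the risk of losing data still in flight.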

net.ipv4.tcp_tw_reuse
The TIME-WAIT state prevents delayed segments from being accepted in an unrelated connection. However, under certain conditions, it is possible to assume a new connection's segment cannot be mistaken for an old connection's segment. RFC 1323 presents a set of TCP extensions to improve performance over high-bandwidth paths. Among other things, it defines a new TCP option carrying two four-byte timestamp fields. The first one is the current value of the timestamp clock of the TCP sending the option while the second one is the most recent timestamp received from the remote host. By enabling net.ipv4.tcp_tw_reuse, Linux will reuse an existing connection in the TIME-WAIT state for a new outgoing connection if the new timestamp is strictly bigger than the most recent timestamp recorded for the previous connection: an outgoing connection in the TIME-WAIT state can be reused after just one second. How is it safe? The first purpose of the TIME-WAIT state was to avoid duplicate segments being accepted in an unrelated connection. Thanks to the use of timestamps, such duplicate segments will come with an outdated timestamp and therefore be discarded. The second purpose was to ensure the remote end is not in the LAST-ACK state because of the loss of the last ACK. The remote end will retransmit the FIN segment until:
  1. it gives up (and tears down the connection), or
  2. it receives the ACK it is waiting for (and tears down the connection), or
  3. it receives a RST (and tears down the connection).
If the FIN segments are received in a timely manner, the local end socket will still be in the TIME-WAIT state and the expected ACK segments will be sent. Once a new connection replaces the TIME-WAIT entry, the SYN segment of the new connection is ignored (thanks to the timestamps) and won't be answered by a RST but only by a retransmission of the FIN segment. The FIN segment will then be answered with a RST (because the local connection is in the SYN-SENT state) which will allow the transition out of the LAST-ACK state. The initial SYN segment will eventually be resent (after one second) because there was no answer and the connection will be established without apparent error, except a slight delay:
[Figure: last ACK lost and timewait reuse]
It should be noted that when a connection is reused, the TWRecycled counter is increased (despite its name).

net.ipv4.tcp_tw_recycle
This mechanism also relies on the timestamp option but affects both incoming and outgoing connections, which is handy when the server usually closes the connection first[8]. The TIME-WAIT state is scheduled to expire sooner: it will be removed after the retransmission timeout (RTO) interval which is computed from the RTT and its variance. You can spot the appropriate values for a live connection with the ss command:
$ ss --info  sport = :2112 dport = :4057
State      Recv-Q Send-Q    Local Address:Port        Peer Address:Port   
ESTAB      0      1831936   10.47.0.113:2112          10.65.1.42:4057    
         cubic wscale:7,7 rto:564 rtt:352.5/4 ato:40 cwnd:386 ssthresh:200 send 4.5Mbps rcv_space:5792
To keep the same guarantees the TIME-WAIT state was providing, while reducing the expiration timer, when a connection enters the TIME-WAIT state, the latest timestamp is remembered in a dedicated structure containing various metrics for previous known destinations. Then, Linux will drop any segment from the remote host whose timestamp is not strictly bigger than the latest recorded timestamp, unless the TIME-WAIT state would have expired:
if (tmp_opt.saw_tstamp &&
    tcp_death_row.sysctl_tw_recycle &&
    (dst = inet_csk_route_req(sk, &fl4, req, want_cookie)) != NULL &&
    fl4.daddr == saddr &&
    (peer = rt_get_peer((struct rtable *)dst, fl4.daddr)) != NULL) {
        inet_peer_refcheck(peer);
        if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
            (s32)(peer->tcp_ts - req->ts_recent) >
                                        TCP_PAWS_WINDOW) {
                NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
                goto drop_and_release;
        }
}
When the remote host is in fact a NAT device, the condition on timestamps will forbid all of the hosts except one behind the NAT device to connect during one minute because they do not share the same timestamp clock. When in doubt, it is far better to disable this option since it leads to problems that are difficult to detect and difficult to diagnose. The LAST-ACK state is handled in the exact same way as for net.ipv4.tcp_tw_reuse.
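
The NAT failure mode can be illustrated by transcribing the inner condition of the kernel snippet into Python. This is only a sketch of that one check, not of the surrounding connection handling: the function names are made up, and the values TCP_PAWS_MSL = 60 and TCP_PAWS_WINDOW = 1 are my recollection of include/net/tcp.h, so treat them as assumptions.

```python
TCP_PAWS_MSL = 60     # seconds; assumed value of the kernel constant
TCP_PAWS_WINDOW = 1   # timestamp ticks; assumed value of the kernel constant

def to_s32(value):
    """Interpret an integer as a signed 32-bit value, like the (s32) cast."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def rejects_syn(now, peer_ts_stamp, peer_ts, syn_ts):
    """Return True if a SYN carrying timestamp syn_ts would be dropped.

    peer_ts and peer_ts_stamp are the last timestamp recorded for the
    remote address and the time (in seconds) at which it was recorded."""
    recorded_recently = ((now - peer_ts_stamp) & 0xFFFFFFFF) < TCP_PAWS_MSL
    looks_older = to_s32(peer_ts - syn_ts) > TCP_PAWS_WINDOW
    return recorded_recently and looks_older

# Host A (timestamp clock around 5,000,000) closed a connection at t=1000.
# Host B sits behind the same NAT address with a clock around 1,000.
print(rejects_syn(now=1010, peer_ts_stamp=1000, peer_ts=5_000_000, syn_ts=1_000))      # True
# Host A itself reconnects with a newer timestamp.
print(rejects_syn(now=1010, peer_ts_stamp=1000, peer_ts=5_000_000, syn_ts=5_000_100))  # False
# Host B tries again once a minute has passed.
print(rejects_syn(now=1061, peer_ts_stamp=1000, peer_ts=5_000_000, syn_ts=1_000))      # False
```

From the server's point of view hosts A and B are the same peer, so B's perfectly valid SYN looks like an old duplicate and is silently dropped.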

Summary
The universal solution is to increase the number of possible quadruplets by using, for example, more server ports. This will allow you to not exhaust the possible connections with TIME-WAIT entries. On the server side, do not enable net.ipv4.tcp_tw_recycle unless you are pretty sure you will never have NAT devices in the mix. Enabling net.ipv4.tcp_tw_reuse is useless for incoming connections. On the client side, enabling net.ipv4.tcp_tw_reuse is another almost-safe solution. Enabling net.ipv4.tcp_tw_recycle in addition to net.ipv4.tcp_tw_reuse is mostly useless. And a final quote by W. Richard Stevens, in Unix Network Programming:
The TIME_WAIT state is our friend and is there to help us (i.e., to let old duplicate segments expire in the network). Instead of trying to avoid the state, we should understand it.

  1. Notably, fiddling with net.netfilter.nf_conntrack_tcp_timeout_time_wait won t change anything on how the TCP stack will handle the TIME-WAIT state.
  2. This diagram is licensed under the LaTeX Project Public License 1.3. The original file is available on this page.
  3. The first work-around proposed in RFC 1337 is to ignore RST segments in the TIME-WAIT state. This behaviour is controlled by net.ipv4.rfc1337 which is not enabled by default on Linux because this is not a complete solution to the problem described in the RFC.
  4. While in the LAST-ACK state, a connection will retransmit the last FIN segment until it gets the expected ACK segment. Therefore, it is unlikely we stay long in this state.
  5. On the client side, older kernels also have to find a free local tuple (source address and source port) for each outgoing connection. Increasing the number of server ports or IPs won't help in this case. Linux 3.2 is recent enough to be able to share the same local tuple for different destinations. Thanks to Willy Tarreau for his insight on this aspect.
  6. This last solution may seem a bit dumb since you could just use more ports, but some servers cannot be configured this way. The second-to-last solution can also be quite cumbersome to set up, depending on the load-balancing software, but uses fewer IPs than the last solution.
  7. The use of a dedicated memory structure for sockets in the TIME-WAIT state dates back to Linux 2.6.14. The struct sock_common structure is a bit more verbose and I won't copy it here.
  8. When the server closes the connection first, it gets the TIME-WAIT state while the client will consider the corresponding quadruplet free and hence may reuse it for a new connection.

31 December 2013

Paul Tagliamonte: Hy 0.9.12 released

Good morning all my hungover friends. New Hy release - sounds like the perfect thing to do while you're waiting for your headaches to go away. Here's a short-list of the changes (from NEWS) - enjoy!
Changes from Hy 0.9.11
   tl;dr:
    0.9.12 comes with some massive changes,
    We finally took the time to implement gensym, as well as a few
    other bits that help macro writing. Check the changelog for
    what exactly was added. 
    
    The biggest feature, Reader Macros, landed later
    in the cycle, but were big enough to warrant a release on its
    own. A huge thanks goes to Foxboron for implementing them
    and a massive hug goes out to olasd for providing ongoing
    reviews during the development.
    
    Welcome to the new Hy contributors, Henrique Carvalho Alves,
    Kevin Zita and Kenan Bölükbaşı. Thanks for your work so far,
    folks!
    
    Hope y'all enjoy the finest that 2013 has to offer,
      - Hy Society
    * Special thanks goes to Willyfrog, Foxboron and theanalyst for writing
      0.9.12's NEWS. Thanks, y'all! (PT)
    [ Language Changes ]
    * Translate foo? -> is_foo, for better Python interop. (PT)
    * Reader Macros!
    * Operators + and * now can work without arguments
    * Define kwapply as a macro
    * Added apply as a function
    * Instant symbol generation with gensym
    * Allow macros to return None
    * Add a method for casting into byte string or unicode depending on python version
    * flatten function added to language
    * Add a method for casting into byte string or unicode depending on python version
    * Added type coercing to the right integer for the platform
    [ Misc. Fixes ]
    * Added information about core team members
    * Documentation fixed and extended
    * Add astor to install_requires to fix hy --spy failing on hy 0.9.11.
    * Convert stdout and stderr to UTF-8 properly in the run_cmd helper.
    * Update requirements.txt and setup.py to use rply upstream.
    * tryhy link added in documentation and README
    * Command line options documented
    * Adding support for coverage tests at coveralls.io
    * Added info about tox, so people can use it prior to a PR
    * Added the start of hacking rules
    * Halting Problem removed from example as it was nonfree
    * Fixed PyPI is now behind a CDN. The --use-mirrors option is deprecated.
    * Badges for pypi version and downloads.
    [ Syntax Fixes ]
    * get allows multiple arguments
    [ Bug Fixes ]
    * OSX: Fixes for readline Repl problem which caused HyREPL not allowing 'b'
    * Fix REPL completions on OSX
    * Make HyObject.replace more resilient to prevent compiler breakage.
    [ Contrib changes ]
    * Anaphoric macros added to contrib
    * Modified eg/twisted to follow the newer hy syntax
    * Added (experimental) profile module

22 January 2009

David Welton: Github Part III

Since this seems to be fairly constructive, I'll keep going. Those not interested in Github, or the dynamics of open source code creation can safely tune out. Chris, the Github dude, has some more to say: http://ozmm.org/posts/forking_continued.html
I mentioned the Network Graph and Fork Queue but David mentioned neither. I think he doesn't know what they are, probably because I didn't explain what they are :)
I had had a look at them, and they're handy tools (I would never accuse Github of not doing good work, that's not the issue at all), but I would respectfully submit that perhaps Chris is looking at things from a very Github-centric perspective, in which it's second nature to go look at those. They aren't obvious, and certainly don't jump out at the user to say "hey, maybe this code you're looking at isn't the latest and greatest!". For someone just cruising around, perhaps it should be more evident who's the 'top dog'? Chris then goes on with a very helpful and illustrative demo of how things work at Github, which is good stuff. However, at the end, he says something that I don't agree with:
It may seem strange, and perhaps even like a lot of work. "Why should I have to check to see which is the most current? In the old model, there's always a canonical repository."
That's precisely the problem. It does seem like a lot of work, especially when your search space is not limited to Github, but may include other places like Sourceforge, RubyForge, Google Code, project specific sites, and so on.
In the old model, actionwebservice wouldn't have made it past 1.2.6. Welcome to distributed version control.
Plenty of 'old style' projects have survived beyond their founders' interest in the project. What happens is that you ask for permission to work on the project, and either:
  1. It's given to you, in which case you can keep working on the canonical code. For instance, someone could have asked DHH to work on the RubyForge version of actionwebservice. Did they? Did he say no? At the very least, he could have been asked to point the RubyForge actionwebservice page at some other site with a more current version of the code.
  2. You have to fork the project, and in that case, sure, you might as well have been using the Github model.
I've often found that people are happy to let you contribute to their projects, though, and part of my original point with all of this is that if people just go spitting out forks willy-nilly, it creates a "paradox of choice" type problem, and perhaps takes something away from the community aspect of open source projects. As people are fond of saying at the Apache Software Foundation, it's about the people, not necessarily about the code. I'm not saying that forks are always bad and that everything should be centrally done, but there's a balance to be struck between people just working on their own, and some sort of onerous, bureaucratic Central Project Authority. It's nice that people who want to improve the code make themselves heard on a mailing list/forum/whatever, and thus cover the "people" part of merging in new code and new ideas. More often than not, help and contributions are more than welcome. Design decisions and conversations recorded on mailing lists are available for people to peruse in the future when they have questions.
Just to repeat something that bears repeating: I am not claiming that Github will lead to social disintegration of open source projects or anything drastic like that. However, I'm a bit wary of certain patterns I've seen. It's certainly possible that I'm wrong - as I mentioned, one error I might be making is overlooking that people are putting code on Github that they simply wouldn't have shared otherwise, because Github makes it so easy. I do hope my worries are unfounded, and in any case, it's a good thing that the Github guys are interested in the problem too, and will hopefully do things to alleviate it where possible.

11 November 2008

Joey Hess: spelling policy

Effective immediatly -- If you're annoyed by the spelling of something I've written, I will accept a patch -- preferably generated by git-format-patch, but diff -u is also acceptable. Any other communication about spelling mistakes will be ignored, unless the mistake has ramifications that will cause undue pain and suffering to people who are not English majors. Furthermore, if the spelling "mistake" is that I spelled "-ize" as "-ise", it's not a mistake -- I prefer to use the latter form for obscure reasons, with a few exceptions. PS, I realise that these are entirely arbitrary rules forced upon you willy-nilly. Teh irony..

22 September 2008

Axel Beckert: Can't resist this meme

Just stumbled over this meme at Adrian (the meme seems to have been started involuntarily by madduck), and since I’ve been fascinated by how people choose hostnames since my early years at university, I can’t resist adding my two cents to this meme. To be exact, I have two schemes. One is for servers out there somewhere (Hetzner, xencon, etc.), and they’re all wordplays on their domain name noone.org, e.g. symlink.to.noone.org (short name “sym” :-), gateway.to.noone.org (usually an alias for one of the machines below), virtually.noone.org (always a virtual machine, initially UML, soon a Xen DomU), etc. So nothing for a quiz here. My other scheme is for all my machines at home and my mobile machines. I’ll start this list with the not so obvious hostnames, so the earlier you guess the scheme, the better you are (or the better you know me ;-). One more hint in advance: “(*)” means this attribute or fact made me choose the name for the machine and therefore can be used as a hint for the scheme. :-)
azam
My very first PC, a 386 with 25 MHz and MS-DOS. (Got named retroactively(*). I didn’t use hostnames at that time.)
ak (pronounced as letters)
Got it from my brother after he didn’t need it anymore. It initially was identical to azam, but was later upgraded to a 486. Still have the 386 board, though.
azka
My first self-bought computer, a pure SCSI system with an AMD K5-PR133 and 32 MB RAM. Initially had SuSE 4.4 and Windows 95 on it. Still the last machine of mine that had Windows installed! :-)
m35
Same case and same speed as azka. Used it for experimenting(*) with Sid years ago.
azu
Initially also an AMD K5-PR133, later replaced by a Pentium 90 and used as DSL router.
azl
An HP Vectra 386/25N book size mini desktop I saved from the scrapyard at Y_Plentyn before his (first) move to Munich. The cutest(*) 386 I ever saw.
ayce
A 386 with 387 co-processor(*) and 8 MB of soldered RAM.
ayca
A 1992 Toshiba T6400C 486 laptop bought at VCFe 5.0.
bijou
My 1996 ThinkPad 760ED, which is still working and running Debian GNU/Linux 5.0 Lenny (I started with Debian 3.0 Woody on it and always dist-upgraded it! :-)
gsa (pronounced as letters)
My long-time desktop after azka. A Pentium II with 400 MHz and 578 MB of RAM at the end. Bought used at LinuxTag 2003, it worked until end of last year when it started to suddenly switch off more and more often and now refuses to boot at all. Hasn’t been replaced yet though. I mostly use my laptops at home since then.
gsx (pronounced as letters)
An AMD K6 with 500 MHz I got from maol and which was used as Symlink test server more than once. (It was the machine initially named symlink.to.noone.org because of that.)
hy
My 32 bit Sparc, a Hamilton Hamstation.
hz (pronounced as letters)
My 64 bit Sparc, an UltraSparc 5.
tub
An HP Apollo 9000 Series 400, model 400t from 1990.
tpv (pronounced as letters, too ;-)
My Zaurus SL-5500G.
tryane
A Unisys Acquanta CP mini desktop with a passively cooled(*) 200 MHz Pentium MMX. Used as DSL router for a while, but the power supply fan was too noisy.
lna (pronounced as letters)
A 233 MHz Alpha
loadrunner
An IBM ThinkPad A31 running Sid. I use it as a bedside terminal.
pony
A Compaq LTE5100 laptop with a Pentium 90 running Sid.
dagonet
A Sony Vaio laptop which ran Debian GNU/kFreeBSD until it broke.
Those who know me quite well should already have guessed the scheme, even if they can’t assign all the names. For all others, here’s one name which doesn’t exactly fit into the scheme but is still related in some way, though you need some knowledge of the theme’s subject to see the relation:
colani
A big tower from the early 90s designed by Colani.
Ok, and now the more obvious hostnames:
rosalie
A very compact Toshiba T1000LE 8086 laptop running ELKS and FreeDOS.
amisuper
Also an old Symlink test server from maol. He named it “dual”. 2x(*) Pentium I with 166 MHz. Unfortunately doesn’t boot anymore.
visa
An IBM NetVista workstation running Debian GNU/kFreeBSD. My current IRC host.
nemo
My ASUS EeePC running Debian 5.0 Lenny.
pluriel
My current WLAN router running FreeWRT.
c1
My MicroClient JrSX, an embedded 486SX compatible machine with 300 MHz for VESA mountings.
c2
My MicroClient Jr, an embedded Pentium MMX compatible machine with 200 MHz for VESA mountings.
c-crosser
My Lenovo ThinkPad T61 running Debian 5.0 Lenny.
c-cactus and c-metisse
The KVM based virtual(*) machines on c-crosser running Sid and Debian GNU/kFreeBSD.
jumper
My NAS(*) at home, currently a TheCus N4100. Soon to be replaced by some Mini-ITX box.
Anyone who hasn’t guessed the scheme yet? For those who understand German, it’s explained at the end of my old hardware page. For all others, I suggest looking at the domain name in my e-mail address (no, it’s usually not noone.org). Still not clear? Well, feel free to ask me for all the gory details, or mark the following white box to see the scheme as well as the explanations for nearly all hostnames hidden in there:
All the machines are named after Citroëns. Old machines after old Citroëns, current hardware after current Citroën models or prototypes. Those names starting with “A” are 2CV derivatives since the 2CV was Citroën’s “A” model. “AZ” was the 2CV, AZU and AK were 2CV vans and everything starting with AY (e.g. AYA, AYA2, AYB – but those don’t sound that nice ;-) is Dyane based, but I currently only use Méhari names (AYCA is the normal Méhari, AYCE the 4x4 version). Interestingly not everything starting with AYC is a Méhari: AYCD was the Acadiane, the Dyane van. HY and HZ are variants of Citroën’s “H van” (HX, HW and H1600 as well, but they don’t sound that nice), TUB was the pre-WWII “H van” prototype and later the nickname of the “H van” in France. TPV was the name of the pre-WWII 2CV prototype and an abbreviation for Toute Petite Voiture (French for “Very Small Car”), hence the Zaurus, my smallest Linux box, got that name. Rosalie was the nickname of a rear-wheel drive pre-WWII Citroën. M35 was a Wankel engine prototype of the Ami 8 and the Ami Super was the 4 cylinder version of the Ami 8. Bijou was a 2CV based coupé built by Citroën UK in the late 50s and early 60s. Visa and LNA were 2CV successors which were available with 2CV engines, but were stopped before the 2CV. GSA and GSX are late GS derivatives. C1, C2, (C3) Pluriel, C-Crosser, Jumper and Nemo are current Citroën models and C-Cactus and C-Métisse are recent Citroën prototypes and show cars. The 2CV Dagonet was an aerodynamically optimised 2CV by Jean Dagonet in the 50s. The Tryane is an aerodynamic and fuel efficient, three wheeled car by Friend Wood based on the 2CV and with a body of wood. And Colani once dressed a 2CV so that it broke several efficiency world records. The Namco Pony was a 2CV based light utility truck (similar to the Méhari, but with steel body) built in Greece under license in many variants. And Loadrunner is the name of some CX six-wheeler conversions.
Some links about the naming items: Hope you had fun. I had. ;-)

Now playing: Willi Astor — Gwand Anham Ära

13 November 2005

David Welton: Late to the party

Something I'm really curious about is why the Java folks suddenly stood up and took notice of Ruby. Is it just Rails? Did Rails arrive in the right place at the right time? It's not a rhetorical question - I don't have a lot of insight into the Java world, so I'm genuinely curious as to what sparked this interest in "beyond Java". Python's been around for a long time, and isn't that different from Ruby. Tcl with Tk runs circles around Java for cross platform GUI development. PHP, even if it's not the cleanest or most glamorous thing out there, makes whipping up small web tools easy. (I'll concede that Perl has a culture that is too far removed from a lot of the things Java tries to do). And all those have been under everyone's noses for years. So why all of a sudden is Ruby the "hot new thing" that is to be embraced, rather than "defended against", as we often see in such programming language "willy waving contests"? Was this something that was just waiting to happen, and all it took was something to coalesce around? Were people in that world starting to get wrist injuries from typing out so much stuff, and on the lookout for "something else"? Joking aside, I really am curious, because over the years, I've developed an interest in how these things work. I talk about it some in The Economics of Programming Languages, but that's clearly not the whole story.
Java and its human resources requirements
Another thing I've been idly wondering about is whether perhaps Java got pushed a lot (perhaps unconsciously) by big companies, because 1) it is a good solution for big teams and 2) some aspects of it are so big and unwieldy that only companies that can muster the manpower to throw at it are going to do well with it, meaning that smaller shops would either have to not compete in that market, or compete poorly.
Obviously, it's not strictly necessary to use Java - more often than not, you could accomplish the same thing in less time with some other system. However, the "other pincer" is the marketing out there that Java is the "serious, corporate, enterprise" solution, and that anything "less" just won't do. This gives firms an incentive to use Java even when it might not be the best solution, and helps create a market where it's a given that the solution is some Java system, and the bigger company, being able to bring more people to bear, may have an edge. Anyway, just some idle speculation - what do you think?