Writings

February 26, 2013 · Technology · open-source

Linked in Visualization

I love visualizations, and quite enjoyed this new tool by Linked In. It did a pretty good job on the clustering: blue is IBM, green is OpenStack, red is Wesleyan, the purple at the bottom is basically 3D internet/OpenSim folks, and the orange/grey lobe is MHVLUG and other local folks in the area.

You can generate your own here.

February 23, 2013 · Technology · software · open-source

Source Forge Open Source Again

Apparently Source Forge has gone open source again, and is even an incubated project at Apache. The source code is in git, and the new Source Forge looks like it's all written in Python instead of PHP.

Source Forge has had a pretty storied history. It started as Open Source all those years ago. Then, in the dot com collapse, they stopped releasing source and instead tried to sell an onsite hosted solution, Source Forge Enterprise Edition. The Linux Technology Center was one of their few customers, providing an internal source forge for the rest of IBM. I had the "opportunity" to help debug some of that code for performance reasons, and discovered that a lot of Source Forge's slowness was due to a major lack of understanding by the development team of how database indexes work. Those fixes flowed upstream.

Later, one of the key developers from Source Forge forked GForge from the last open source release. So we had Open Source "source forge" again. Then a couple years later the GForge team pulled the same stunt as Source Forge, tried to monetize, and seal off the source code.

Then git happened, and all these CVS / SVN based hosting solutions looked really quaint. A couple years later we had github, and the center of gravity of Open Source has been migrating ever since.

Source Forge's current owner is Dice, the job search company, so the economics of keeping it Open Source are a little different. "What's your github id?" is now a standard job interview question, so I can imagine the new Source Forge team has a pretty broad brush to just make Source Forge as good as they can. I wish them luck.

February 21, 2013 · OpenStack · openstack · software

The OpenStack Gate

The OpenStack project has a really impressive continuous integration system, which is one of its core strengths as a project. Every change proposed to our gerrit review system is subjected to a battery of tests, which has grown dramatically over time. After formal review by core contributors, we run them all again before the merge.

These tests take on the order of 1 hour to run on a commit, which would make you immediately think the most code that OpenStack could merge in a day would be 24 commits. So how did Nova itself manage to merge 94 changes since Monday (not to mention all the other projects, which add up to ~200 in 3 days)? The magic of this is Zuul, the gatekeeper.

Zuul is a queuing system for CI jobs, written and maintained by the OpenStack infrastructure team. It does many cool things, but what I want to focus on is the gate queue. When the gate queue is empty (yes, it does happen sometimes), the job is simple: add a new commit, run the tests, and we're off. What happens if there are already 5 jobs ahead of you in the gate? Let's take a concrete example of nova.

Speculative Merge

By the time a commit has gotten this far, it has already passed the test suites at least once, and has had at least 2 core contributors sign off on the change in code review. So Zuul assumes everything ahead of the change in the gate will succeed, and starts the tests immediately, cherry-picking this change on top of everything that's ahead of it in the queue.
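As a rough sketch of that speculation (the function name and change ids are made up for illustration, and this is not Zuul's actual code), each change in the gate gets tested on top of everything queued ahead of it:

```python
def speculative_states(queue):
    """For each change in the gate, list the changes its test run is built on.

    Every change is tested as if everything ahead of it had already merged.
    """
    return [(change, list(queue[:i])) for i, change in enumerate(queue)]


# Hypothetical gate contents, in queue order.
gate = ["A", "B", "C"]
for change, ahead in speculative_states(gate):
    base = " + ".join(ahead) if ahead else "current branch tip"
    print(f"test {change} on top of: {base}")
```

All of these test runs can start at the same moment, which is why the time to get through the gate does not grow with the number of changes ahead of you, as long as every run passes.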

That means that merge time on the gate is O(1): merging 10 changes takes the same time as merging 1. If the queue gets too big, we do eventually run out of devstack nodes, so the ability to run tests is not strictly constant time. On the run up to grizzly-3, both of the cloud providers that contribute these VMs (HP and Rackspace) provided some extra quota to the OpenStack team to help keep things moving. So we had an elastic burst of OpenStack CI onto additional OpenStack public cloud resources, which is just fun to think about.

Speculation Can Fail

Of course, speculation can fail. Maybe change 3 doesn't merge because something goes wrong in the tests. If that happens, we kick the change out of the queue, and all the changes behind it have to be reset to pull change 3 out of the speculation. This is the dreaded gate reset: when gate resets happen, all the time spent on speculative tests behind the failure is lost, and the jobs need to restart.
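Here is a toy simulation of a gate reset. It is only a sketch under simplifying assumptions: `run_gate` and the `passes` callback are hypothetical names, every test run is treated as costing the same, and the real Zuul is considerably more involved.

```python
def run_gate(queue, passes):
    """Simulate one gate queue with speculative testing.

    queue  -- change ids in gate order
    passes -- hypothetical callback: (change, changes_it_is_built_on) -> bool
    Returns (merged, rejected, wasted_runs).
    """
    merged, rejected = [], []
    wasted = 0
    pending = list(queue)
    while pending:
        # Every pending change is tested in parallel, each one on top of
        # what has merged plus the pending changes ahead of it.
        results = [passes(c, merged + pending[:i]) for i, c in enumerate(pending)]
        if all(results):
            merged.extend(pending)
            break
        first_bad = results.index(False)
        # Changes ahead of the failure were tested on valid state: they merge.
        merged.extend(pending[:first_bad])
        rejected.append(pending[first_bad])
        # Gate reset: runs behind the failure were built on a change that
        # will never merge, so their results are thrown away.
        wasted += len(pending) - first_bad - 1
        pending = pending[first_bad + 1:]
    return merged, rejected, wasted


# Change 3 fails its tests; everything else passes.
print(run_gate([1, 2, 3, 4, 5], lambda change, ahead: change != 3))
```

In this example changes 4 and 5 each lose one test run to the reset and are then retested without change 3 in their history.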

Speculation failures largely fall into a few core classes:

Jenkins crashes - it doesn't happen often, but Jenkins is software too, and OpenStack CI tends to drive software really hard, so we force out edge cases everywhere.

Upstream service failures - we try to isolate ourselves from upstream failures as much as possible. Our git trees pull from our gerrit, not directly from github. Our apt repository is a Rackspace local mirror, not generically upstream. And the majority of pip python packages come from our own proxy server. But if someone adds a new python dependency, or a version of one updates and we don't yet have it cached, we pass through to pypi for that pip install. On Tuesday pypi converted from HTTP to HTTPS, and didn't fully grok the load implications, which broke OpenStack CI (as well as lots of other python developers) for a few hours when pypi effectively was down from load.

Transient OpenStack bugs - OpenStack is complicated software: 7 core components interacting with each other asynchronously over REST web services, each core component being a collection of daemons that interact with each other asynchronously. Sometimes, something goes wrong. It's a real bug, but it only shows up under very specific timing and state conditions. Because OpenStack CI runs so many tests every day (OpenStack CI may be one of the largest creators of OpenStack guests in the world every day), very obscure edge and race conditions can be exposed in the system. We try to track these as recheck bugs, and are making them high priority to address. By definition they are hard to track down (they expose themselves on maybe 1 out of 1000 or fewer test runs), so the logs captured in OpenStack CI are the tools to get to the bottom of these.

Towards an Even Better Gate

In my year working on OpenStack I've found the unofficial motto of the project to be "always try to make everything better". Continuous improvement is not just left to the code and the tests, but applies to the infrastructure as well.

We're trying to get more urgency and eyes on the transient failures, coming up with ways to discover the patterns from the 1 in 1000 fails. Once you have two or three that fail in the same way, it helps triangulate the core issue. Core developers from all the projects are making these high priority items to fix.

On the upstream service failures the OpenStack infrastructure team already has proxies sitting in front of many of the services, but the pypi outage showed we probably need something even more robust to handle that upstream service outage, possibly rotating between pypi mirrors on the fall-through case, or a better proxy model. The team is already actively exploring solutions to prevent that from happening again.
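One way the fall-through idea could look, purely as a sketch: try the local proxy first, then rotate through a list of mirrors until one answers. The URLs and the `fetch` callable below are placeholders invented for illustration; this is not what the infrastructure team actually runs.

```python
def fetch_with_fallback(package, sources, fetch):
    """Try each source in order and return (url, data) from the first that works.

    sources -- ordered index URLs (local proxy first, then mirrors)
    fetch   -- hypothetical callable (url, package) -> bytes that raises
               OSError (or a subclass such as ConnectionError) on failure
    """
    errors = []
    for url in sources:
        try:
            return url, fetch(url, package)
        except OSError as exc:
            # Remember why this source failed and move on to the next one.
            errors.append((url, str(exc)))
    raise RuntimeError(f"all sources failed for {package}: {errors}")


# Placeholder sources; example.org hosts are stand-ins, not real mirrors.
SOURCES = [
    "https://proxy.example.org/simple",
    "https://mirror-a.example.org/simple",
    "https://mirror-b.example.org/simple",
]


def stub_fetch(url, package):
    """Stand-in for a real download: pretend the proxy is down."""
    if "proxy" in url:
        raise ConnectionError("proxy unavailable")
    return f"{package} from {url}".encode()


print(fetch_with_fallback("some-package", SOURCES, stub_fetch))
```

Passing the download function in as a parameter keeps the rotation logic testable without touching the network.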

As always, everyone is welcome to come help us make everything better. Take a look at the recheck bugs and help us solve them. Join us on #openstack-infra and help with Zuul. Check out what the live Zuul queue looks like. All the code for this system is open source, and available under either the openstack or openstack-infra github accounts. Patches are always welcome!

February 17, 2013 · Technology · software

Refactoring LibreOffice

The FOSDEM 2013 talks are up now, and this one on LibreOffice refactoring really hit an interesting mark. The LibreOffice team has been aggressively rebuilding a culture of rapid change as a road to quality, bringing in a test and test automation culture, and leaving nearly no parts of the code as sacred.

It's interesting that LibreOffice seems to be doing a much better job than OpenOffice at removing technical debt. I think we're already seeing the effect of that cultural split, and I expect that in the future this is going to get far more obvious.

I can't wait to get the Android remote working for future presentations. Will be a lot of fun to drive my presentations that way.

February 17, 2013 · Technology · openstack · software

Software Engineering Talk at Vassar College

While I've been giving talks at conferences and user groups for the last decade, I leveled up a little on Friday and was an invited speaker on the Vassar College Computer Science Asprey Lecture Series. The topic was Software Engineering at Scale, using the OpenStack project as an example.

I gave the folks there a glimpse of what's behind a successful project that is able to integrate code from over 400 unique developers in 5 months time. I talked about planning, the design summits, the contribution and code review tools we use. But, as with every time I talk about OpenStack, the thing that really wows people is the testing infrastructure we've got. It was equally latched onto by the students and CIS staff in the room.

On every code submission we run style checks, unit tests (5000 of them in Nova now), and spin up a full OpenStack install and hit it with an integration suite of nearly 700 tests, all before the first humans start looking at the code for manual review. It's an incredibly empowering system: developers have a high bar to submit working code that doesn't alter the behavior of the system. And it means that by the time the expert eyes do code review, the kinds of problems they are looking for are much more interesting.

Just this morning it meant I could look through a new proposed extension in gerrit and focus on some of the functional behavior, including understanding which kinds of code the test system has a harder time touching. The confidence that gives you as a reviewer, that everything isn't on the verge of breaking all the time, is enormous.

I've submitted a similar talk to the OpenStack summit, with a slightly different perspective of educating new developers on what the process from idea to code landing in the OpenStack tree is. Hoping that gets selected as it should be a good talk, and give me an excuse to polish some of my code flow diagrams a bit more.

December 31, 2012 · Technology · personal

Google Goggles Magic

Last night a friend complained about a curry recipe gone wrong, so I decided to offer up the one I used to make with a certain amount of frequency. It's from a 1970s Time Life cookbook that I vaguely remember swiping from my friend Jehan in college. I took a picture on my cell phone to send it along.

The page is stained with enough turmeric to show how often it was made.

A little while later I noticed a Goggles Alert on my cell phone. It had scanned the image, and returned the following URL as a hit: http://littlechefapp.com/recipes/144571-chicken-curry-authentic#.UOGjR2JQCoM

Dead on. The future is pretty awesome sometimes.

December 28, 2012 · Technology · software

Mobile Browsing with Addons

One of the things that I liked a lot about Android 4.x is that Chrome was now a browser option. It meant that I got an almost Desktop quality browser on my phone and tablet. The almost bit has gotten pretty annoying of late though, because mobile Chrome doesn't support extensions.

About a year ago I converted over to using Lastpass, which means all my passwords for various websites are unique, and 12+ characters of randomness. Huge security improvement. However, it means every time I try to log in on the mobile web it's a multi-step process: jump over to the Lastpass app, enter my master password, enter it again to get the username and password copied, jump back over to mobile Chrome, copy and paste into the input fields, and finally log in. This is in contrast to the Desktop experience of feeding in my master password every couple of hours, and having it automatically detect and log me into sites when I visit them. The mobile browsing experience feels clunky and broken compared to the desktop.

How I wish Mobile Chrome supported extensions, but it's not clear they are ever going to change that.

However, Mobile Firefox does. Over Christmas break I figured out that lastpass actually works in mobile firefox, and after a little configuration started using Mobile Firefox instead of Mobile Chrome on both my Nexus 7 and S3. The overall browser seems roughly the same speed (maybe slightly slower), however the experience is much better. You get Ad Block, which turns the web back into something vaguely sane, and my browsing experience is now akin to the Desktop. Enough so that I'll now use my Nexus 7 over my laptop for many browsing tasks.

Hopefully Google will eventually bring these features to their platform, but for now, the Firefox mobile strategy seems to be bearing some fruit, and reopening mobile browsers to innovation.

November 28, 2012 · Technology · science

Powerball Probability

If you win the Powerball jackpot today, there are a few things you should know. After beating the 1 in 175 million odds, you have an 11 in 175 million chance of being killed in your car after collecting the winnings. If you survive that, you have a 327,250 to 175 million chance of being robbed of those winnings, and a 805,000 to 175 million chance that new mansion will go up in flames, according to Eve Waltermaurer, associate professor of sociology at State University of New York at New Paltz.

Probability is a bitch sometimes. (From the Poughkeepsie Journal)
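To put those quoted numbers on a common footing, here is a small sketch that converts each "N in 175 million" figure into a percentage and a rough "1 in X". The counts come straight from the quote; the function name and labels are just illustrative.

```python
POOL = 175_000_000  # denominator used for every figure in the quote

# Counts per 175 million, as quoted.
QUOTED = {
    "win the jackpot": 1,
    "killed in your car": 11,
    "robbed of the winnings": 327_250,
    "mansion goes up in flames": 805_000,
}


def one_in(count, pool=POOL):
    """Express 'count in pool' as a rounded '1 in X' figure."""
    return round(pool / count)


for event, count in QUOTED.items():
    print(f"{event}: {count / POOL:.6%}, roughly 1 in {one_in(count):,}")
```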

November 18, 2012 · Technology · personal

The Long View

It's good to step back sometimes and look at the really long view. Charlie Stross just did this with his new blog post on 2512, which provides a plausible look at what that world might be. I especially like the framing, about thinking what the world was like 500 years ago:

Five hundred years is a nearly unimaginable gulf from today's perspective. Five centuries ago, the Portuguese conquistadores were beginning their rampage through South America; Martin Luther was finishing his doctorate in theology and thinking about sin: the huge sequence of civil wars that racked Japan for over a century were raging: the Great Powers were still the Chinese empire and the Caliphate (although the latter was undergoing a shift in center of gravity towards Istanbul and the Ottoman empire). The great powers in Europe were Spain and Venice; the English speaking world was a few million barbarians occupying a handful of damp islands on the outer fringes of Europe. It's more than twice the historical existence of the USA to this date. Of our social institutions, very few survive from that long ago: the Catholic Church (and various orders and sub-groups within it), the Japanese Monarchy, and so on. A handful of universities, banks, and other institutions. The half-life of a public corporation today is about 30 years: ten half-lives out — 300 years hence — we may expect only one in a million to survive.

The whole post is definitely worth your time, but I do keep coming back to that half-life statement. We sometimes take it for granted that organizations that exist today will be there tomorrow. But the reality is there is nothing magical about organizations; it's about the people. Things only get done because someone decides to do them.

Contemplating the long view seems like an appropriate Sunday morning activity.
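A quick check of the half-life arithmetic, as a sketch assuming simple exponential decay with the quoted 30 year half-life (the function names are illustrative). Under that assumption, ten half-lives leave about one in a thousand survivors; one in a million takes closer to twenty half-lives, or roughly 600 years.

```python
import math

HALF_LIFE_YEARS = 30  # figure quoted for public corporations


def surviving_fraction(years, half_life=HALF_LIFE_YEARS):
    """Fraction still around after `years`, assuming exponential decay."""
    return 0.5 ** (years / half_life)


def years_until(fraction, half_life=HALF_LIFE_YEARS):
    """Years until only `fraction` of the original population survives."""
    return half_life * math.log2(1 / fraction)


print(f"after 300 years: 1 in {1 / surviving_fraction(300):,.0f}")
print(f"one in a million after about {years_until(1e-6):.0f} years")
```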

November 17, 2012 · Technology · software

A tale of two tech teams

The Atlantic just published an in-depth look at the tech team behind the Obama campaign. It's a little personality heavy, because they are trying to make tech interesting to the average reader, but putting that aside, there is quite a bit of detail on the team and tech structure behind the campaign.

Contrast that with what happened in the other campaign, where this was clearly not a core part of what they were doing.