Monday, August 18, 2008

Horrifically Unreliable Software: Commonplace

Reliability is the characteristic demonstrated by a system that does what its users expect. Insecurity is a type of unreliability. You can lose business because your calls are answered by someone with a terrible personal style on the phone just as easily as by someone who is, behind your back, referring your clients to an in-law. Security problems are just one species of unreliability, and any unreliability is bad for consumers of software or software-mediated services.

Unreliability such as insecurity may result from fundamental design flaws, or from mistakes that expose systems to manipulation -- whether to halt the system, or to direct its energy toward some unintended purpose. The first kind is common as leaves of grass -- and, sadly, so is the second.

Types of Bugs: The Design Flaw
An example of fundamental design error is the mailing protocol of the United States Postal Service. The idea is simple: the destination address goes in the middle of the envelope, the sender's address goes in the upper left-hand corner, and the postage goes in the upper right. Then you post the letter with enough stamps on it. If the postage is insufficient, the postal service returns the mail to the sender.

Easy, right? What's to go wrong? Ha.

I know a parsimonious couple who kept in touch by mail, postage-free, by reversing the destination and source addresses. Since the postage (being utterly absent) was insufficient, the postal workers -- in keeping with clear postal regulations -- delivered each piece of mail to the address at the upper-left, which is where this pair of love-birds agreed to write their destination addresses. The United States Postal Service has no way to identify the sender other than by the labels users place on letters, and if you lie, you can get free delivery anywhere you want in the United States. It's a federal crime, and offensive, and I can't speak strongly enough against it, but I have no idea how a mail carrier could (within the regulations) fail to deliver your letter to your destination if you placed your destination in the sender's spot.
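To make the flaw concrete, here's a toy model in Python -- my own sketch, with made-up field names, obviously not how the USPS actually represents mail -- of the routing rule just described:

    # Toy model of the routing rule: deliver if postage suffices,
    # otherwise "return" the letter to whatever address the mailer
    # wrote in the upper-left corner. Field names are hypothetical.
    REQUIRED_POSTAGE = 42  # cents; the figure is illustrative

    def route(envelope):
        """Return the address this letter actually travels to."""
        if envelope["postage"] >= REQUIRED_POSTAGE:
            return envelope["center_address"]   # normal paid delivery
        # Insufficient postage: return to "sender" -- a field the
        # protocol has no way to verify.
        return envelope["upper_left_address"]

    # Honest mailer: Alice pays to send a letter to Bob.
    honest = {"center_address": "Bob", "upper_left_address": "Alice",
              "postage": 42}
    assert route(honest) == "Bob"

    # Cheater: Alice swaps the addresses and affixes no postage at all.
    cheat = {"center_address": "Alice", "upper_left_address": "Bob",
             "postage": 0}
    assert route(cheat) == "Bob"  # same delivery, zero postage

Both assertions deliver the letter to the same place; nothing inside the protocol distinguishes the paid case from the free one.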

This is an example of protocol design error. It is an error because it can be exploited to direct postal resources in ways the Postmaster never intended, and it causes everyone playing according to Hoyle to subsidize the cheaters. It's not a mere implementation error, because the fix requires not a more reliable address reader but a psychic determination of whether a piece of mail was labeled by an honest person or a cheater -- and that psychic determination is simply not available to the Postmaster. The Postmaster's protocol relies on everyone being honest. That's the definition of a security problem waiting to happen.

An example of such an error in the software world was the Secure Shell version 1 (SSH1) authentication protocol, abuse of which could not be reliably prevented without breaking the semantics of the protocol. Fortunately, the OpenSSH project offers SSH2, so it is possible to obsolete SSH1 and move on without much trouble. Just patching SSH1 and carrying on wouldn't work, though: it'd be like suddenly deciding that return addresses on envelopes must be stamped by a notary before the post office will accept your mail. It'd utterly break everyone's expectation of how mail works, and it'd gum up the works while people got used to it. Even if all postal employees were empowered to notarize your return address when you posted mail in person, you'd still have problems bulk-posting from work or mailing after hours. You could imagine Congress passing a law like this, but it would be a replacement for how people address mail, incompatible with the prior system: to be compatible, you'd have to re-engineer your mechanisms for sending mail.
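(The practical remedy, for anyone still running the old protocol, was simply to refuse it. Assuming a stock OpenSSH configuration of that era, a one-line setting in sshd_config did the job:

    # /etc/ssh/sshd_config
    # Accept only the SSH2 protocol; SSH1 clients are turned away
    # rather than accommodated with semantics-breaking patches.
    Protocol 2

No attempt to "fix" SSH1 in place; just obsolete it and move on.)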

Types of Bugs: Implementation Errors
Another security and reliability threat is the implementation bug.

United States Postal Service mail-sorting machines process letters by pulling them horizontally past sensors. They don't know much about the difference between the left and right sides of a letter, but they are very clear about relative vertical position. The machines don't look in the upper-left corner for return addresses; they just take the lowermost address as the delivery address. They then print a scannable bar code on the letter for further sorting and delivery.

I saw a custom envelope with a complex multicolored logo running down the left side, left of where the delivery address should go. The logo/address combination took the whole left edge of the envelope, including the upper-left corner. Any human observer would immediately recognize the designer's intent: the design was the logo and address of the sender. However, the address portion of the pre-printed design sat at the bottom of the left-edge graphic. Machines that expect the sender's address to be above and left of the delivery address would find the sender's address to be the lowest thing on the envelope, conclude it was the delivery address, and print a bar code ensuring delivery to the mail carrier whose route included the sender's address -- not the delivery address. The bug here, strictly speaking, is with the fool who printed the envelopes: the protocol requires the sender's address to be above and left of the delivery address. The fact that the bug wasn't exposed until automated sorting machines began printing scannable ZIP+4 bar codes on envelopes doesn't mean the longstanding envelope design wasn't a bug. It was just a latent bug until the service provider's sorting implementation changed. Now it's a routinely fatal flaw, causing mail to be delivered to the sender instead of the recipient. The envelopes can't be used in the postal system.
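Here's the sorting heuristic as a toy Python model -- again my own sketch, not USPS code -- showing how the custom envelope boomerangs:

    # Toy model: the machine knows vertical position, not meaning.
    # An address block is (distance_from_top_in_inches, text).

    def pick_delivery_address(address_blocks):
        """Route to the lowermost address block on the envelope."""
        return max(address_blocks, key=lambda block: block[0])[1]

    # Conventional envelope: return address up top, delivery address
    # in the middle. The heuristic works.
    conventional = [(0.5, "Sender, Anytown"),
                    (2.0, "Recipient, Otherville")]
    assert pick_delivery_address(conventional) == "Recipient, Otherville"

    # The custom envelope: the preprinted sender address sits at the
    # bottom of the left-edge logo, *below* the delivery address.
    custom = [(3.5, "Sender, Anytown"),
              (2.0, "Recipient, Otherville")]
    assert pick_delivery_address(custom) == "Sender, Anytown"  # oops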

Assuming that bugs are proportional to code in the same way that medication errors are proportional to filled prescriptions (even if they aren't equally subject to error-rate variability based on ambient noise), one would conclude that large code bases harbor numerous bugs. The fact that a code base is old isn't much protection against the later discovery of bugs. As code bases are mated to enable complex interactions, without special attention to the security issues raised by message-passing between them, implementation bugs in each code base can be exposed by unexpected input from code created by other development teams -- and one gets things like QuickTime vulnerabilities exposed only on computers running Java-enabled web browsing software. Even systems designed to be secure -- that is, whose principal contributors consider proper behavior more important than saleability, performance, or other characteristics like pretty user interface graphics -- can with time and trouble be shown to have exploitable bugs. This doesn't make all systems equally dangerous. It does mean that special care is needed to minimize problems, or they will multiply past any possibility of control and you will end up like a Windows victim. Three, five, and seven percent infection rates are ridiculous in a world that pays for five-nines uptime. Security at Microsoft seems to be moving backward, even. Where is their pride? Maybe it was eaten by the Marketing department. After all, Windows chief Jim Allchin says, despite the evidence, that you should buy Vista for the security.

(Given the impact of error-reduction systems on pharmaceutical error rates, one wonders to what extent code-quality processes may similarly improve implementation error rates.)

Why Pick On Microsoft?
Years ago on a mailing list dedicated to OpenBSD issues, I saw a firewall administrator post a question about his firewall logs filling with indecipherable connection attempts from machines at the domain hotmail.com. This was a few years after Microsoft acquired Hotmail -- long enough that those laughing at Microsoft for using Unix to provide web services had encouraged it to "eat its own dog food" and deploy Windows NT on most of the boxes serving content from the Hotmail site. The firewall administrator posting the question assumed there was some nicety of proprietary data protocols that he was missing, causing his firewall logs to fill with apparent garbage. His question was along the lines of: what do I need to do to allow people to connect properly to Hotmail now that it's running proprietary Microsoft software?

The answer was, entertainingly enough (if you weren't paying for the bandwidth), that the administrator wasn't doing anything wrong at all. The connection attempts weren't part of some esoteric proprietary authentication handshake; they were unsolicited connections by Microsoft-maintained machines, attempting to share with the firewall administrator the Code Red or Nimda worms with which Microsoft's servers had been infected. The entire bandwidth of Microsoft's Hotmail domain was available to attack the Internet. I actually had firewall logs of my own with the same garbage connection efforts from hosts in Korea and other places from which I ordinarily didn't receive traffic. Attacks outnumbered legitimate connection attempts. I was happy at the time to be using Unix and Apache on my web server, rather than IIS in an age of Nimda, Code Red, and so on.

Today, I see software voting solutions criticized for losing votes. This, in an era in which the free qmail mail server guarantees not to lose an email. Yet your vote can get "dropped" due to -- wait for it -- the action of antivirus software. Yes, voting machines running Microsoft operating systems seem to require antivirus software to prevent vulnerability to malicious attack by a subverted machine on the same network. Yet, as apparently necessary as antivirus software may be in the world of Microsoft Windows, the cure may be worse than the disease. So crazy is the concept of using a horribly insecure proprietary basis for something like public voting tools that it's been singled out for special treatment by top Internet educators. The direct link to the cartoon illuminating the idiocy of suffering dropped votes because of antivirus software necessitated by an insecure-by-design development platform is right here.

Microsoft's decision to make things behave in unreliable ways in order to achieve objectives like backward-compatibility or performance has left numerous customers with systems that behave poorly. Some folks think they're getting adequate value for their money, and I am happy for them. Others get something they didn't expect.

Getting what you paid for is one of the basic things one looks for in a satisfactory consumer experience. I don't think Microsoft -- which sells office software packages, operating systems, and server software for three- and four-digit sums of money -- delivers what people expect. So I lambaste them.

Reliability As A Valuable Feature
There's a reason Unix machines command a premium: the most common alternative often doesn't work.

Systems don't work by magic or marketing or the power of will. Whether it's a judicial system, a phone system, a computer system, or an educational system, there's plenty of opportunity for the system's output to be undermined not just by people who don't follow the system, but by systematic errors in the system. Careful examination of the policy actually advanced by a system is worthwhile whenever the output of a system is sufficiently important.

It should be unsurprising that human systems can be exploited to produce results unintended by the systems' designers. People have been arguing for centuries (okay, fine: millennia) about how best to feed the masses, guard the borders, profit from foreign trade, collect taxes, and educate children -- without a clear answer. The systems that exist when a human institution is implemented are so complex not because they are designed to be incomprehensible (though some are deliberately obscured), but because human systems have variables and characteristics that aren't immediately obvious to observers. They are simply not knowable to the extent that is possible in the deterministic environment of a computer system.

This is why failure in software is so objectionable. The fact that it is possible to turn the United States Postal Service into a potential environment for system error and security problems does not surprise anyone. Modern computer systems, by contrast, ostensibly admit only specific, identified users, and allow those users to execute only specific, predefined instructions. With complete control in the hands of humans, one can expect a system that does what it's supposed to do.

We should all pay attention when software does not do what it's supposed to do, and vote with our wallets for software that enables us to do our work without fear.

The Good News
Systems now exist that help reduce code size by allowing programmers to leverage widely-used (and presumably battle-tested) code in the underpinnings of their software. On Mac OS X, the Cocoa development environment allows not only re-use of pre-made programming "objects" but uses runtime linking -- so that when you update your operating system, every program that relies on the objects underlying Cocoa gets a free upgrade on its next launch. A single bug fix can improve the reliability of an unlimited number of other developers' programs.
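You can see the same dynamic in miniature with runtime loading in Python -- a rough analogy I'm drawing myself, assuming a Unix-like system, not a description of Cocoa's internals:

    # Load a shared library at launch rather than baking its code in.
    import ctypes
    import ctypes.util

    # Whatever version of the system C library ships *today* is what
    # gets loaded; a vendor bug fix to the library reaches this program
    # on its next launch, with no recompile.
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    print(libc.abs(-42))  # 42, courtesy of freshly-linked system code

Every program that links the library this way inherits the fix at once -- exactly the property that makes shared, battle-tested underpinnings so valuable.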

Trends like extreme programming, team coding, unit testing, and the like also tend to aid reliability by providing mechanisms to detect and correct misbehaving code. While some of these practices require investment of time at the front end of software design -- and may be neglected for that reason by low-quality vendors -- users who find well-behaved, reliable applications can make sure to tell their friends and leave good feedback. Maybe one day, when even document standards are interchangeable, we can have competition on the quality of the code rather than on the lock-in of users fearful of losing access to documents stored in undocumented proprietary formats intended to ensure other software vendors can't poach customers.
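As a taste of what unit testing buys, here's a minimal check (Python's standard unittest module) against the toy postage rule from earlier -- small, automatic, and runnable on every build:

    import unittest

    REQUIRED_POSTAGE = 42  # cents; illustrative, as before

    def postage_sufficient(postage_cents):
        return postage_cents >= REQUIRED_POSTAGE

    class PostageTest(unittest.TestCase):
        def test_exact_postage_is_sufficient(self):
            self.assertTrue(postage_sufficient(42))

        def test_zero_postage_is_insufficient(self):
            self.assertFalse(postage_sufficient(0))

    if __name__ == "__main__":
        unittest.main()

A vendor that skips this kind of scaffolding saves time today and ships the latent bugs tomorrow.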

Inability to share documents with the people with whom you do business isn't a feature, it's a bug. We need reliably interchangeable document formats that let users open documents as they expect, just as surely as we need software that does what we expect.
