16th November 2009 ::

Score Overflow ::

Well, that was weird.

So over the last year I've been pretty much wasting a bunch of my time posting technical nonsense to the technical Q&A site (and New King Of Google) Stack Overflow.

I used to frequent the mailing lists for this sort of thing, but the days of CSS-d and WD-L being populated by actual web pioneers are gone, replaced mainly by low-level Fix My Site For Me stuff as far as I can see. Like most of everything to do with e-mail, mailing lists seem to have gone downhill, to be replaced by the web and social networking. Bring on the death of e-mail.

The user rating and moderation system at Stack Overflow is quite interesting. It seems almost deliberately set up like a Massively Multiplayer Online Role-Playing Game, or perhaps one of those awful click-farming Facebook games. In its system of ‘score’ (the classic attraction of watching a number slowly get bigger) and ‘badges’ (like gaming's ‘achievements’) it is directly mimicking the reward systems of RPGs. All that's missing is the ability to Level Up and become a Level 90 lawful-evil-aligned Java-hacker, leading a party to the Lair of the ÜberSkeet for some instance-based Visual-Basic-orc-bashing.

All this conspires to give it the properties gamers might describe as ‘addictive’. The way its systems coax you to take part and compete to provide help to others for no gain of your own is an insidious trick to snare players — I mean users — and increase traffic. But, hey! At least it's not as horrible as nasty old Experts-Exchange, huh?

OK, there's a lot of low-level dullness at SO too, but somehow it's easier to cope with on a user-moderated site. Nonetheless, the same old questions as on the lists do come up again and again.

  • At number three it's a new entry for “Why is my (Python, JavaScript, ...) function getting the same value of the variable every time around the loop?”, by Clint Forloop and the Closures;
  • up one place to number two this week it's the classic “I've found a bug in (programming language)! Numbers don't add up properly!” by The Floaters, that's on the IEEE 754 label;
  • and for the third week in a row at fabulous number one it's “What's the regular expression to parse this load of HTML?” by Rusty Markup Jr.
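For anyone who hasn't met the first two chart entries in the wild, here's a minimal sketch of both in plain JavaScript. The function names are made up for illustration; the behaviour is standard for `var`-scoped loop variables and IEEE 754 doubles.

```javascript
// Every callback closes over the same variable i, not a copy of its value,
// so by the time any of them runs, the loop has finished and i is 3.
function makeCallbacksShared() {
  var callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(function () { return i; });
  }
  return callbacks;
}

// Passing i through a factory function gives each callback its own binding.
function makeCallbacksSeparate() {
  var callbacks = [];
  function makeCallback(value) {
    return function () { return value; };
  }
  for (var i = 0; i < 3; i++) {
    callbacks.push(makeCallback(i));
  }
  return callbacks;
}

var shared = makeCallbacksShared().map(function (f) { return f(); });
var separate = makeCallbacksSeparate().map(function (f) { return f(); });
// shared is [3, 3, 3]; separate is [0, 1, 2]

// Binary floating point cannot represent 0.1 or 0.2 exactly, so the sum
// is not exactly 0.3. Comparing with a tolerance is the usual workaround.
var sum = 0.1 + 0.2;
var looksEqual = (sum === 0.3);               // false
var closeEnough = Math.abs(sum - 0.3) < 1e-9; // true
```

Neither of these is a bug in the language, which is roughly what the answers end up saying every single week.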

So after coming back from the pub to look at some SO stuff before bed (see what I mean about the addictiveness) to find yet more bleedin' questions about parsing HTML and XML with regex I ended up posting this silly internet-meme-based cry-for-help answer out of sheer frustration. Somehow, over the weekend it has ballooned out of all proportion, ending up with an absurd quantity of votes — like twenty times more than any of the actually-useful answers I've ever posted there, and apparently by quite some distance the highest-voted thing on the site.

I did say it was weird. (And goodness knows what Metafilter or the other civilian sites that linked the thing make of all the dumb memes and SO in-jokes.)

GamesMaster sez: if you can't get past Level Three of Stack Overflow, why not try the cheat mode? Stop trying to come up with long, considered answers weighing the pros and cons of different approaches for complicated topics. Everyone hates those. Post daft rubbish full of weird Unicode combining diacriticals instead. Everyone loves Unicode! Well, except for PHP users. But they don't count. They're probably too busy trying to parse HTML with regex anyway.

The perils of strings ::

Still, if just one coder can be persuaded by this strange outbreak of publicity not to use regex to parse HTML/XML/XHTML, that's... er... still about another 15 million to go. Why is it that everyone wants to process markup with regex? I don't especially want to regurgitate jwz's infamous “Now they have two problems” regex quote again (...mainly because Jamie bears the responsibility for Unix Netscape, a browser that made my early webdev life utterly miserable...) but it is bang-on.

I think there's a deeper problem here, a fundamental mismatch in human brains. It's not just regular expressions, it's string processing in general: we seem to naturally perceive bodging strings together and regexing them apart as something easy, something we understand, as natural as sticking English words together. It's the Lure Of Strings: when both a string-bodging approach and a clean object-style interface are available, coders overwhelmingly jump for the quick and dirty string hack.

But it's deceptive. The coder's mind happily glosses over the fact that inserting text into a non-plaintext context requires delimiters and escaping to distinguish the text from its context, or that even simple markup languages are invariably full of tricky little corner cases that defy simple string indexing or even regex. (And neither HTML nor even XML comes close to qualifying as a ‘simple’ markup language.)
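To make the corner-case point concrete, here is a small sketch of the sort of pattern that tends to get posted, run against a snippet of perfectly valid HTML. The pattern and the sample markup are my own illustration, not taken from any particular question.

```javascript
// Naive idea: "a tag is everything from < up to the next >".
var naiveTag = /<[^>]*>/g;

// Valid HTML: a quoted attribute value may contain >, and a comment may
// contain things that look like tags but aren't.
var html = '<img alt="a > b" src="x.png"><p>text</p><!-- <b>not a tag</b> -->';

var found = html.match(naiveTag);
// found[0] is '<img alt="a >'  (the img tag, chopped off mid-attribute)
// found[3] is '<!-- <b>'       (the start of a comment, mistaken for a tag)
```

You can keep patching the pattern for each case, but quoting, comments, CDATA and friends each add another patch, which is more or less how people end up with two problems.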

This is the brain malfunction that has led to fifteen years of "WHERE name='"+name+"'" SQL-injections and print "Hello, "+name+"!" cross-site-scripting security holes. And whilst we are making some progress in fighting SQL injection by frightening programmers into using parameterised queries on pain of having Russian hackers turn their sites into viagra-and-virus pits, the vast majority of web applications still have XSS.
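A rough sketch of the difference, in JavaScript for the sake of having one language throughout. `escapeHtml` and `nameQuery` are hypothetical helpers written for this example, and the query object shape is illustrative only: real database drivers each have their own parameter syntax.

```javascript
// Hypothetical helper: escape the characters that are special in HTML
// text content and quoted attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// A hostile "name", as it might arrive from a form field.
var userName = '<script>alert("xss")</script>';

// String bodging: the name is pasted in as if it were markup.
var unsafeHtml = 'Hello, ' + userName + '!';

// Escaped: the name is treated as text, whatever it contains.
var safeHtml = 'Hello, ' + escapeHtml(userName) + '!';

// For SQL the same idea means keeping the query text and the values apart,
// so the value can never be read as part of the query.
function nameQuery(name) {
  return { text: 'SELECT * FROM users WHERE name = ?', values: [name] };
}
var query = nameQuery("O'Reilly'; DROP TABLE users; --");
```

In both cases the fix is the same shape: the text stays data, and something other than `+` is responsible for getting it into its context.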

Languages that emphasise string bodging over Doing It Right don't help. PHP's lack of parameterisation in the standard library set us back years, and the default of unescaped HTML in its output structures continues to spawn broken apps. ASP.NET's controls and Java's taglibs are helping a little, but there are still holes. (ASP.NET infuriatingly treats some properties as containing plain text and some as HTML, thus confusing everyone.)

[This will generate angry flames] ::

Anyhow, SO's a great way to stay in touch with what tools, technologies and application areas people are working with in the real world. It does have its biases, naturally. Microsoft's .NET technologies are acknowledged to be a little over-represented, but much worse is the sinister cult of jQuery users.

Ask any question with JavaScript in it — any question at all, whether a general page manipulation question where jQuery might be helpful, or a trivial syntax query to which the answer is a tiny JavaScript one-liner — and your question will be showered with praise and inducements promoting jQuery... answers which the rest of the jQuery mafia will then vote up whether they addressed anything in the question or not. I swear, soon they'll start posting jQuery promo to FORTRAN questions.

I thought this image would be An Hilarious Joke-humour for SO users, but once again reality has beaten me with real questions like “How do I round a number to the nearest integer using jQuery?”. Some users seem to think you can and must write everything in jQuery without touching normal JavaScript at all. Maybe I should code up a plugin to allow you to write $.Math.round(n)?

Yes, many jQuery fans are incapable of writing any JavaScript unless it has a dollar in it. Almost always, the ‘best’ way of finding out whether a checkbox is ticked is given with a straight face as:


— as obviously input.checked just doesn't do enough selector-parsing busywork to be really modern. And that's about the least mangled example, paling in comparison to amusing snippets like this one to establish which row number tr is:

$('#mytable tr:not(#'+$(tr).attr('id')+'~tr)').size()-1

Clearly just saying tr.rowIndex is no good. It's too fast, or too readable, or something. (Actually the real reason it's a no-go is that it's part of the standard DOM, which everyone knows is “full of [unspecified] browser bugs” which using jQuery magically manages to fix, except when it doesn't.)

The Lure Of Strings is here once more. jQuery users want to work with strings, not DOM objects. Strings are easy! You can make a selector string $('#'+id) so much more easily than having to type out boring document.getElementById(id), and it'll only be a bit slower, and it'll only fall over when there's a dot or colon in the ID, and that probably won't happen, unless maybe it does, so that's OK.
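If you really must build a selector from an ID, the ID has to be escaped for the selector context first. Here's a simplified sketch: `escapeIdForSelector` and `idSelector` are names I've made up, and this version deliberately ignores the extra rule for IDs that start with a digit.

```javascript
// Hypothetical helper: backslash-escape anything in an ID that isn't a
// letter, digit, underscore, hyphen or non-ASCII character, so that
// characters like . and : aren't read as selector syntax.
// Simplification: IDs beginning with a digit need a different escape,
// which this sketch does not attempt.
function escapeIdForSelector(id) {
  return String(id).replace(/([^A-Za-z0-9_\-\u00A0-\uFFFF])/g, '\\$1');
}

function idSelector(id) {
  return '#' + escapeIdForSelector(id);
}

var dotted = idSelector('user.name');   // '#user\\.name' as a JS literal
var coloned = idSelector('form:field'); // '#form\\:field' as a JS literal
var plain = idSelector('plain-id_1');   // unchanged apart from the #
```

Or you could call document.getElementById(id), which takes the ID as-is and needs none of this.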

Plus, you can make a div by saying $('<div>'+message+'</div>') instead of having to muck around with document.createElement('div') and text nodes. Hooray! Only... hang on. You've not escaped that HTML, and have probably just created a cross-site-scripting security hole, only on the client side this time. And after you'd spent so long cleaning up your PHP to use htmlspecialchars on the server-side, too. What a shame. Ah well, no-one really cares about correctness or security, do they?
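For comparison, the boring DOM version is about three lines. A sketch, with `makeMessageDiv` as a made-up name; the document is passed in as a parameter only so the function can be exercised outside a browser.

```javascript
// Build <div>message</div> using DOM methods rather than an HTML string.
// The message goes into a text node, so any < or & in it stays as literal
// text and is never parsed as markup.
function makeMessageDiv(doc, message) {
  var div = doc.createElement('div');
  div.appendChild(doc.createTextNode(message));
  return div;
}

// In a browser:
//   document.body.appendChild(makeMessageDiv(document, message));
```

No escaping function is needed here at all, because the text never passes through an HTML parser in the first place.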

jQuery's not wholly to blame for this. After all, the innerHTML property has been about for years, and already proved more popular than DOM. But the library certainly does encourage that style of coding.

I do have some serious reservations about a lot of the stuff jQuery does, but that's a long, complicated and boring tirade for another time. Preferably a time when I've got some actually-useful JavaScript library stuff on my software page instead of the manky 8-year-old scripts aimed at IE5 and Netscape bleedin' 4 which live there at the moment... but that would involve updating this site. Which it turns out I'm really not very good at. Tsk, eh?

PS. I don't much like Prototype, either.

21st April 2009 ::

Spyware: end of an era ::

That's it; “spyware” is done. Zango (né 180solutions), the last of the major spyware vendors, is no more.

It is unclear at this point what parts of Zango live on after being purchased by blinkx. I suppose it's possible that it's a case of moving the system to new operators whilst shrugging off some of the old debts. But whether blinkx have the means and intent to keep the adware network up and running or not, it's the last nail in the coffin for the concept of adware. If 180, having swallowed up its other extant competitors (Hotbar and CDT) can't make it work, the adware model is doomed.

And predictably so. It has always been blatantly obvious that adware does not offer an acceptable value proposition for any rational customer: years of irritating ads and loss of control over your computer and privacy, in exchange for some crappy smilies or video you can almost certainly get elsewhere for free? Not a goer. Even if you could be persuaded to install it, the obvious course of action is to remove the adware again immediately afterwards.

So adware cannot be viable on legitimate terms. The only way to make it pay is to cheat: to hack ‘consent’ by hiding your agreement where no-one will see it; to target inexperienced or non-rational customers who won't understand the agreement (in particular children); and to install without any consent at all, through silent bundling or in the worst cases install by exploit.

180 and the rest of the industry did all of these; when finally threatened into curbing the worst of this behaviour by legal threats and the FTC, they were no longer capable of keeping the model afloat.

Ken Smith, previously CTO, offers a mixture of apologia and self-justification in a post-mortem at his blog. It's interesting to see how he views 180's actions, but some of the comments seem a little disingenuous to me.

We partnered with some people that we should never have partnered with. During that time period, my best estimate is that something like 4% of our installs during that time period were completely silent, i.e., were the result of affiliates using browser security holes

I think this is a massive under-estimate. Perhaps only 4% of affiliates were directly using browser security hole exploits themselves, but there were also many intermediary affiliates that installed 180 software without any notice or consent, having themselves previously been installed silently. For example, a 180 variant was installed by NeoToolbar, which was itself installed by CoolWebSearch variants, which were installed by exploits. Another was installed by MediaMotor (later shut down by the FTC), which was installed by SmartBot exploits. And so on.

Plus there were many installs that were simple bundles rather than exploits, just with no notification or consent that 180 software would be installed. I'd say this also counts as silent; the figure must be much higher than 4% even ignoring the issue of whether a few paragraphs of legalese in an enormous EULA squashed into a tiny scrollbox counts as meaningful consent.

Empirically, I used to see 180 software all over the computers I was looking after, with the owners unaware of what they had done (if anything) to receive it. But more recently*, I've not come across a single one. Whilst some of that will have come from greater customer awareness of the dangers of installing software with bundles attached, the majority has to be up to the move away from silent installs.

(*: I'd say “since 180 cleaned up their act” but that would imply their modern practices were completely clean. I don't necessarily consider that the case. Some promotional materials are still misleading and/or seemingly aimed at children, and much of the content used as bait by affiliates is worthless at best — and at worst, as in the case of the fake torrents, illegal. But it is undeniable that 180's behaviour has been much better than in their heyday.)

Back in 2003, spyware meant "a program that your wife puts on your computer to track what you're doing." By 2005, it was synonymous with "adware", which was an awkward if reasonable description of Zango's business model. How did this happen?

Oh, that's an easy one. You get called spyware because your software spies.

180 software included long-term clickstream tracking functionality (which though not immediately personally-identifiable can easily become so through leakage of information in URLs), and, much worse, deliberate sniffing of e-mail addresses and personal registration details from multiple third-party programs (eg. Outlook Express).

You may take offence because the primary purpose of the software is not spying, and demand to be called adware instead as ads are the main thing the software is designed for. But your software leaks information; security researchers have no way to verify what happens to that information after it leaves the computer. If you don't want to be considered spyware, don't spy. Don't do something incredibly suspicious like sniffing out the user's name, e-mail address and ZIP code if you don't intend to use it.

(It's amusing how, over the years, “no spyware!” has become a phrase only ever used by spyware. Like farts, where He Who Denied It Supplied It, “we're not spyware!” is an immediately obvious red flag that something quite smelly is present.)

And very few of those programs had any sort of notice, QA, or presence in Add/Remove Programs: in contrast, Zango had all of those things.

Only in the latter days. Earlier versions of Zango (originally ‘nCase’) were installed with no notice, served unlabelled ads, occasionally crashed, and had no Add/Remove function; indeed, some resisted being manually uninstalled. You had executables with completely random names, dropped into different folders to be difficult to find. And when removed, many versions left an ActiveX installer in-place, allowing nCase to be re-installed without any notice by any web site. Malice or simple incompetence, there's no excuse for this stuff.

You're crying because your reputation was terrible in the later years of Zango despite having improved things since the early days. But there should never have been any later years, because the actions of the early days were simply unlawful. 180solutions's software did untold damage to machines it had no authorisation to access; if the FTC or other authorities had been more on the ball in 2002-2004 you would have been shut down already.

I'm working on another startup

I wish you luck — if it's a theology startup.

If it's adware again, not so much.

After all, they [the anti-spyware vendors] were in it to make a buck as well

No, not originally. The first wave of anti-spyware tools — OptOut, Ad-Aware and Spybot Search and Destroy — were completely free.

They were originally written by amateurs to satisfy the malware removal needs that were not being met by traditional anti-virus vendors (who ignored the problem for years, perhaps afraid of litigious adware companies like your own).

Anti-spyware in decline ::

Things are a bit different now of course. As described in an earlier post, the category of adware barely exists any more. Instead we have the annoying but relatively harmless bundled toolbars from Google, MS, Yahoo and Ask on one side, with the trojans, botnets and fake anti-spyware apps installed by PPI fraud and Russian exploits on the other, not even attempting to look legit. I guess if you're going to do exploits, the proper trojans pay better than adware.

There's very little in between (you still see the Mirar toolbar installed by the trojans occasionally, but that's about it), so there's little left that the anti-virus people aren't already targeting. Generally unsuccessfully targeting, but that's another story. Some anti-spyware apps have themselves drifted out to target this more obvious malware, with the result that there's little difference between them and the traditional anti-virus stack. There is no longer really any place for dedicated anti-spyware.

But the anti-spyware companies now really are out to make a buck, and it's hard to take a broken business model that's still making you money and just decide to pack it all in. This is leading to some strange bedfellows. AVG and SpySweeper have been bundling variants of the Ask toolbar (previously considered spyware), and now both Ad-Aware and SpywareBlaster are pushing users towards TrialPay, one of those nasty schemes where you have to sign up for dodgy offers you don't want, then cancel them and hope you don't have to spend the next twelve months on a helpline trying to stop your credit card being billed. Which may not be spyware, but it's certainly exactly the same kind of marketing unpleasantness that the adware companies loved to push. It's a bit sad to see previously impeccable software teams go this way.

To be honest I don't even know why SpywareBlaster still exists; its ActiveX CLSID-based blocking was relevant to the unsolicited commercial software threats of yesterday, but does absolutely nothing against the exploits, trojans and rogue anti-spyware that dominate today. The parasite-detector script on this site worked in a similar way, but has since been withdrawn as it's of almost no practical use any more.

(I'll probably remove the ‘parasite’ section itself completely soon. Certainly the advice about anti-spyware software there is woefully out of date. Perhaps I'll replace it with an archive of the old, much-plagiarised parasite list for historical interest.)

Plug out ::

Another point release of Firefox came out today, fixing more security holes. IE's was a week ago with the usual monthly round-up. It seems we've completely given up on the idea of having a secure browser, and are happy to let it update once a month instead. It's a bit sad, but that's the price of having such a complex and expanding feature set, I suppose.

But still, the majority of successful exploits recently aren't even targeting the browser, but plug-ins. Quicktime, Real, Windows Media and especially Java have been targeted by the Russian malware gangs in the past, but the current favourite by a mile is the Adobe Reader PDF plug-in.

Perhaps the reason these attacks are so successful is that while browsers are understood to be net-facing software it is important to keep updated, users don't really understand what they have with plug-ins.

And why should they? If you install Adobe Reader, it says nothing about installing a PDF plugin. You wouldn't naturally expect it to, because let's face it, no-one wants to read PDFs squished up in a little embedded box in a web browser. It's an unpleasant experience even when it works, and it adds a whole load more attack surface to the web browser. Plus it occasionally crashes.

Previously, to be attacked, you'd have to explicitly download and double-click on a PDF you wanted to read. Now, you're at risk just by going to a web page. And not just known ‘dodgy’ web pages like porn and warez; any number of high-profile sites have been hacked, or included adverts from networks that are hacked. For example for the last week, foxnews.com has been happily sending its readers to PDF exploits. (Yeah, I know, Fox News is way dodgier than porn or warez ect ect jolly laughs and so on.)

When you install Microsoft Office, do you really want to be able to view Excel spreadsheets in your web browser? That's what the installer quietly does for you. You'll probably never actually need that feature, but it does still make you vulnerable to various Office Web Components exploits that have come out over the years. Now you have to worry about keeping Office up-to-date.

When you install iTunes, do you expect arbitrary web pages to be able to fire up the iTunes application? They can now, so if there are security holes in it you're at risk. You have to keep it updated... that and the Quicktime plug-in it will also install behind your back. You can use the Apple Software Updater to do it, but how can you trust that when it has been known to also bundle further unrequested applications like Apple's Safari browser?

Java is even worse. It has installed the Google toolbar, the Microsoft Live toolbar, and even OpenOffice.org through its updater. Whilst you or I can at least understand and decline this ‘handy’ offer, what about the less able users, the ones we set computers up for? I can't turn Java's update feature off, because there are new security holes in Java all the time, that need fixing if the plug-in is accessible to web pages. And I can't leave it turned on because then my users will probably click OK through the updater to get it done, and end up with a load of software they didn't want.

Then I'll have to deal with the support call when .DOC files unexpectedly open in this weird new thing they don't know how to use instead of familiar MS Word. Not that OO.o is bad or anything, but a full office suite, installed as an update to Java, to which it is completely unrelated? What idiot in the Sun asylum thought that was a valid upsell?

No, the only workable alternative is not to install Java. As a server-side language it's fine, but do we really need it on the desktop? Probably not. Certainly not as a plug-in: applets are all but dead. If there's a Java app you really need, you can install it without the plugin for Firefox/Opera/Chrome, at least.

When installing Office, no Web Components please. With Adobe Reader, there's an option hidden away under the Options->Internet->‘Display PDF in browser’ you can untick to get rid of it. Or you could always use an alternative PDF reader, one not so bloaty and ugly as Adobe's has become, that doesn't have a plug-in (or gives you the choice to decline to install it). On Windows there's PDF XChange, and Sumatra (which is nice and lightweight and open-source, but the scrolling interface is a bit quirky). Of course there's also Foxit, but that also installs a PDF plug-in without asking, and on some installers also tries to bundle a third-party toolbar, so screw that.

Real, Quicktime, Windows Media, Silverlight, Director... the average user can probably do without these. That leaves only Flash, which is unfortunately widespread enough that avoiding it is impractical. Flash brings with it a privacy issue too, in that it stores history and saved tracking information from any website without any means to limit this behaviour, short of completely turning storage off, which makes many sites completely fail to work. There is no Firefox-style ‘allow cookies until the browser is closed’ option, and even if storage is disabled the history list is kept forever. This is quite obnoxious behaviour from Macrodobe.

(Firefox users, luckily, have the FlashBlock add-on available, which is also handy for stopping impolite sites and adverts making unexpected noises at you and pegging your CPU load at 100%. So that's nice.)

Unfortunately the pre-built PCs what you get from them there shops invariably take a maximalist approach and install all the plug-ins you might ever want (as well as any amount of other toolbars and crud you definitely don't). Uninstall it! Uninstall everything you don't need! Turn on, tune in, and plug out!

Obligatory silly photos ::

Yeah, I was going to do the normal thing and put amusing photos from Japan in here, with the Engrish and hilarious cultural misunderstandings and that. But I forgot to take any. Sorry about that.

So here's what's around here in Germany instead. These here are the current adverts for the state lottery. In comparison to the usual “Win LOADS of money and shit! Wheeee!”-style adverts I'm used to, these are incomprehensibly dull and respectable. How is this going to encourage anyone to throw away good money on pointless gambling?

The slogans include “Zufriedenheit durch Kompetenten Service”: translated, “Satisfaction through competent service”. German businesses love tedious, anodyne slogans like this, and particularly the word ‘Kompetent’. I do love that the German companies merely advertise themselves as ‘competent’ rather than earth-shatteringly great. It's a lot more honest, and less of a disappointment when our IT project comes in late and over-budget than if we'd claimed to be out-and-out superstars. But is this really suitable for a lottery?

Surely lottery advertising should be recklessly aspirational, compelling you to fantasize about winning all that money? This advert merely says, “Coo, imagine buying something from a shop and the transaction going through smoothly. How fun.” And how much competence does it really take to sell a lottery ticket? You take the numbers, put them in the machine, and ring up the purchase on the till. Even I could do that without coming in 100% over-budget and with half the features not working.

Another slogan: “Vertrauen durch fachkundige Beratung” (“Trust through knowledgeable consultation”), accompanied by a checkout chap in an identical pose to the girl from the other advert, and in what looks like a suspiciously similar shop (despite the claim to be representing real sales outlets in different towns... not sure it's getting my trust there).

Again, how much specialist knowledge do you really need to advise customers on playing the lottery? What advice could he possibly give out to increase the customer's chances of winning a completely random game? Or does he actually give out useful information like “Don't play the super extra game, the probability of winning is slightly less than the main draw”, or, to be more honest, “Don't waste your money playing any of these games; your chances of winning are vanishingly slim and the average returns are much worse than almost all other forms of gambling.”? Certainly that advice would earn my Trust, but I don't think it's quite what Lotto were looking for.

Also here's some quality graffiti I found nearby. The artist exhorts us to “Fuck de Police”. What I particularly like about this piece is that the young fellow, having initially mis-spelled the word ‘the’, has realised his mistake — perhaps in an English lesson centered around Fucking the Police at school the following day? — and has come back to his masterwork to make the necessary corrections, carefully bringing both different colours of spray paint with him to make it nice and neat.

A sterling effort which really helps him get his message across. We see this sentiment so clearly expressed, with a friendly-looking smiling self-portrait*, and can't help but think: “Yes! You are right, anonymous German slogan-spraying chap. The Police can often act in a reactionary and thuggish manner, as the incidents at the recent G20 protests clearly demonstrate. They are not to be trusted, and we should indeed Fuck them.”

Unfortunately, this is still not really the correct spelling: it should, of course, be “Fuck tha Police”. It's a shocking indictment of the German school system that today's students are insufficiently well-educated about the works of NWA. German graffiti artists act almost like, y'know, they forgot about Dre. Tch!

Finally, here is a picture of a brand of coffee I hadn't seen before in the shops here. It is called ‘Minges’ — titter.

You know, minges. Like lady-front-bum-coffee! tee-hee!

It's the funniest thing since BUM brand crisps.

You could make a joke here about ‘coffee creamer’, or something similarly hilarious. But I'm not going to.