I've said it before, and I will say it again: it's not the data you have, it's the data you can ignore that counts.
Discrimination, that is.
The engineers like to talk about signal-to-noise ratios, while dismissing the discrimination problem of what is signal and what is noise as trivial. Maybe it was, back in the days of regenerative receivers and tuned filters, but nowadays we are fed a more heady and abstract kind of data, and digging information out of it is big business. Really big business.
Let's be clear: Google did not become the monolith it is by delivering advertising and fucking up YouTube comments. It worked because it delivered a superior signal-to-noise ratio, back in the clumsy old days of hand-edited directory s and .
But another of my pet pleasures is the way that bigger organizations lose efficiency, as internal parasitism and broken feedback loops interfere with the swelling colossus's ability to maneuver, or perhaps, even move.

So what do these ideas have to do with today? Well, today is the day I searched for "poster sized map of internet xkcd hilbert" in Google Images.
Google Images has always been bad at delivering anything except Raquel Welch, as the damn engine does not know what the pictures actually are, so it guesses (badly) based on local text. So there are always a few surprises served, even with Ms. Welch, but today raised my eyebrows.

I was served 384 images. I find it difficult to believe that that many images correspond to the conjunction of the terms "Poster" "Sized" "Map" "Internet" "Xkcd" "Hilbert".
The first seven images are bang on.
The eighth is associated (honest mistake).
Nine through Seventeen are good. Eighteen though...eighteen is the xkcd gravity well map on a Pinterest page. Okay, another simple mistake.
Twenty is a subway map. Odd.
Forty is 'Princess Bride Monopoly'. Ten percent of the way in, and we are way off into Madness Space.
An infographic map of Iceland's fishing history.
A layout of all 892 unique ways to partition a 3x4 grid.
An Arlo & Janis strip.
Map of the George Washington bridge.
A flyer for Veterans Day 2014.
Picture of Solar Power array.
Japanese cutaway diagram of Predator(tm).
Elemental information for Gallium, Germanium and Arsenic.
Remarks concerning Einstein reconciling the Bible with Science.
Cutaway of a Star Destroyer.
Cover of Ender's Game.
An alpine lake.
Four women standing by a wall (not porn)
Antenna repairs on the Empire State Building.
Bad sketch of a whale.
Various quotes and pithy remarks vis-a-vis faith and science.
The triangulation of France.
A Goya painting.
A graphic for the movie Interstellar.
The first cat picture does not show up until 300 images in.
First demotivational at 320.
Harry Houdini.
Thai cuisine.
Raquel Welch with a scooter.

It's the last six images I'd like to concentrate on:
5: Man in headband with an intense expression
4: Single frame of xkcd comic about overthinking
3: thumbnail                   \
2: different thumbnail          } all from Pinterest
1: Yet another tiny thumbnail  /
0: A black and white photo of an extremely large man.

The last image that has anything to do with mapping the internet is at position 132: a non-Hilbert graph of 2.5 million reddit comments.

So it's pretty clear that it's just serving random images after the first few. So why cut it off at 384? Why not just give me an unending stream of things?

Putting on safe search drops the feed to 367. I am not about to figure out which images were filtered out. I only found one that could be even loosely considered pornographic, unless you were a mullah, or allergic to Raquel Welch.

Now Bing gave me even fewer images, and they were, as usual, even farther off target.
Yahoo was about midway, and went off the rails just as hard.

Weirdly, Webcrawler, which is one of my old favorites from the late nineties, delivered the largest number of random images (900+), but was right on target in the first page.

DuckDuckGo, being an aggregator, was better than both, though not as good as Google, but it did give me the 'Legendary Pokemon Last Supper'.
This is something I didn't realize I needed until I discovered it existed. Isn't there a Japanese word for that?
