Sunday, January 11, 2009

Datamining your mind

Did you notice that you can now vote for or remove search results in Google? Maybe not; they're grayed-out boxes next to the results. At this point, it only affects your own results - but such potential in bulk!

If you've ever searched for Kentucky artists (and I have lately, for a grant), did you really want ALL the tattoo artists in the state? Change it to Kentucky Art and you get lots of names. I can now remove the ones I don't want from my results, or promote the really good sites.

Handy. Definitely.

What's really exciting is what (I imagine, but the Google Guys are no dummies) is going on behind the scenes. By associating the terms I'm using with what I'm doing with the results, I'm in effect voting on the semantics, separating the types of art. For now, it's just mine, but multiply by millions and you may, fairly soon, be able to search "character" and get results clustered by typographic, moral, and cartoon senses, even when those terms aren't used in a page.
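Here is a rough sketch of how that kind of clustering could work. It is purely my guess, not anything Google has described. The vote format (user, query, URL, +1 or -1), the function name, and the example sites are all made up for illustration. The idea: if one person promotes two results for the same query, treat them as belonging to the same sense, and let the groups merge as more people vote.

```python
from collections import defaultdict

# Hypothetical vote log: (user, query, url, vote), where +1 promotes and -1 removes.
SAMPLE_VOTES = [
    ("ann", "character", "fonts.example", 1),
    ("ann", "character", "typeface.example", 1),
    ("ann", "character", "toons.example", -1),
    ("bob", "character", "ethics.example", 1),
    ("bob", "character", "virtue.example", 1),
    ("cat", "character", "toons.example", 1),
    ("cat", "character", "cartoon.example", 1),
    ("dee", "character", "typeface.example", 1),
    ("dee", "character", "glyphs.example", 1),
]


def cluster_by_co_promotion(votes, query):
    """Group URLs for `query` that share at least one promoting user (union-find)."""
    parent = {}

    def find(x):
        # Register unseen URLs as their own cluster, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect each user's promoted URLs for this query; removals are ignored here.
    promoted = defaultdict(list)
    for user, q, url, vote in votes:
        if q == query and vote > 0:
            promoted[user].append(url)

    # Everything one user promoted is merged into a single cluster.
    for urls in promoted.values():
        find(urls[0])
        for other in urls[1:]:
            union(urls[0], other)

    clusters = defaultdict(set)
    for url in list(parent):
        clusters[find(url)].add(url)
    return sorted(sorted(group) for group in clusters.values())


if __name__ == "__main__":
    for group in cluster_by_co_promotion(SAMPLE_VOTES, "character"):
        print(group)
```

With the sample votes, the typographic, moral, and cartoon sites land in three separate groups even though nobody typed those words. A real system would need to weigh votes and cope with people whose interests span several senses; this only shows the basic mechanism.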

An enormous semantic thesaurus, using current terms that we really use and not LC faceted terms from the 50s or 60s. The ultimate (for now) use of the hive mind, without the hive realizing it, not unlike the SETI screensaver that ran in the background analyzing data.

Think of groupsourcing as distributed computing. Not all the results are gold, but like reCAPTCHA, the cumulative results are good. And good is good enough, right?
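A bit of arithmetic shows why the cumulative results can be good even when individual ones aren't. This is a minimal sketch under two assumptions of mine: every voter is right with the same probability, and voters decide independently of each other. The 70% figure is just an illustration, not a measured number.

```python
from math import comb


def majority_accuracy(p_correct, n_voters):
    """Probability that a strict majority of n independent voters is right.

    Assumes each voter is right with probability p_correct, and that n_voters
    is odd so there are no ties.
    """
    needed = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 11, 101):
        print(n, round(majority_accuracy(0.7, n), 4))
```

Three voters who are each right 70% of the time give a majority that is right 78.4% of the time (0.343 + 0.441), and the figure keeps climbing as voters are added. Real voters are not independent, so treat this as the optimistic case.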

Saturday, January 3, 2009

Excuse me, Google, but....DUH!

Okay, I get that Google is dealing with millions of documents at a breakneck pace, with workers who are not lawyers. But can we get just one rule in writing and post it on everyone's monitor?


I just came across this record, and stopped in disbelief. I know there's a budget crisis, I know that federal agencies are making some very secretive and strange deals (see the Smithsonian/cable deal), but restricting content by falsely declaring an unconstitutional copyright ought to be actionable.

Congressional Record: Proceedings and Debates of the ... Congress
by United States Congress - Law - 1933
[ Sorry, this page's content is restricted ]
Snippet view - About this book - Add to my library - More editions

Come on, guys. I understand that you don't have time to investigate the status of works that should be PD by date but might have been restored under some later amendment, though I don't understand why you would scan works in copyright under a claim of fair use and yet restrict PD works.

But what possible challenge to fair use could there be in the Congressional Record? How could a corporate body even establish standing to the public record?

Guys, they're our laws, and our Congress. Yours too, of course, but not just yours. So can we just write an algorithm that says "This is PD, and we're sorry, we really goofed" for the Record?
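To show how small that rule could be, here is a sketch. It leans on the one fact I'm sure of: works of the U.S. federal government get no copyright protection (17 U.S.C. § 105). Everything else is hypothetical: the record fields, the author list (deliberately incomplete), and the labels are my inventions, not how Google Book Search stores or flags anything.

```python
# Illustrative, incomplete list of U.S. federal authors (an assumption for this sketch).
FEDERAL_AUTHORS = {
    "united states congress",
    "united states. congress",
    "u.s. government printing office",
}

# Hypothetical record, modeled on the search result quoted above.
SAMPLE_RECORD = {
    "title": "Congressional Record: Proceedings and Debates of the ... Congress",
    "author": "United States Congress",
    "year": 1933,
}


def is_us_federal_work(record):
    """True when the listed author is a known U.S. federal body."""
    author = record.get("author", "").strip().lower()
    return author in FEDERAL_AUTHORS


def access_label(record):
    """Pick a display label; anything not clearly federal still needs a human look."""
    if is_us_federal_work(record):
        return "Full view: this is PD, and we're sorry, we really goofed"
    return "Needs copyright review"


if __name__ == "__main__":
    print(access_label(SAMPLE_RECORD))
```

Matching on an author string is crude, and a real catalog would need a proper authority list, but the check runs before anyone has to think about restoration amendments or fair use.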