Tuesday, June 10, 2008

Some Factz About Powerset

Powerset is a new search engine that uses natural language processing (NLP) to return results that are both more varied and more accurate than Google's. NLP technology extracts and integrates meaning from linguistic structures and the relationships between words, instead of treating all text as strings of unrelated key terms. It's pretty rockin'.

For example, the queries "What did Hillary say about Bill?" and "What did Bill say about Hillary?" return a fairly similar set of somewhat useful results in Google. Scanning them, it is evident that they were brought up by the words "hillary," "bill," and "say." No synonyms, verb conjugations, or permutations were returned. Although these sentences contain the same keywords, they don't at all mean the same thing -- but Google treats them more or less as if they did.
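To make the point concrete, here is a toy sketch in Python. It is not how Google or Powerset actually work; the stopword list and the helper names `keywords` and `ordered_terms` are made up for illustration. It only shows that once a query is reduced to an unordered set of key terms, the two questions become indistinguishable, while keeping word order preserves the difference.

```python
import re

# Hypothetical stopword list, chosen only for this example.
STOPWORDS = {"what", "did", "about"}

def ordered_terms(query):
    """Lowercase the query and keep its non-stopword terms in order."""
    words = re.findall(r"[a-z]+", query.lower())
    return [w for w in words if w not in STOPWORDS]

def keywords(query):
    """Reduce a query to an unordered set of key terms."""
    return set(ordered_terms(query))

q1 = "What did Hillary say about Bill?"
q2 = "What did Bill say about Hillary?"

# As bags of keywords, the two questions are identical.
print(keywords(q1) == keywords(q2))            # True

# Keeping the order (who said what about whom) tells them apart.
print(ordered_terms(q1) == ordered_terms(q2))  # False
```

A real NLP system does far more than track word order, but even this much shows what a pure keyword match throws away.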

The results of the same queries in Powerset reflect the subtlety of the tool. It seems to grasp the "aboutness" of the question, and brings back entries that contain more than verbatim pieces of the query. Verbs like "claim" and "vow" and "state" are brought back as variants of "say."

One of the coolest things about Powerset is the list of "Factz" that appears along with the traditional list of links. Factz take the form subject-verb-object and can add up to a wonderful list of simple but unexpected sentences describing the subject of the query.

As with any complex process, Powerset's mistakes can be even more impressive than its successes because they reveal how much work it's actually doing. I entered the query "What did Britney do?" and learned that "Britney speared songs and samples." The parser is so enthusiastic that it went right ahead and parsed her last name! It's endearing, like the errors of overgeneralization children make, but also indicative of the powerful linguistic processes humming beneath the surface.

And although it couldn't tell me where Jimmy Hoffa is, it does know who killed Laura Palmer.

Friday, June 06, 2008

Robot Love

In the San Francisco BART stations, the regular arrival and departure of the southbound trains are announced in the prerecorded, carefully inflected glint of a machine's voice pitched to that of a human woman. The north- and eastbound trains are hailed by the equally smooth tones of an electronic man.

From 4:00 am to about 1:00 am the two voices take turns reading the following haiku to each other.

Now approaching
nine-car Fremont
train platform
one.

Eight-car Daly City
train
arriving in
twelve
minutes.

Ten-car Dublin-
Pleasanton train arriving
in
three
minutes.

I used to wonder if the same conversation continued during the four-hour mid-night gap when the lines go dark.

Eight-car Millbrae
train
arriving in
three hours twenty-
two minutes.

Or if they used the time to discuss other things.

Did you see that fallen-
faced man down
that Miller light
like it was sun-
light?

Two pigeons trapped
at Montgomery flapped
the dust
right off
the floor
'til it shone.

Or perhaps I'm the only one who imagines companionship means talking all night and they are happy to let a bit of silence course down those still tunnels.