Extracting Data from Semantically Weak Sources

25 Jul 2008
Posted by cgp

So how do you extract data from semantically weak sources?

The speaker did the presentation as a pirate. Interesting.

1. Invent things like tag libs? I guess so, they are useful...
2. You rate the data.
3. User reviews?

Semantics is wrapping meaning around data, such as attaching the meaning "food" to a red light.

Learning is a change in behavior based on previous experience.

Machine learning: algorithms that change their behavior based on input.
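
To make "algorithms that change based on input" concrete, here is a minimal sketch of my own, not something from the talk. The TagGuesser class and the sample words are made up for illustration; the point is only that the answer to the same question changes as examples come in.

```python
from collections import Counter, defaultdict


class TagGuesser:
    """Hypothetical toy learner: guesses a tag for a word from past examples."""

    def __init__(self):
        # word -> Counter of tags people have applied to that word
        self.counts = defaultdict(Counter)

    def learn(self, word, tag):
        # Each observed (word, tag) pair shifts future guesses.
        self.counts[word][tag] += 1

    def guess(self, word):
        seen = self.counts.get(word)
        if not seen:
            return None  # no experience yet, so no answer
        return seen.most_common(1)[0][0]


guesser = TagGuesser()
before = guesser.guess("jaguar")   # nothing learned yet

guesser.learn("jaguar", "animal")
guesser.learn("jaguar", "car")
guesser.learn("jaguar", "animal")
after = guesser.guess("jaguar")    # now reflects the majority tag

print(before, after)
```

Same call, different answer after experience, which is about all the definition above asks for.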

Punctilio, a classic case of machine learning.

tags alone=weak

tags+people=weak (because they do the AI) :)

See what is similar between the data sets being tagged/used to generate successful queries.
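
One simple way to "see what is similar" between tagged items is to compare their tag sets. The sketch below uses Jaccard similarity (shared tags divided by all tags); that choice is my assumption, since the talk did not name a specific measure, and the page names and tags are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag collections: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two untagged items: treat as not similar
    return len(a & b) / len(a | b)


# Hypothetical tagged items, purely for illustration.
tagged = {
    "page_a": {"film", "pirate", "adventure"},
    "page_b": {"film", "pirate", "comedy"},
    "page_c": {"database", "sql"},
}

sim_ab = jaccard(tagged["page_a"], tagged["page_b"])  # 2 shared of 4 total
sim_ac = jaccard(tagged["page_a"], tagged["page_c"])  # nothing shared

print(sim_ab, sim_ac)
```

Items whose tag sets overlap heavily are candidates for meaning the same kind of thing, even though no single tag says so.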

** Next presenter (thinner guy, not a ninja or a pirate)

Freebase takes information and makes it relational. Freakin' interesting.
Wikipedia data is dirty.

wiki2xml is a parser that extracts data from Wikipedia.

WEX essentially lets you run SQL statements against Wikipedia.
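
To give a feel for what querying Wikipedia-derived data with SQL might look like, here is a rough sketch using Python's built-in sqlite3 module and an in-memory database. The articles table, its columns, and the rows are all hypothetical; I did not note the real schema, so treat this as the general idea only.

```python
import sqlite3

# In-memory stand-in for a relational dump; table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [
        ("Blackbeard", "Pirates"),
        ("Anne Bonny", "Pirates"),
        ("PostgreSQL", "Databases"),
    ],
)

# Once the data is relational, "mining" is just an ordinary query.
rows = conn.execute(
    "SELECT category, COUNT(*) FROM articles "
    "GROUP BY category ORDER BY category"
).fetchall()

for category, count in rows:
    print(category, count)

conn.close()
```

The interesting part is not the query itself but that dirty wiki text has been turned into rows you can group and count.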

You can mine it for just about anything.