July 27, 2005

One thing leads to another

It's amazing how one project turns into a whole series of new ones. It started with some data in an Access database that I needed to convert to a more usable form. I figured a well-written query should do the trick. It turned out that I was going to have to do a lot of data cleanup and find ways to uncover the "extra" data buried within other values. I then decided to write a .NET app to do all the processing. Now I had a whole bunch of part numbers and descriptions that I had to classify into standard groups. With several thousand part numbers sitting in my new tables, I didn't want to have to do this by hand. I thought I might try some Bayesian filtering to automate the process. While typically associated with filtering spam, this technique can be used for general classification as well. I've never written an app that does anything like that before, so I thought I would look at what's already out there. There's a library called bow and an app called rainbow (how cute) that seemed interesting. Each is available as a set of C source files that you compile yourself. I'm still not that comfortable on my Mac command line, but now I've found myself trying to install Fink and wondering how the heck I got here.
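For anyone curious what Bayesian classification of part descriptions might look like, here is a minimal sketch in Python. It is not the bow/rainbow code and not the .NET app described above, just a tiny naive Bayes classifier with add-one smoothing. The class name, the part descriptions, and the group names ("fasteners", "bearings") are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Tiny multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # group -> word -> count
        self.doc_counts = Counter()              # group -> number of training descriptions
        self.vocabulary = set()                  # every word seen during training

    @staticmethod
    def tokenize(text):
        # Simplistic assumption: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, description, group):
        words = self.tokenize(description)
        self.word_counts[group].update(words)
        self.doc_counts[group] += 1
        self.vocabulary.update(words)

    def classify(self, description):
        words = self.tokenize(description)
        total_docs = sum(self.doc_counts.values())
        best_group, best_score = None, float("-inf")
        for group, n_docs in self.doc_counts.items():
            # Start from the log prior: how common this group is overall.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[group].values())
            for word in words:
                count = self.word_counts[group][word]
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_group, best_score = group, score
        return best_group


# Hypothetical hand-labeled examples; real data would come from the parts tables.
classifier = NaiveBayesClassifier()
classifier.train("hex bolt zinc plated 1/4 inch", "fasteners")
classifier.train("machine screw stainless phillips", "fasteners")
classifier.train("ball bearing sealed 20mm", "bearings")
classifier.train("roller bearing tapered", "bearings")

print(classifier.classify("stainless hex bolt"))
print(classifier.classify("sealed roller bearing"))
```

The idea would be to label a few hundred parts by hand, train on those, and let the classifier suggest groups for the remaining thousands, with a person spot-checking the results.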

I really just need to stop now and get back to my homework.

Posted by Matthew at July 27, 2005 08:47 PM