
One axis of blog is not enough

Posted by Mike Sugarbaker at 8:41 am on 5/22/2003

Because great minds think alike, some others are arriving at some of the same threads I spewed forth in a half-digested state in my post to LazyWeb (and, by definition, to my own blog) about auto-categorization of blog posts. Before those others overtake me entirely, I’m going to try to explain in English what the hell I was talking about back then, and why it matters.

Your computer organizes its information predominantly in one way: hierarchically. Since the Precambrian era, or at least since Unix, operating systems have given you a conceptual tree of files and directories, into which you “get” to sort all your stuff. The graphical user interface popularized by Apple in the ’80s sort of added half a dimension to that: you could move file icons in a directory, and possibly remember what was what based on graphical location. But computers mostly still give you one choice about how to find things: “where” they are in a virtual space. You might get text searches of varying efficacy, but only sometimes. On your PC, the path of least resistance is the world of folders.

Big thinkers in the user-interface field all agree that this is a horribly limited and stilted way to have to work with your stuff, but they often disagree on what would improve matters. Jef Raskin thinks we should go radically spatial, putting everything into one big map; David Gelernter thinks the answer is to organize everything by time; Donald Norman wants to break out every kind of task you do into its own device so you won’t have to store all that stuff on the same drive anyway. Each of these approaches is probably the perfect one - for someone. But the spatial, folder-tree approach is perfect for someone, too. The rest of us need something else, or we need one way to find certain kinds of things and a different way for others. Computers are the most versatile devices made by humans; why limit an OS to just one way of looking at information? (Alan Cooper is the UI professional who’s come the closest to integrating this realization into his work.)

It’s the same with weblogs. We’re throwing everything, really pretty much everything that smart wired people are thinking about, into this one form, this one particular kind of clay pot. It’s organized by time, with the newest stuff at the top, like a compost heap. That’s a very handy way, for certain tasks, of looking at things. It makes blogs somewhat like human conversations, which is a big part of why they’ve taken off. Human talk has limits, though: you get caught up in something new, you forget stuff that’s more than a minute old, the talk goes in circles. Computers can help us with that… but weblog software doesn’t do the best job of connecting things up. You’ve always got Google, it’s true. What else can we do?

Most weblog software these days lets you categorize your posts. You have to make up the categories beforehand, then you can stamp them onto your posts, sometimes more than one category per post. This can be used to turn a blog into what almost looks like a full-featured site with its own custom content management tool, or at least to help your readers focus in on something they’re interested in. But what if you don’t cut your content into categories the way your readers really want and need? Or what if, due to the inherent hassle of web interfaces and some residual lameness in blog software, you don’t bother to categorize your posts at all?

Into this mix, we toss a new element: Bayesian filtering. Mr. Bayes has had a guest spot on this program before - his probabilistic equation for parsing chunks of data into separate piles, based on human-labor-intensive labeling of other chunks of data, has become the basis for the world’s leading filtering software for unsolicited commercial email, AKA spam. When certain words have a tendency to pop up over and over again in text that you’ve labeled as undesirable, the presence of those words in new emails can identify them as something you’ve already said you don’t want. But people who haven’t used a flavor of spam filter other than that provided in Mozilla or OS X Mail - or, I suppose, people who haven’t used POPFile - don’t realize that you can have more than two states, good and bad. You can have multiple labels and apply the Bayesian math just as nicely.
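To make the "more than two states" point concrete, here is a minimal sketch of a naive Bayes categorizer that handles any number of labels. It is not how POPFile or any particular spam filter is written; the class name, the tokenizer, and the add-one smoothing are all my own assumptions, chosen to keep the example short.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesCategorizer:
    """Hypothetical multinomial naive Bayes over any number of labels."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.post_counts = Counter()             # label -> number of posts
        self.vocabulary = set()

    def train(self, text, label):
        """Record the words of one hand-labeled post under its label."""
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.post_counts[label] += 1
        self.vocabulary.update(words)

    def scores(self, text):
        """Return a log-probability score for each label seen so far."""
        words = tokenize(text)
        total_posts = sum(self.post_counts.values())
        result = {}
        for label in self.post_counts:
            # Prior: how often this label has been used so far.
            score = math.log(self.post_counts[label] / total_posts)
            label_total = sum(self.word_counts[label].values())
            for word in words:
                # Add-one smoothing so an unseen word can't zero out a label.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (label_total + len(self.vocabulary)))
            result[label] = score
        return result

    def classify(self, text):
        """Pick the label with the highest score."""
        scores = self.scores(text)
        return max(scores, key=scores.get)
```

Train it by calling `train(text, label)` once per post you have already categorized by hand, then call `classify(new_text)` for a best guess. Nothing in the math cares whether there are two labels or twenty.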

Specifically, you can use the labeling you do when you, say, categorize your posts to your weblog… and generalize it. Then, if the people who wrote your weblog tool have the skill and the will, your manual categorizations could get spread around, peer-to-peer style, and mixed in with other people’s categorizations until the tools have some sense of what words really signify what. Obviously, their best guess as to what you’re talking about would be correctable by human hands - it’d have to be. The help of computers, though - especially at coming up with categories that you didn’t know applied - would eliminate a big barrier to adequate librarianship of blogspace. That barrier, once felled - that is, the barrier of having only one way to look at the world of our online minds - could open up a whole new playground of emergently interacting ideas.
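For the curious, the pooling-and-correcting idea might look something like the sketch below. Everything in it is hypothetical: the function names, the data shapes, and especially the scoring, which is a crude word-overlap count standing in for real Bayesian math. No existing weblog tool is being described here.

```python
from collections import Counter


def merge_counts(*sources):
    """Combine per-category word counts from several blogs into one table.

    Each source maps a category name to a Counter of word frequencies.
    The sources themselves are left unmodified.
    """
    merged = {}
    for source in sources:
        for category, words in source.items():
            merged.setdefault(category, Counter()).update(words)
    return merged


def suggest_categories(post_words, merged, min_overlap=2):
    """Suggest categories whose pooled vocabulary overlaps the post.

    Crude stand-in for a Bayesian score: counts how many distinct words
    in the post have been seen under each category.
    """
    distinct = set(post_words)
    suggestions = []
    for category, words in merged.items():
        overlap = sum(1 for w in distinct if words[w] > 0)
        if overlap >= min_overlap:
            suggestions.append((overlap, category))
    # Strongest overlap first; ties broken alphabetically.
    ranked = sorted(suggestions, key=lambda s: (-s[0], s[1]))
    return [category for _, category in ranked]


def apply_corrections(suggested, added=(), removed=()):
    """Let the author have the final say over the machine's suggestions."""
    dropped = set(removed)
    kept = [c for c in suggested if c not in dropped]
    return kept + [c for c in added if c not in kept]
```

The part that matters is the last function: the machine proposes, the human disposes, and in a fuller version the corrected labels would feed back into the pooled counts.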

The tools for doing this kind of thing are probably already on their way. For instance, shortly after its recent purchase of Blogger’s parent company, Google purchased a company called Applied Semantics, which was working on software “which understands, organizes, and extracts knowledge from websites and information repositories in a way that mimics human thought and enables more effective information retrieval.” The official story is that this tech will be applied to Google’s text advertising, but it’s impossible not to imagine such tools leaching into the world of blogs. Google has already brought its text ads into Blogger’s free-hosted weblogs by analysing the text of their entries, with some success. Given the evidence, posted in my original post to LazyWeb, that my radical thoughts have already occurred to others, well… the folks at Google ain’t stupid, and they’ve got to be thinking this way.

So I’m not worried. More importantly, I’m not writing speculative code in my off hours.


One Response to “One axis of blog is not enough”

  1. Nathan Young Says:

    Opera’s email client carries this idea into email sorting. There are a few ways it’s different that all add up to a really new way of looking at mail.

    First, while it lets you do filters like other email clients, the filters are non-exclusive. In other words, the filter doesn’t “move” the message; rather, it creates a “View”, and messages can be in one or many views.

    It automatically creates views for each person you’ve received messages from. It recognizes mailing lists and creates views for them. It has a list of active contacts, people with whom you’ve exchanged messages recently.

    You can create a folder and move messages into it. Those messages can still appear in other views.

    You can save keyword searches as views. It’s pretty cool.
