
One axis of blog is not enough

Posted by misuba at 08:41 AM

Because great minds think alike, some others are arriving at some of the same threads I spewed forth in a half-digested state in my post to LazyWeb (and, by definition, to my own blog) about auto-categorization of blog posts. Before those others overtake me entirely, I'm going to try to explain in English what the hell I was talking about back then, and why it matters.

Your computer organizes its information predominantly in one way: hierarchically. Since the pre-Cambrian era, or at least since Unix, operating systems have given you a conceptual tree of files and directories, into which you "get" to sort all your stuff. The graphical user interface popularized by Apple in the 80's sort of added half a dimension to that: you could move file icons in a directory, and possibly remember what was what based on graphical location. But computers mostly still give you one choice about how to find things: "where" they are in a virtual space. You might get text searches of varying efficacy, but only sometimes. On your PC, the path of least resistance is the world of folders.

Big thinkers in the user-interface field all agree that this is a horribly limited and stilted way to have to work with your stuff, but they often disagree on what would improve matters. Jef Raskin thinks we should go radically spatial, putting everything into one big map; David Gelernter thinks the answer is to organize everything by time; Donald Norman wants to break out every kind of task you do into its own device so you won't have to store all that stuff on the same drive anyway. Each of these approaches is probably the perfect one - for someone. But the spatial, folder-tree approach is perfect for someone, too. The rest of us need something else, or we need one way to find certain kinds of things and a different way for others. Computers are the most versatile devices made by humans; why limit an OS to just one way of looking at information? (Alan Cooper is the UI professional who's come the closest to integrating this realization into his work.)

It's the same with weblogs. We're throwing everything, really pretty much everything that smart wired people are thinking about, into this one form, this one particular kind of clay pot. It's organized by time, with the newest stuff at the top, like a compost heap. That's a very handy way, for certain tasks, of looking at things. It makes blogs somewhat like human conversations, which is a big part of why they've taken off. Human talk has limits, though: you get caught up in something new, you forget stuff that's more than a minute old, the talk goes in circles. Computers can help us with that... but weblog software doesn't do the best job of connecting things up. You've always got Google, it's true. What else can we do?

Most weblog software these days lets you categorize your posts. You have to make up the categories beforehand, then you can stamp them onto your posts, sometimes more than one category per post. This can be used to turn a blog into what almost looks like a full-featured site with its own custom content management tool, or at least to help your readers focus in on something they're interested in. But what if you don't cut your content into categories the way your readers really want and need? Or what if, due to the inherent hassle of web interfaces and some residual lameness in blog software, you don't bother to categorize your posts at all?

Into this mix, we toss a new element: Bayesian filtering. Mr. Bayes has had a guest spot on this program before - his probabilistic equation for parsing chunks of data into separate piles, based on human-labor-intensive labeling of other chunks of data, has become the basis for the world's leading filtering software for unsolicited commercial email, AKA spam. When certain words have a tendency to pop up over and over again in text that you've labeled as undesirable, the presence of those words in new emails can identify them as something you've already said you don't want. But people who haven't used a flavor of spam filter other than that provided in Mozilla or OS X Mail - or, I suppose, people who haven't used POPFile - don't realize that you can have more than two states, good and bad. You can have multiple labels and apply the Bayesian math just as nicely.
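
To make the "more than two states" point concrete, here is a minimal sketch of a naive Bayes classifier that handles any number of labels. It is an illustration only, not how POPFile or any particular mail client works: the class name CategoryGuesser, the tokenizer, the add-one smoothing, and the tiny training posts are all assumptions made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class CategoryGuesser:
    """Multinomial naive Bayes over any number of labels (hypothetical name)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.post_counts = Counter()             # label -> number of posts seen
        self.vocabulary = set()                  # every word seen in training

    def train(self, text, label):
        """Record the words of one hand-labeled post."""
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.post_counts[label] += 1
        self.vocabulary.update(words)

    def scores(self, text):
        """Return a log-probability score per label (higher is more likely)."""
        words = tokenize(text)
        total_posts = sum(self.post_counts.values())
        result = {}
        for label, n_posts in self.post_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_posts / total_posts)  # prior for this label
            for word in words:
                # Add-one smoothing so an unseen word cannot zero out a label.
                score += math.log(
                    (counts[word] + 1) / (total_words + len(self.vocabulary))
                )
            result[label] = score
        return result

    def guess(self, text):
        """Return the single most likely label for the text."""
        scores = self.scores(text)
        return max(scores, key=scores.get)


# Made-up training data: three labels instead of just "spam" and "not spam".
guesser = CategoryGuesser()
guesser.train("bayesian spam filter email probability", "filtering")
guesser.train("folders directories files desktop icons", "interfaces")
guesser.train("weblog posts categories permalinks comments", "blogging")

print(guesser.guess("my weblog needs better categories"))  # prints "blogging"
```

Nothing in the math cares whether there are two piles or twenty; each label just gets its own word counts, and the new text goes to whichever label scores highest.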

Specifically, you can use the labeling you do when you, say, categorize your posts to your weblog... and generalize it. Then, if the people who wrote your weblog tool have the skill and the will, your manual categorizations could get spread around, peer-to-peer style, and mixed in with other people's categorizations until the tools have some sense of what words really signify what. Obviously, their best guess as to what you're talking about would be correctable by human hands - it'd have to be. The help of computers, though - especially at coming up with categories that you didn't know applied - would eliminate a big barrier to adequate librarianship of blogspace. That barrier, once felled - that is, the barrier of having only one way to look at the world of our online minds - could open up a whole new playground of emergently interacting ideas.
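
As a rough sketch of the sharing idea, the snippet below pools per-category word counts from two blogs and then suggests categories for a new post, letting a human override the guess. Everything here is hypothetical: the function names, the data layout, and the sample counts are invented for illustration, and the scoring is a deliberately crude word-overlap tally rather than real Bayesian math.

```python
from collections import Counter, defaultdict


def merge_shared_counts(*blogs):
    """Combine per-category word counts from several blogs into one pool."""
    merged = defaultdict(Counter)
    for blog in blogs:
        for category, counts in blog.items():
            merged[category].update(counts)
    return merged


def suggest_categories(words, merged, human_choice=None, limit=2):
    """Suggest categories for a post; a human's choice always wins."""
    if human_choice is not None:
        return list(human_choice)
    # Crude score: how often each category has seen this post's words.
    scored = {
        category: sum(counts[word] for word in words)
        for category, counts in merged.items()
    }
    ranked = sorted(
        (category for category in scored if scored[category] > 0),
        key=lambda category: (-scored[category], category),
    )
    return ranked[:limit]


# Made-up counts: "mine" has never used a "search" category, "theirs" has.
mine = {"blogging": Counter({"weblog": 3, "post": 2})}
theirs = {
    "blogging": Counter({"weblog": 1, "permalink": 2}),
    "search": Counter({"google": 4, "index": 1}),
}
merged = merge_shared_counts(mine, theirs)

print(suggest_categories(["google", "index", "weblog"], merged))
# prints ['search', 'blogging']
print(suggest_categories(["google", "index", "weblog"], merged,
                         human_choice=["blogging"]))
# prints ['blogging']
```

The first suggestion includes "search", a category that exists only in the other blog's data, which is the "categories that you didn't know applied" effect in miniature.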

The tools for doing this kind of thing are probably already on their way. For instance, shortly after its recent purchase of Blogger's parent company, Google purchased a company called Applied Semantics, which was working on software "which understands, organizes, and extracts knowledge from websites and information repositories in a way that mimics human thought and enables more effective information retrieval." The official story is that this tech will be applied to Google's text advertising, but it's hard not to imagine such tools leaching into the world of blogs. Google has already brought its text ads into Blogger's free-hosted weblogs by analysing the text of their entries, with some success. Given the evidence, posted in my original post to LazyWeb, that my radical thoughts have already occurred to others, well... the folks at Google ain't stupid, and they've got to be thinking this way.

So I'm not worried. More importantly, I'm not writing speculative code in my off hours.


A million empty lecture halls and nary a satellite link

Posted by misuba at 01:19 AM

I've been saying this for years: Blogging Sucks For Conversations. I like this explanation of the problem, but I'm not sure how I feel about the proposed solution. And I'd add this to the statement of the problem: isn't it odd that two bloggers, who are like lecturers each in their own lecture hall, have conversations with one another that you have to bounce around to all these different pages to read?


Pretending to be a journalist again

Posted by misuba at 06:36 AM

My interview with Stewart Butterfield, about his company's upcoming online game Game Neverending, is up at Mindjack.

I am also an alpha tester on another online game, Yohoho! Puzzle Pirates. That doesn't make me a journalist, but it does make me cool.


Back to the present