Climbing Mount Nofuckingway
July 21st, 2004
As always, the more you do, the more there is to do. I have, technically, added “this is a small change” to Cornucopt as of yesterday; I was happily implementing all this stuff where the small change would overwrite the previous change, as long as the previous change was made by the same user. But I hit some sort of wall. The code was done, it looked like it’d work and everything… I don’t remember if it was thinking about the Recent Changes RSS that did it to me (I hate it when posts change slightly and my aggregator decides to tell me about them all over again!), or my doubts over what really caused the database explosion of a couple months ago, or what. I was just seized by dread and had to rewrite it.
So now all it does is keep your change out of Recent Changes. Keeping the database from bloating will have to be the responsibility of a Kept Pages policy. The MeatballWiki page on that policy is an interesting read, but the concept can be a little tough to grasp. I’ll quote the salient bit:
We chose to version everything, but only keep revisions around for a limited timespan, say a week or a month. That is, destroy all revisions older than the timespan. The timespan will naturally depend on the amount of traffic to a site, the slower the traffic, the higher the timespan.
This scheme is not sufficient, though, as an attacker can destroy a page that hasn’t been touched in several months by merely editing it. After all, all its prior versions have been automatically erased, leaving you with only the latest, vandalized version.
Consequently, we add a little twist: timestamp the revisions not when they are created but when they are replaced. So, the current version of the page is not in the revision history, but when I make a change, the current version gets timestamped then and entered into the history. Then the new edit becomes the current version. […]
Equivalently, and more simply, you keep all previous revisions created during the timespan, plus one more.
That last sentence probably hews closest to how I’ll actually be coding it. But who the hell cares how I’ll actually be coding it?
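For the curious, the rule in that last quoted sentence can be sketched in a few lines of Python. This is only an illustration, not Cornucopt's code: the function name `prune_history` and its parameters are made up, and it assumes each old revision is identified by its creation time, with the current version of the page stored separately and never pruned.

```python
from datetime import datetime, timedelta

def prune_history(created_times, now, keep_for):
    """Return the creation times of the old revisions worth keeping.

    created_times: creation times of the *previous* revisions of one page
    (hypothetical representation; the current version is not in this list).
    """
    ordered = sorted(created_times)
    cutoff = now - keep_for
    recent = [t for t in ordered if t >= cutoff]
    older = [t for t in ordered if t < cutoff]
    # "Plus one more": the newest revision from before the cutoff survives,
    # so a vandal editing a long-untouched page cannot erase its last good copy.
    return older[-1:] + recent

now = datetime(2004, 7, 21)
history = [datetime(2004, 3, 1), datetime(2004, 5, 1),
           datetime(2004, 7, 18), datetime(2004, 7, 20)]
kept = prune_history(history, now, keep_for=timedelta(days=7))
```

With a one-week timespan, `kept` holds the two July revisions plus the May one; only the March revision is destroyed. A page with nothing but old revisions still keeps its newest one.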
Anyway. This policy seems the best compromise between posterity, security and efficiency. But it made me think of a lot of other things that are pretty important to have, which I haven’t worked on yet:
- IP banning (I have some GPL code pretty much ready to go for this, but there need to be admin pages to let you cope with it)
- Edit throttling (this is different from “shields up,” when you don’t take any edits at all; throttling is for when one dork is flooding you with edits and you’d like to cool him off. Ideally you should be able to throttle individual users or IPs.)
- The fact that some users are going to want to keep every version of everything, which reopens the possibility of storing things as diffs rather than full text to save bits, and with it that whole technical can of worms
- Maybe I should just store small changes as diffs? I’d need to teach Version History, and whatever I build for looking at all changes to a page, how to cope with diffs, but I think that’s it.
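To make the diff idea in the last two items concrete, here is a rough sketch of storing an old revision as a delta against the newer full text. It is an assumption-heavy illustration rather than a design: `make_delta` and `apply_delta` are hypothetical names, the delta format is invented for this example, and it leans on `difflib.SequenceMatcher` from Python's standard library.

```python
import difflib

def make_delta(new_lines, old_lines):
    """Describe how to rebuild old_lines using pieces of new_lines."""
    matcher = difflib.SequenceMatcher(a=new_lines, b=old_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: store only the indices into the newer text.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Lines that exist only in the old revision must be stored in full.
            delta.append(("add", old_lines[j1:j2]))
        # "delete" means lines found only in the newer text: nothing to store.
    return delta

def apply_delta(new_lines, delta):
    """Rebuild the old revision from the newer text plus the stored delta."""
    old_lines = []
    for op in delta:
        if op[0] == "copy":
            old_lines.extend(new_lines[op[1]:op[2]])
        else:
            old_lines.extend(op[1])
    return old_lines

old = ["Welcome to the wiki.", "This page is a stub.", "See also: FrontPage"]
new = ["Welcome to the wiki.", "This page is no longer a stub.",
       "See also: FrontPage", "Last line added."]
delta = make_delta(new, old)
rebuilt = apply_delta(new, delta)
```

Here `rebuilt` equals `old`, and the delta only carries the lines that actually differed. The can of worms is visible even at this size: anything that displays an old revision has to apply deltas first, and a chain of them if several revisions in a row are stored this way.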
I also suspect there are more login bugs. I really don’t like those.
Entry Filed under: Implementation