ilovett

Subversion Checksum Mismatches

January 3rd, 2006

This entry was published quite a while back, so it may no longer be accurate. Reader beware!

Earlier today I needed to make a big, big, very big SVN commit, so I thought it would be smart to take a backup of the repository first in case anything went wrong. And what do I get? Something going wrong, in the form of this weird message:

* Dumped revision 7.
svn: Checksum mismatch on rep '1wx':

Just the sort of thing that makes you wonder whether maybe Subversion wasn't the way to go after all.

This is probably only a problem you'll run into if your repository is still using the BerkeleyDB storage format. Mine is, because that's what you get on Dreamhost. Fortunately, a solution for svn checksum mismatches exists. It goes like this:

  1. Copy your repository
  2. Do a dump of the db/representations file:
    db_dump -kp representations > representations.dump
  3. Quoting from the mailing list message: "The dumpfile will contain lines like:
     ((fulltext 1 7 (md5 16 \fa\85\a5\e3\bdN7\95\03\e8\baq0\ad\9cn)) w) 
    Edit the part after '(md5 16 ' to 16 repeats of \00 - the all-zero checksum matches anything. "
  4. Reload the edited representations file back in using db_load:
    db_load representations < representations.dump
    
  5. Run svnadmin dump again and hope it runs to completion.
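
If hand-editing the dump file feels too risky, step 3 can be scripted. What follows is only a rough sketch, not something from the mailing list: the names zero_md5 and patch_dump are made up for illustration, and it assumes the dump alternates key and value lines and that db_dump -p writes each checksum byte as either a printable character, a \xx hex escape, or a doubled backslash. Check both assumptions against your own representations.dump before trusting it.

```python
import re

# One checksum byte as db_dump -p appears to print it (an assumption):
# an escaped backslash (\\), a two-digit hex escape (\fa), or one
# printable character.
BYTE = r"(?:\\\\|\\[0-9a-fA-F]{2}|[^\\])"
MD5_FIELD = re.compile(r"\(md5 16 " + BYTE + r"{16}\)")
ZERO_MD5 = "(md5 16 " + "\\00" * 16 + ")"


def zero_md5(line):
    """Replace the first (md5 16 ...) field in a line with the all-zero checksum."""
    # A function is used as the replacement so the backslashes in
    # ZERO_MD5 are inserted literally instead of being read as escapes.
    new_line, count = MD5_FIELD.subn(lambda m: ZERO_MD5, line, count=1)
    if count != 1:
        raise ValueError("no 16-byte md5 field found")
    return new_line


def patch_dump(lines, rep_key):
    """Zero the checksum of the record stored under rep_key.

    Assumes each key line is followed by its value line; leading
    whitespace around the key is ignored when comparing.
    """
    out = list(lines)
    for i, line in enumerate(out[:-1]):
        if line.strip() == rep_key:
            out[i + 1] = zero_md5(out[i + 1])
            return out
    raise KeyError(rep_key)
```

To use it, read representations.dump into a list of lines, call patch_dump(lines, "1wx") with whatever rep the error message names, and write the result to a new file, keeping the original dump around until the reload works.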

In the process of figuring this process out, a couple things confused me:

  • db_dump/db_load. On Debian systems like Dreamhost's, you don't want to use these. Instead you want to use db4.2_dump and db4.2_load. Don't ask me why the Debian way involves version numbers in the executable name. You'll know you're in trouble, though, if you run db_dump and it complains about not understanding the -k option.
  • Careful editing of the representations file is vital. A missing ) will spoil everything. Likewise for getting those 16 instances of \00 in place. The line I was editing was a little different than the example given on the mailing list. I just searched for '1wx', found the md5 line, and hoped for the best. Working on a copy of the repository is a really good idea here.
  • The mailing list solution talks about hunting down the file or files involved in the mismatch in order to identify possible corruption. In my case, the revision giving the error was one where I had deleted a bunch of files, so I skipped all that. I'm not quite clear on what you would do if a file really was corrupted. Never revert back to that revision, maybe?
  • This error wouldn't have come up at all if I hadn't been dumping the repository. Better to have caught it now rather than later, of course, but I could have kept on making commits and I think everything would have continued to work fine. I don't think this solution necessarily fixes the problem; it just seems to trick Subversion into believing it doesn't exist.
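
Since one stray character in the edited line can spoil everything, a quick sanity check before running db_load seems worthwhile. This is a sketch under the same assumptions about the db_dump -p escape format as the steps above, and only_checksum_changed is a name I made up: it compares a line from the untouched dump with the edited version and passes only if the checksum is now sixteen \00 and nothing else moved.

```python
import re

# One checksum byte as db_dump -p appears to print it (an assumption):
# an escaped backslash, a two-digit hex escape, or one printable character.
BYTE = r"(?:\\\\|\\[0-9a-fA-F]{2}|[^\\])"
MD5_FIELD = re.compile(r"\(md5 16 (" + BYTE + r"{16})\)")


def only_checksum_changed(original, edited):
    """True if edited equals original except for an all-zero md5 field."""
    before = MD5_FIELD.search(original)
    after = MD5_FIELD.search(edited)
    if before is None or after is None:
        return False
    if after.group(1) != "\\00" * 16:
        return False
    # Everything outside the checksum bytes must be identical,
    # including every parenthesis.
    return (original[:before.start(1)] == edited[:after.start(1)]
            and original[before.end(1):] == edited[after.end(1):])
```

A dropped closing parenthesis or a fifteenth \00 where the sixteenth should be both make this return False, which is cheaper to find out here than from a failed svnadmin dump.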

Michael Stephenson of wordmap.com wrote in with some additional comments on spotting the offending checksums:

The first thing which was slightly different for us is that some of the md5sums are of the form:

xs4
((fulltext 3 1n6 (md5 m\96\13\d0\16\82\82'uL\02\81\18\feG\d3)) 4 11yx)

Which can safely be turned into:

xs4
((fulltext 3 1n6 (md5 16 \00\00\00\00\00\00\00\00
\00\00\00\00\00\00\00\00)) 4 11yx)

And secondly some of the checksums were on delta lines, eg.

28p3
((delta 0  (md5 16 W\7f!\f2x7E\d7\04\02\05\97\c4\ec(?)) (1 0 ((svndiff 1 0 4 2yc1)...

Which can just go to:

28p3
((delta 0  (md5 16 \00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00))
(1 0 ((svndiff 1 0 4 2yc1)...

This seems to work and is pretty obvious but might help save someone else a few minutes.
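
For spotting which lines are which, here is a small sketch that reports whether a representation line is a fulltext or a delta and decodes its checksum into ordinary hex, which is easier to compare by eye or against the output of a tool like md5sum. The function name describe is hypothetical, and the decoding assumes the db_dump -p escapes seen in the examples above (\xx hex pairs, printable characters as themselves, a doubled backslash for a literal one). It only recognises checksums written with the explicit 16 length prefix.

```python
import re

# One checksum byte as db_dump -p appears to print it (an assumption).
BYTE = r"(?:\\\\|\\[0-9a-fA-F]{2}|[^\\])"
RECORD = re.compile(
    r"^\s*\(\((fulltext|delta) .*?\(md5 16 (" + BYTE + r"{16})\)"
)
TOKEN = re.compile(BYTE)


def describe(line):
    """Return (kind, md5_hex) for a representation line, or None if it does not match."""
    m = RECORD.match(line)
    if m is None:
        return None
    digest = bytearray()
    for tok in TOKEN.findall(m.group(2)):
        if tok == "\\\\":
            digest.append(0x5C)              # escaped backslash
        elif tok.startswith("\\"):
            digest.append(int(tok[1:], 16))  # \xx hex escape
        else:
            digest.append(ord(tok))          # printable character
    return m.group(1), digest.hex()
```

Run over every value line in the dump, this gives a quick list of fulltext and delta records with readable checksums; a line that has already been zeroed shows up as 32 zeros.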