As more and more IDEs grow automated refactoring support, a painful scenario repeats itself: You automatically rename a bunch of files, and then get to tell your VCS about each and every single one.

Enter guess-renames. guess-renames uses a smart, accurate, automatic algorithm to set the record straight with your VCS of choice, even if you changed the contents of the file after renaming it.

Essentially, guess-renames does what it says on the box.

Where can I obtain this incredible, almost eldritch piece of software?

You can download the code from Bitbucket. From there, you can download a source tarball or clone the development Mercurial repository. I plan to do a formal release on PyPI in the near future.

You will need Python to use guess-renames. I’ve tested guess-renames against both Python 2.6 and 2.5, and it should work on Python 2.4; if it doesn’t, please file a bug.

What VCSes does this support?

Currently, Mercurial and Subversion are supported.

This doesn’t support my VCS. You suck!

No, you suck. guess-renames is written with extensibility in mind. To add support for your VCS of choice, you write one subclass and fill in four abstract methods. It really is easy.[1]

Who wrote this? What license is it under?

guess-renames is written by me, Colin Barrett, based on an algorithm originally written by Aaron Bentley for Bazaar.[2] It is released under the GNU GPL v2.[3]

How do I use guess-renames?

Download guess-renames and run the install script with the python command. Afterwards, you may simply run guess-renames from any directory within your source code. guess-renames will automatically determine which version control system is in use and take the proper actions.

Additionally, for Mercurial, guess-renames provides an extension that you may enable by adding the line guessrenames.hgext= to the extensions section of your .hgrc file. This will add a -g option to hg addremove and hg import, which will invoke guess-renames automatically.
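
For reference, the relevant part of .hgrc would look roughly like this. The [extensions] header is the standard Mercurial section name, and the line under it is the one quoted above:

```ini
[extensions]
guessrenames.hgext=
```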

How does guess-renames work?

guess-renames operates on two groups of files: missing and unknown. VCSes often mark these with the ! and ? characters, respectively. The terms “old” and “new”, while not totally accurate, serve as a good analogy. In a nutshell, guess-renames tries to find a corresponding old file for each new file, if any.

First, guess-renames computes all edges for each file. An edge is defined as two consecutive lines in a file; so the first edge is lines 1 and 2, the second lines 2 and 3, and so forth.
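
To make the definition concrete, here is a minimal Python sketch of edge extraction. It is an illustration only, not the actual guess-renames code: the function name `edges` and the choice to store edges as a set of tuples are assumptions.

```python
def edges(lines):
    """Return the set of edges in a file.

    An edge is a pair of consecutive lines: (line 1, line 2),
    (line 2, line 3), and so on. Storing them in a set (an assumption
    of this sketch) means a repeated edge is only counted once.
    """
    return set(zip(lines, lines[1:]))


# A four-line file has three edges.
text = "import os\nimport sys\n\nprint(os.getcwd())\n"
file_edges = edges(text.splitlines())
```

A file with fewer than two lines has no edges at all, so under this sketch it can never be matched.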

Given a new file, guess-renames looks at each of its edges, notes which old files also contain that edge, and builds a score for each old file. The score is based on how many edges the two files share and on how rare each edge is: the more common an edge, the less it contributes to the score. This keeps guess-renames focused on the unique information that identifies a file and not on boilerplate text.
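
The scoring step could be sketched like this in Python. The exact weighting guess-renames uses is not spelled out here, so this sketch assumes each shared edge contributes 1 divided by the number of old files that contain it. The names `score_old_files` and `old_files` are hypothetical.

```python
from collections import defaultdict


def edges(lines):
    """Set of (line, next_line) pairs for a list of lines."""
    return set(zip(lines, lines[1:]))


def score_old_files(new_lines, old_files):
    """Score every old file against a single new file.

    old_files maps a path to that file's list of lines.
    Returns a dict mapping old path -> score.
    """
    # Index each edge by the old files that contain it.
    index = defaultdict(set)
    for path, lines in old_files.items():
        for edge in edges(lines):
            index[edge].add(path)

    scores = defaultdict(float)
    for edge in edges(new_lines):
        owners = index.get(edge, ())
        for path in owners:
            # Assumed weighting: an edge found in many old files
            # (boilerplate) is worth less to each of them.
            scores[path] += 1.0 / len(owners)
    return dict(scores)


old = {
    "a.py": ["# license", "# header", "def a():", "    return 1"],
    "b.py": ["# license", "# header", "def b():", "    return 2"],
}
new = ["# license", "# header", "def a():", "    return 1", "# trailing"]
scores = score_old_files(new, old)
```

In this example the license/header edge appears in both old files, so it adds only 0.5 to each, while the two edges unique to a.py add 1 apiece.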

guess-renames then simply chooses the best-matching pairs of new and old files. Easy!
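
One plausible way to do that last step is a greedy pass over the scores, highest first. This is an assumption about how "best matching" is decided, not a description of the real implementation, and `choose_pairs` is a hypothetical name.

```python
def choose_pairs(scores):
    """Greedily pair old files with new files.

    scores maps (old_path, new_path) -> score.
    Returns a dict mapping old path -> new path, using each file
    at most once.
    """
    renames = {}
    used_new = set()
    # Highest score first; ties are broken by path so the result
    # is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    for (old, new), score in ranked:
        if score <= 0 or old in renames or new in used_new:
            continue
        renames[old] = new
        used_new.add(new)
    return renames


pairs = choose_pairs({
    ("a.py", "x.py"): 2.5,
    ("b.py", "x.py"): 0.5,
    ("b.py", "y.py"): 1.0,
})
```

Here a.py claims x.py first because it has the highest score, which leaves y.py as the best remaining match for b.py.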

  1. These simply tell the guess-renames engine about the lists of missing and unknown files and the contents thereof.

    I also recommend you fork my guess-renames repository on Bitbucket, to make it as easy as possible for us to exchange code. 

  2. Hat tip to Jean-Francis Roy for informing me about Aaron’s work. 

  3. I’m not a fan of the GPL. Aaron Bentley’s original implementation is GPL and while my implementation does not share any code with Aaron’s, I did learn a lot from his implementation. So in the interest of playing nice, I chose to release guess-renames under the GPL.