🕷 zenspider.com

by ryan davis

Flay detecting identical and similar code

Published 2008-11-18 @ 01:01

Tagged flay

I said the following in my previous post about flay:

Soon I will write up why flay kicks towelee [sic], PMD, and everyone else’s tool in the ass… But I think the above is a damn good start.

There are going to be two classes of tools for this type of work: string-based tools and AST-based tools.


Towelie is Giles’ entry into the fray of ruby developer tools. Towelie is an AST-based tool that uses ParseTree to detect duplicate code at the method level.

There aren’t nearly as many tools like this available to us rubyists, so it is worth a peek… But, after closer inspection, I just can’t compare the two. Yes, towelie attempts to detect duplicated (but not similar) code, but that is where the similarities end, so the comparison doesn’t seem fair.

Consider it an exercise for the reader.

PMD’s CPD, Simian, Same, etc.

On the CPD page it says that the current version “was rewritten by Steve Hawkins to use the Karp-Rabin string matching algorithm”. In other words, it is a string-based tool. This family of tools has completely different objectives than flay. They’re normalizing whitespace, stripping comments, and looking for duplicate code. That’s great… it actually finds lots and lots of good stuff and I used to use Same when starting with new clients.

But… (there is always a but…)

These tools would (could!) NEVER point this out:

  Matches found in :defn (mass = 32)
    A: ../../drawr/dev/lib/drawr.rb:38
    B: ../../png/dev/lib/png.rb:181
  A: def write(file)
  B: def save(path)
  A:   File.open(file, "wb") { |f| f.write(to_s) }
  B:   File.open(path, "wb") { |f| f.write(to_blob) }

especially considering the code is actually written like this:

  def write(file); File.open(file, 'wb') { |f| f.write to_s }; end


  def save(path)
    File.open path, 'wb' do |f|
      f.write to_blob
    end
  end

This is something that a duplicate code string-based scanner just can’t do. Even the simplest change like {} vs do/end or changing your line wrapping on long conditionals will be missed… lost… ignored.
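To make that concrete, here is a minimal sketch of the string-based approach, assuming a tool that only strips comments and collapses whitespace before comparing text. The `normalize` helper and the three snippets are hypothetical and written for illustration; this is not how CPD, Simian, or Same are actually implemented.

```ruby
# Hypothetical helper: normalize source roughly the way a string-based
# tool might, by dropping comments and collapsing runs of whitespace.
def normalize(source)
  source.lines
        .map { |line| line.sub(/#.*$/, "") }        # naive comment stripping
        .map { |line| line.strip.gsub(/\s+/, " ") } # collapse whitespace
        .reject(&:empty?)
        .join("\n")
end

BRACES = <<~RUBY
  File.open(path, "wb") { |f| f.write(to_blob) }
RUBY

# Copy/pasted, re-spaced, and commented: still the same text once normalized.
REINDENTED = <<~RUBY
  File.open(path,   "wb") { |f| f.write(to_blob) }   # copy/pasted
RUBY

# Same behavior, but written with do/end and without parentheses.
DO_END = <<~RUBY
  File.open path, "wb" do |f|
    f.write to_blob
  end
RUBY

puts normalize(BRACES) == normalize(REINDENTED) # whitespace and comments are handled
puts normalize(BRACES) == normalize(DO_END)     # a syntax-level rewrite is not
```

The first comparison matches and the second does not, even though both pairs do exactly the same thing.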

So, flay has the ability to go beyond simple copy/pasted code and detect real candidates for refactoring. That is something that the java folks don’t seem to have (yet) for some reason. Certainly the foundation laid by PMD means it is within reach, but it isn’t there yet.
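For contrast, here is a rough sketch of why a tree-based comparison can see past names and layout. The two trees are hand-written nested arrays, loosely modeled on the s-expressions a Ruby parser produces for the `write` and `save` methods above; they are assumptions for illustration, not real parser output. The `shape` helper is hypothetical as well, and this is not flay's actual implementation.

```ruby
# Hand-written trees (illustrative assumptions, not real parser output).
WRITE = [:defn, :write, [:args, :file],
         [:iter,
          [:call, [:const, :File], :open, [:lvar, :file], [:str, "wb"]],
          [:args, :f],
          [:call, [:lvar, :f], :write, [:call, nil, :to_s]]]]

SAVE = [:defn, :save, [:args, :path],
        [:iter,
         [:call, [:const, :File], :open, [:lvar, :path], [:str, "wb"]],
         [:args, :f],
         [:call, [:lvar, :f], :write, [:call, nil, :to_blob]]]]

# Hypothetical helper: keep each node's type and its child nodes, and drop
# leaf values such as method names, variable names, and string contents.
def shape(node)
  [node.first] + node.drop(1).grep(Array).map { |child| shape(child) }
end

puts WRITE == SAVE               # the trees differ in their names
puts shape(WRITE) == shape(SAVE) # but their structure is identical
```

Because the comparison happens on the tree, {} vs do/end, optional parentheses, and line wrapping never enter into it; only the names differ, and those are exactly what the helper discards.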