25 Apr 2012

Converting Images of Documents to Near Scanned Quality

I’m working on a project at the moment where we’re handling a large quantity of documents that have been photographed in less than ideal circumstances. These photographs are high-resolution color photos of typed and handwritten documents that need to be transformed into scan-quality documents. The steps involved are:

  • Convert color to grayscale (128 shades or fewer)
  • Remove background noise, speckles, flecks, etc.
  • Reduce size of images for easier distribution and lower bandwidth usage (currently ~40 GB for the set of images)

After a few days of working on the problem and learning about the technologies involved, here’s the script I came up with, along with notes about sources and inspirations (note that much of the heavy lifting is done by ImageMagick and Fred’s ImageMagick Scripts).

[gist id=2494870]
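
For readers who just want the shape of the three steps, here is a minimal Ruby sketch that builds a single ImageMagick `convert` command per image. It is not the script from the gist: the helper name `scan_quality_command`, the 50% scale, and the JPEG quality of 80 are assumptions made for illustration, and the built-in `-despeckle` filter is a much simpler stand-in for the background cleanup that Fred’s ImageMagick Scripts provide.

```ruby
# Hypothetical helper (not from the gist): returns the argument list for one
# ImageMagick `convert` call covering the three steps listed in the post.
def scan_quality_command(input, output, shades: 128, scale: "50%", quality: 80)
  [
    "convert", input,
    "-colorspace", "Gray",    # step 1: color to grayscale
    "-despeckle",             # step 2: simple speckle removal (stand-in only)
    "-colors", shades.to_s,   # step 1: cap the number of gray shades
    "-resize", scale,         # step 3: shrink the image (assumed scale)
    "-quality", quality.to_s, # step 3: JPEG compression level (assumed value)
    output
  ]
end

# To run it, pass the list to system so no shell quoting is needed:
# system(*scan_quality_command("photo.jpg", "scan.jpg"))
```

Building the command as an array rather than a string keeps filenames with spaces safe, and makes it easy to inspect the command before running it over a 40 GB set of images.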

25 Apr 2012

Label every image with its filename in one fell swoop

Here’s a one-off Ruby script to wrap up some ImageMagick labeling magic. This script takes all the JPG images in a folder and applies each file’s name, as a label, to the bottom of the image.

[gist id=2495001]
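
As a rough idea of what such a script can look like (the gist is the real thing; this is only a sketch), the Ruby below builds one `convert` command per JPG. It uses ImageMagick’s `-splice` to add a strip at the bottom of the image and `-annotate` to write the filename into it. The helper names, the `labeled` output folder, the strip height, and the font size are all assumptions.

```ruby
# Hypothetical helper: argument list that adds a white strip to the bottom of
# the image and writes the file's name into it.
def label_command(path, out_dir: "labeled")
  name = File.basename(path)
  [
    "convert", path,
    "-gravity", "South",     # work relative to the bottom edge
    "-background", "white",
    "-splice", "0x40",       # add a 40px strip at the bottom (assumed height)
    "-pointsize", "24",      # assumed font size
    "-annotate", "+0+8", name,
    File.join(out_dir, name)
  ]
end

# Hypothetical helper: one command per JPG in `dir`, matched case-insensitively.
def label_commands(dir, out_dir: "labeled")
  Dir.glob(File.join(dir, "*.jpg"), File::FNM_CASEFOLD).sort.map do |path|
    label_command(path, out_dir: out_dir)
  end
end

# To run it (the output folder must already exist):
# label_commands("photos").each { |cmd| system(*cmd) }
```

Writing to a separate folder leaves the original photos untouched, which is worth having if the label size needs a second attempt.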