Image Analysis?

Name: Anonymous 2008-05-02 0:10

Sup /prog/

I have a problem whereby I need to identify (with any degree of accuracy) whether or not an image is a resized version of another image. ``Resize'' is defined as a constant aspect-ratio downscale with (hopefully) a cubic sampling. I haven't been able to turn up any reasonable material on image analysis, which means I'm left to my own devices.

I think one viable option might be to consider the proportions of colors within the image. A ``color'', for my purposes, is defined as a set of three ranges (one range for each RGB channel). The algorithm would essentially go through each pixel, keeping a tally of how many of each color there are, then sort by occurrence. Two images with similar proportions of the top 5 colors or so are flagged as identical.

The colors would necessarily be ranges because of the cubic sampling -- since one of the images is a downscaled version of another, the set of colors from the two images won't be the same. A pathological example: a large image consisting of alternating white and black pixels gets downscaled, the resultant image has gray -- a color which was not present in the original image.
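A rough sketch of the tally idea, assuming the pixels have already been loaded as plain (r, g, b) tuples. The bucket width `step`, the `tolerance`, and all function names are made up for illustration, not tuned values:

```python
from collections import Counter

def bucket(pixel, step=64):
    """Map an (r, g, b) pixel to a coarse color: one range per channel."""
    r, g, b = pixel
    return (r // step, g // step, b // step)

def top_proportions(pixels, n=5, step=64):
    """Return {color bucket: share of all pixels} for the n most common buckets."""
    pixels = list(pixels)
    counts = Counter(bucket(p, step) for p in pixels)
    total = len(pixels)
    return {color: count / total for color, count in counts.most_common(n)}

def looks_same(pixels_a, pixels_b, n=5, step=64, tolerance=0.05):
    """Flag two images when every top-n bucket share differs by at most tolerance."""
    pa = top_proportions(pixels_a, n, step)
    pb = top_proportions(pixels_b, n, step)
    return all(abs(pa.get(k, 0.0) - pb.get(k, 0.0)) <= tolerance
               for k in set(pa) | set(pb))

# Hypothetical data: a "large" image and a smaller copy with slightly shifted colors.
big = [(255, 0, 0)] * 300 + [(0, 0, 255)] * 100
small = [(250, 5, 5)] * 30 + [(5, 5, 250)] * 10
print(looks_same(big, small))
```

Working with proportions rather than raw counts keeps the comparison independent of image size. Note that fixed ranges only absorb small color shifts: with these bucket widths the black/white checkerboard above would still land in a different bucket from its gray downscale.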

I had some other weird ideas, like taking the diagonals of both images, downscaling one at runtime to fit the other, then comparing the deltas.
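Something like this, maybe. It is only a sketch: it assumes grayscale images stored as lists of rows, and it shrinks the longer diagonal with a plain box average instead of cubic sampling. All names here are hypothetical:

```python
def diagonal(image):
    """Sample the main diagonal of a row-major image (list of rows of gray values)."""
    h, w = len(image), len(image[0])
    n = min(h, w)
    return [image[i * h // n][i * w // n] for i in range(n)]

def shrink(values, length):
    """Downscale a 1-D sequence to `length` samples by averaging each chunk."""
    n = len(values)
    out = []
    for i in range(length):
        lo = i * n // length
        hi = max(lo + 1, (i + 1) * n // length)
        chunk = values[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out

def diagonal_delta(image_a, image_b):
    """Mean absolute difference between the two diagonals at the smaller size."""
    da, db = diagonal(image_a), diagonal(image_b)
    if len(da) < len(db):
        da, db = db, da          # make da the longer one
    da = shrink(da, len(db))
    return sum(abs(x - y) for x, y in zip(da, db)) / len(db)

# Hypothetical 4x4 image and a 2x2 downscale of it.
big = [[(x // 2 + y // 2) * 50 for x in range(4)] for y in range(4)]
small = [[0, 50], [50, 100]]
print(diagonal_delta(big, small))
```

A small delta suggests a match; where to put the cutoff would have to be found by experiment.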

I dunno. Anyone have ideas/suggestions/links?

Name: Anonymous 2008-05-02 7:58

>>10
Well, the problem with that is that it doesn't scale well. The set of images I'm wanting to compare weighs in at about 7000 files right now: sufficiently large that complexity differences are significant.

>>7
A friend of mine implemented such a system for his image collection, and said it worked fairly well; I'll definitely try it in the future if I need another heuristic.

>>8
This actually works really well and is fairly fast; I was able to generate the hash (I guess that's an acceptable term) of each image and compare it to every other image over half an hour (slowness due to 6000 images + forced indentation). Eventually this might bottleneck (due to the O(n^2) nature of the algorithm) but whatever. It works fairly well right now:

http://img141.imageshack.us/img141/3497/12097293092bf9.png
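For scale: comparing every image against every other one means n(n-1)/2 comparisons, which is roughly 18 million for 6000 images. The loop looks roughly like the sketch below. The signatures are stand-in tuples of numbers, not the actual hash, and `distance` and `threshold` are placeholders:

```python
from itertools import combinations

def distance(sig_a, sig_b):
    """Sum of absolute differences between two equal-length signatures."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))

def find_matches(signatures, threshold):
    """Compare every pair once and return the pairs within the threshold."""
    return [(name_a, name_b)
            for (name_a, sig_a), (name_b, sig_b) in combinations(signatures.items(), 2)
            if distance(sig_a, sig_b) <= threshold]

# Hypothetical file names and signatures.
sigs = {
    "a.png": (10, 20, 30),
    "a_small.png": (11, 19, 30),
    "b.png": (200, 0, 50),
}
print(find_matches(sigs, threshold=5))
```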

The problem I've found is that mostly black-and-white images tend to be matched with each other (since they get reduced to a puddle of gray). One suggestion someone made was to convert the RGB values into HSV to check whether a pixel is a shade of gray. I'm not quite sure what to do even if I can discern the black-and-white images (i.e., manga/hentai scans) from the rest. I think I'll just consider it a non-issue.
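The HSV check itself is simple enough: a pixel with near-zero saturation is a shade of gray. A minimal sketch using the standard `colorsys` module; the cutoffs `max_saturation` and `min_share` are guesses, and the function names are made up:

```python
import colorsys

def is_grayish(pixel, max_saturation=0.1):
    """True when an (r, g, b) pixel with 0-255 channels has near-zero saturation."""
    r, g, b = (c / 255.0 for c in pixel)
    _h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    return s <= max_saturation

def mostly_gray(pixels, max_saturation=0.1, min_share=0.95):
    """True when at least min_share of the pixels are shades of gray."""
    pixels = list(pixels)
    gray = sum(is_grayish(p, max_saturation) for p in pixels)
    return gray / len(pixels) >= min_share

# Hypothetical pixel data: a scan-like image and a solid red one.
scan = [(0, 0, 0), (128, 128, 128), (255, 255, 255), (200, 198, 201)]
print(mostly_gray(scan), mostly_gray([(255, 0, 0)] * 10))
```

This only separates the two groups; what to do with the black-and-white group afterwards is still open.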
