
Image Analysis?

Name: Anonymous 2008-05-02 0:10

Sup /prog/

I have a problem whereby I need to identify (with any degree of accuracy) whether or not an image is a resized version of another image. ``Resize'' is defined as a constant aspect-ratio downscale with (hopefully) cubic sampling. I haven't been able to turn up any reasonable material on image analysis, which means I'm left to my own devices.

I think one viable option might be to consider the proportions of colors within the image. A ``color'', for my purposes, is defined as a set of three ranges (one range for each RGB channel). The algorithm would essentially go through each pixel, keeping a tally of how many of each color there are, then sort by occurrence. Two images with similar proportions of the top 5 colors or so are flagged as identical.
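
A rough sketch of that tally, in Python. Everything here is an assumption made for illustration: the names `color_proportions`, `top_colors` and `looks_similar`, the range width of 64 per channel, and the 5% tolerance. Pixels are taken as a plain list of (R, G, B) tuples so the sketch does not depend on any imaging library.

```python
from collections import Counter


def color_proportions(pixels, bucket_size=64):
    """Map each color range to the fraction of pixels that fall in it.

    A "color" is a triple of range indices, one per RGB channel, so
    nearby shades end up in the same bucket.
    """
    counts = Counter(
        (r // bucket_size, g // bucket_size, b // bucket_size)
        for r, g, b in pixels
    )
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}


def top_colors(proportions, top_n=5):
    """Return the top_n most frequent color ranges, most common first."""
    return sorted(proportions, key=proportions.get, reverse=True)[:top_n]


def looks_similar(pixels_a, pixels_b, bucket_size=64, top_n=5, tolerance=0.05):
    """Flag two images whose dominant colors occur in similar proportions."""
    prop_a = color_proportions(pixels_a, bucket_size)
    prop_b = color_proportions(pixels_b, bucket_size)
    # Check every range that is dominant in either image.
    candidates = set(top_colors(prop_a, top_n)) | set(top_colors(prop_b, top_n))
    return all(
        abs(prop_a.get(c, 0.0) - prop_b.get(c, 0.0)) <= tolerance
        for c in candidates
    )
```

Wider ranges tolerate more resampling noise but also make unrelated images collide more often, so both numbers would need tuning against real data.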

The colors would necessarily be ranges because of the cubic sampling -- since one of the images is a downscaled version of the other, the set of colors from the two images won't be the same. A pathological example: a large image consisting of alternating white and black pixels gets downscaled, and the resultant image has gray -- a color which was not present in the original image.
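
The pathological case is easy to reproduce. This sketch swaps in a plain 2x2 box average instead of cubic sampling (an assumption made to keep it short; a cubic filter blends neighbours in the same way) and uses single-channel gray values. The helper names are hypothetical.

```python
def checkerboard(size):
    """Build a size x size image of alternating white (255) and black (0) pixels."""
    return [[255 if (x + y) % 2 == 0 else 0 for x in range(size)]
            for y in range(size)]


def downscale_by_half(image):
    """Halve both dimensions by averaging each 2x2 block (a box filter)."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) // 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


original = checkerboard(4)
scaled = downscale_by_half(original)

original_values = {v for row in original for v in row}
scaled_values = {v for row in scaled for v in row}

print(sorted(original_values))  # [0, 255]
print(sorted(scaled_values))    # [127]
```

Every 2x2 block holds two white and two black pixels, so the scaled image is uniformly 127, a value that never appears in the original.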

I had some other weird ideas, like taking the diagonals of both images, downscaling one at runtime to fit the other, then comparing the deltas.
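
One way the diagonal idea could look, as a sketch only. It assumes grayscale images stored as lists of rows, shrinks the longer diagonal by averaging runs of samples (not cubic sampling), and reports the mean absolute delta. The names `diagonal`, `resample` and `diagonal_delta` are made up here.

```python
def diagonal(image):
    """Sample the top-left to bottom-right diagonal of a list-of-rows image."""
    height, width = len(image), len(image[0])
    steps = min(height, width)
    return [image[i * height // steps][i * width // steps] for i in range(steps)]


def resample(values, length):
    """Fit a sequence to `length` entries by averaging equal-sized runs."""
    n = len(values)
    out = []
    for i in range(length):
        start = i * n // length
        end = max(start + 1, (i + 1) * n // length)
        chunk = values[start:end]
        out.append(sum(chunk) / len(chunk))
    return out


def diagonal_delta(image_a, image_b):
    """Mean absolute difference between the diagonals, after fitting both
    to the length of the shorter one."""
    diag_a, diag_b = diagonal(image_a), diagonal(image_b)
    length = min(len(diag_a), len(diag_b))
    diag_a, diag_b = resample(diag_a, length), resample(diag_b, length)
    return sum(abs(a - b) for a, b in zip(diag_a, diag_b)) / length
```

A small delta would suggest a resized copy, although a single diagonal ignores most of the image, so two different pictures with similar diagonals would slip through.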

I dunno. Anyone have ideas/suggestions/links?

Name: Anonymous 2008-05-02 9:24

>>19
Of course. But splitting strings with SQL is a bitch -- AFAIK it requires a stored procedure or some other crazy shit, which I cbf'd to write. It's easier to dump the stored string, unserialize it, etc.

It would take an ungodly amount of time to process when new images are added if those computations weren't cached. Even more so since it's written in Python.

>>17-18
#! /usr/bin/env python

import MySQLdb
import Image
import os

# Maximum allowed difference per channel value between two 2x2 thumbnails
THRESHOLD = 10

if __name__ == '__main__':
        print( 'Connecting to database...' )
        db = MySQLdb.connect( user='lolrite', passwd='fake', db='its-hosted-locally-anyway' )
        c = db.cursor()

        print( 'Fetching images missing vectors...' )
        c.execute( 'SELECT img_id, img_thumb FROM 4s_images WHERE img_thumb!=%s AND img_vector IS NULL ORDER BY img_id', ( '', ) )
        missing_vectors = c.fetchall()

        print( 'Generating vectors...' )
        num_images = 1
        with_vectors = []
        for img_id, img_path in missing_vectors:
                print( '\tProcessing image %s (%d of %d)...' % ( img_path, num_images, len( missing_vectors ) ) )
                num_images += 1

                im = Image.open( img_path )
                # Shrink to a 2x2 RGB thumbnail: 4 pixels x 3 channels = 12 values
                th = im.convert( 'RGB' ).resize( ( 2, 2 ), Image.ANTIALIAS )

                # Flatten the pixel tuples into one list, then serialize it as CSV
                v = [ channel for pixel in th.getdata() for channel in pixel ]
                v_str = ','.join( map( str, v ) )

                with_vectors += [ ( img_id, v ) ]

                c.execute( 'UPDATE 4s_images SET img_vector=%s WHERE img_id=%s', ( v_str, img_id ) )

        print( 'Fetching list of all vectors...' )
        c.execute( 'SELECT img_id, img_vector FROM 4s_images WHERE img_vector IS NOT NULL ORDER BY img_id' )
        images = c.fetchall()

        print( 'Performing analysis...' )
        num_images = 1
        matches = []
        for img_id, img_v in with_vectors:
                print( '\tFinding matches for #%d (%d of %d)...' % ( img_id, num_images, len( with_vectors ) ) )
                num_images += 1

                for img2_id, img2_v in images:
                        if img2_id >= img_id:
                                break

                        total_diff = 0
                        # Stored vectors come back as comma-separated strings
                        img2_v = map( int, img2_v.split( ',' ) )

                        for i in xrange( 0, len( img2_v ) ):
                                diff = abs( img_v[i] - img2_v[i] )
                                total_diff += diff

                                # One value too far apart rules the pair out
                                if diff > THRESHOLD:
                                        break
                        else:
                                # for/else: only reached if the loop never hit break
                                print( "\t\tFound a match! Diff: %d (#%d)" % ( total_diff, img2_id ) )
                                matches += [ ( img_id, img2_id, total_diff ) ]

        print( "Adding matches to database..." )

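        for match in set( matches ):
                c.execute( 'INSERT INTO 4s_similar( img1_id, img2_id, diff ) VALUES( %s, %s, %s )', match )

        # MySQLdb leaves autocommit off, so commit explicitly; otherwise the
        # writes are lost on transactional tables such as InnoDB
        db.commit()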

        print( "All done!" )
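
For anyone who wants to try the matching rule without the database, here it is pulled out into two standalone functions. This is a sketch in Python 3 syntax, and the names `parse_vector` and `match_diff` are not part of the script above. The rule is the same: two vectors match only if every component differs by at most the threshold, and the match is scored by the summed difference.

```python
def parse_vector(v_str):
    """Turn a stored string such as '12,34,56' back into a list of ints."""
    return [int(x) for x in v_str.split(',')]


def match_diff(v1, v2, threshold=10):
    """Return the total difference if every component is within threshold,
    otherwise None."""
    if len(v1) != len(v2):
        return None
    diffs = [abs(a - b) for a, b in zip(v1, v2)]
    if any(d > threshold for d in diffs):
        return None
    return sum(diffs)


a = parse_vector('10,20,30,40,50,60,70,80,90,100,110,120')
b = [x + 3 for x in a]          # every value off by 3
c = [a[0] + 11] + a[1:]         # a single value off by 11

print(match_diff(a, b))  # 36
print(match_diff(a, c))  # None
```

Note that `c` is rejected even though its total difference (11) is smaller than that of `b` (36), because the threshold applies per value, not to the sum.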
