image matching, detect redundant thumbnails

Name: opie !!ypjq+FkbQO5UwcT 2009-12-18 14:01

Does anyone know of a program that can quickly tell whether two thumbnail JPEG files show the same image?

I had hoped all those often reposted images like Admiral Ackbar would have the same data in the thumbnails each time so I could detect and filter out those images, but no such luck. The image data often differs even if the images look the same.

This old thread
   http://dis.4chan.org/read/prog/1209700628
mentions reducing images to 4x4 pixels and then comparing them. Have any of you found a better way? The imgseek program would be too slow to compare all the thumbnails from a 4chan page with my archive of already-viewed images.
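
For reference, the reduce-and-compare idea from that thread can be sketched roughly as below. This is only a sketch under assumptions: the thumbnail is taken to be already decoded into an 8-bit grayscale buffer (the decoding step is not shown), and the names reduce_to_grid, grids_match, GRID and the tolerance parameter are made up for illustration, not taken from the old thread.

```c
#include <stdlib.h>

#define GRID 4  /* hypothetical name: image is reduced to GRID x GRID cells */

/* Average an 8-bit grayscale image (row-major, w x h, with w and h >= GRID)
   into GRID x GRID cells. Decoding the JPEG into `pixels` is assumed to
   have been done elsewhere. */
void reduce_to_grid (const unsigned char *pixels, int w, int h,
                     unsigned char out[GRID * GRID])
   {
   int gx, gy, x, y;
   for (gy = 0; gy < GRID; gy++)
      for (gx = 0; gx < GRID; gx++)
         {
         int x0 = gx * w / GRID, x1 = (gx + 1) * w / GRID;
         int y0 = gy * h / GRID, y1 = (gy + 1) * h / GRID;
         unsigned long sum = 0;
         for (y = y0; y < y1; y++)
            for (x = x0; x < x1; x++)
               sum += pixels[y * w + x];
         out[gy * GRID + gx] = (unsigned char) (sum / ((x1 - x0) * (y1 - y0)));
         }
   }

/* Return 1 if every cell differs by at most `tolerance`, else 0. */
int grids_match (const unsigned char a[GRID * GRID],
                 const unsigned char b[GRID * GRID], int tolerance)
   {
   int i;
   for (i = 0; i < GRID * GRID; i++)
      if (abs ((int) a[i] - (int) b[i]) > tolerance)
         return 0;
   return 1;
   }
```

Comparing 16 bytes per image is cheap enough to run against a large archive, and a small nonzero tolerance lets recompressed copies of the same picture still match. What tolerance works in practice would have to be found by experiment.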

For the heck of it I'll include my program to extract the image data section from a jpg. It may also interest you to know that
   xv -nolimits -gamma 1.1 -expand 3 -vsmap
blows up a thumbnail pretty well.


//A JPEG file has sections starting with byte 0xFF (possibly several 0xFF bytes)
//then a byte giving the type of the section.
//Search JPEG given as arg1 for the sections and list the types and the
//byte position of the section start.
//Also dump the compressed image data section to file arg1_DATA.
//This is to help in comparing JPEG files by stripping the other sections away.
 
#include <stdio.h>
#include <stdlib.h>   //exit
#include <string.h>

int main (int argc, char **argv)
   {
   unsigned char buf[1024];
   int   byte_count;
   FILE *fp;
   int   in_data_sectionTF;
   FILE *outfp;
   unsigned char prev_byte;
   char trash[1024];

   if (argc < 2)
      {
      printf ("Usage: %s file.jpg\n", argv[0]);
      exit (1);
      }
   fp = fopen (argv[1], "rb");
   if (!fp)
      {
      printf ("Error opening file.\n");
      exit (1);
      }
   //Build the output name arg1_DATA without overflowing the buffer.
   snprintf (trash, sizeof trash, "%s_DATA", argv[1]);
   printf ("%s\n", trash);
   outfp = fopen(trash, "wb");
   if (!outfp)
      {
      printf ("Error opening output file.\n");
      exit (1);
      }

   byte_count  = 0;
   prev_byte = '\0';
   in_data_sectionTF = 0;
   while (fread (buf, 1, 1, fp) == 1)
      {
      byte_count = byte_count + 1;
      if (prev_byte == 0xFF && buf[0] != 0xFF)
         {
         printf ("marker %x at position %d\n", buf[0], byte_count);
         switch (buf[0])
            {
            case 0xC0: printf ("   Start Of Frame 0 (baseline DCT).\n"); break;
            case 0xC1: printf ("   Start Of Frame 1 (extended sequential DCT).\n"); break;
            case 0xC5: printf ("   Start Of Frame 5 (differential sequential DCT).\n"); break;
            case 0xD8: printf ("   Start Of Image (beginning of datastream).\n"); break;
            case 0xD9: printf ("   End Of Image (end of datastream).\n"); break;
            case 0xDA: printf ("   Start of Scan (begins compressed data).\n"); break;
            case 0xFE: printf ("   COMment.\n"); break;
            }
         if (buf[0] == 0xDA)
            {
            in_data_sectionTF = 1;
            }
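         //Stuffed zero bytes (FF 00) and restart markers (FF D0..D7) occur inside the data.
         if (in_data_sectionTF && buf[0] != 0xDA && buf[0] != 0x00
             && !(buf[0] >= 0xD0 && buf[0] <= 0xD7))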
            {
            //reached end of data section;
            in_data_sectionTF = 0;
            }
         }
      if (in_data_sectionTF)
         {
         //Dump data byte to data file.
         fwrite (buf, 1, 1, outfp);
         }
      prev_byte = buf[0];
      }
   fclose (fp);
   fclose (outfp);
   return 0;
   }
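
The program above finds markers by scanning every byte for 0xFF, so an 0xFF that happens to sit inside a header section's payload can be reported as a marker. A possible variant, sketched here for a file already read into memory, steps from section to section using the two-byte big-endian length that follows each marker before the scan data. The function name find_sos is invented for this sketch.

```c
#include <stddef.h>

/* Locate the first Start of Scan marker (FF DA) by following section lengths.
   Returns the offset of the marker's 0xFF byte, or -1 if the buffer does not
   start with FF D8 or no scan is found. */
long find_sos (const unsigned char *buf, size_t len)
   {
   size_t pos = 2;
   if (len < 4 || buf[0] != 0xFF || buf[1] != 0xD8)
      return -1;                          /* no Start Of Image */
   while (pos + 4 <= len)
      {
      unsigned char type;
      size_t seglen;
      if (buf[pos] != 0xFF)
         return -1;                       /* not at a marker: malformed */
      while (pos + 1 < len && buf[pos + 1] == 0xFF)
         pos++;                           /* skip 0xFF fill bytes */
      if (pos + 4 > len)
         return -1;
      type = buf[pos + 1];
      if (type == 0xDA)
         return (long) pos;               /* Start of Scan */
      if (type == 0xD9)
         return -1;                       /* End Of Image before any scan */
      /* The length counts its own two bytes but not the marker. */
      seglen = ((size_t) buf[pos + 2] << 8) | buf[pos + 3];
      if (seglen < 2)
         return -1;
      pos += 2 + seglen;
      }
   return -1;
   }
```

Inside the compressed data itself there are no length fields, so the byte-by-byte scan is still needed from the returned offset onward.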

Name: Anonymous 2009-12-20 12:06

Why would you need to recompile? Just update the shared library and you're good to go. Stop trolling.
