Name: opie !!ypjq+FkbQO5UwcT 2009-12-18 14:01
Do you know a program that quickly tells if two thumbnail jpg files are of the same image?
I had hoped all those often reposted images like Admiral Ackbar would have the same data in the thumbnails each time so I could detect and filter out those images, but no such luck. The image data often differs even if the images look the same.
This old thread
http://dis.4chan.org/read/prog/1209700628
mentions reducing images to 4x4 pixels and then comparing them. Have any of you found a better way? The imgseek program would be too slow to compare all the thumbnails from a 4chan page with my archive of already-viewed images.
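For reference, here is a rough sketch of how I read the 4x4 idea. It is not the code from that thread. It assumes each thumbnail has already been decoded to 8-bit grayscale by something else (the decoding step is not shown), and the names make_signature, signature_distance and probably_same_image, as well as the threshold argument, are made up for illustration. Each image is averaged down to 16 cells, and two images are treated as the same when the summed cell differences are small.

```c
#include <stdlib.h>

#define SIG_DIM 4                       /* 4x4 grid, as in the linked thread */
#define SIG_LEN (SIG_DIM * SIG_DIM)

/* Reduce a w x h 8-bit grayscale image (row-major) to 4x4 cell averages.
   Returns 0 on success, -1 if the input is missing or smaller than 4x4. */
int make_signature (const unsigned char *gray, int w, int h,
                    unsigned char sig[SIG_LEN])
{
    int cx, cy, x, y;

    if (!gray || w < SIG_DIM || h < SIG_DIM)
        return -1;
    for (cy = 0; cy < SIG_DIM; cy++)
    {
        int y0 = cy * h / SIG_DIM;
        int y1 = (cy + 1) * h / SIG_DIM;
        for (cx = 0; cx < SIG_DIM; cx++)
        {
            int x0 = cx * w / SIG_DIM;
            int x1 = (cx + 1) * w / SIG_DIM;
            unsigned long sum = 0;
            unsigned long count = (unsigned long) (x1 - x0) * (y1 - y0);

            for (y = y0; y < y1; y++)
                for (x = x0; x < x1; x++)
                    sum += gray[(size_t) y * w + x];
            sig[cy * SIG_DIM + cx] = (unsigned char) (sum / count);
        }
    }
    return 0;
}

/* Sum of absolute differences between two signatures; 0 means identical. */
int signature_distance (const unsigned char a[SIG_LEN],
                        const unsigned char b[SIG_LEN])
{
    int i, d = 0;

    for (i = 0; i < SIG_LEN; i++)
        d += abs ((int) a[i] - (int) b[i]);
    return d;
}

/* The threshold is a guess to be tuned by hand; it is not a known-good value. */
int probably_same_image (const unsigned char a[SIG_LEN],
                         const unsigned char b[SIG_LEN], int threshold)
{
    return signature_distance (a, b) <= threshold;
}
```

A signature is only 16 bytes, so checking one thumbnail against a large archive of stored signatures should be cheap compared with a full image search.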
For the heck of it I'll include my program to extract the image data section from a jpg. It may also interest you to know that
xv -nolimits -gamma 1.1 -expand 3 -vsmap
blows up a thumbnail pretty well.
//A JPEG file has sections starting with byte 0xFF (possibly several 0xFF bytes)
//then a byte giving the type of the section.
//Search JPEG given as arg1 for the sections and list the types and the
//byte position of the section start.
//Also dump the compressed image data section to file arg1_DATA.
//This is to help in comparing JPEG files by stripping the other sections away.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (int argc, char **argv)
{
    unsigned char buf[1024];
    int byte_count;
    FILE *fp;
    int in_data_sectionTF;
    FILE *outfp;
    unsigned char prev_byte;
    char trash[1024];   //output file name: arg1 with "_DATA" appended

    if (argc < 2)
    {
        printf ("Usage: %s file.jpg\n", argv[0]);
        exit (1);
    }
    fp = fopen (argv[1], "rb");
    if (!fp)
    {
        printf ("Error opening file.\n");
        exit (1);
    }
    //Refuse names that would overflow the output name buffer.
    if (strlen (argv[1]) + strlen ("_DATA") >= sizeof trash)
    {
        printf ("File name too long.\n");
        exit (1);
    }
    strcpy (trash, argv[1]);
    strcat (trash, "_DATA");
    printf ("%s\n", trash);
    outfp = fopen (trash, "wb");
    if (!outfp)
    {
        printf ("Error opening output file.\n");
        exit (1);
    }
    byte_count = 0;
    prev_byte = '\0';
    in_data_sectionTF = 0;
    while (fread (buf, 1, 1, fp) == 1)
    {
        byte_count = byte_count + 1;
        //0xFF 0x00 is a stuffed data byte, not a marker, so it is not reported.
        if (prev_byte == 0xFF && buf[0] != 0xFF && buf[0] != 0x00)
        {
            printf ("marker %x at position %d\n", buf[0], byte_count);
            switch (buf[0])
            {
            case 0xC0: printf (" Start Of Frame 0 (baseline DCT).\n"); break;
            case 0xC1: printf (" Start Of Frame 1 (extended sequential DCT).\n"); break;
            case 0xC5: printf (" Start Of Frame 5 (differential sequential DCT).\n"); break;
            case 0xD8: printf (" Start Of Image (beginning of datastream).\n"); break;
            case 0xD9: printf (" End Of Image (end of datastream).\n"); break;
            case 0xDA: printf (" Start of Scan (begins compressed data).\n"); break;
            case 0xFE: printf (" COMment.\n"); break;
            }
            if (buf[0] == 0xDA)
            {
                in_data_sectionTF = 1;
            }
            //Restart markers 0xD0-0xD7 occur inside the compressed data
            //and do not end it; any other marker does.
            if (in_data_sectionTF && buf[0] != 0xDA
                && !(buf[0] >= 0xD0 && buf[0] <= 0xD7))
            {
                //reached end of data section;
                in_data_sectionTF = 0;
            }
        }
        if (in_data_sectionTF)
        {
            //Dump data byte to data file.
            fwrite (buf, 1, 1, outfp);
        }
        prev_byte = buf[0];
    }
    fclose (fp);
    fclose (outfp);
    return 0;
}
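
Since the extracted data often differs between thumbnails that look the same, it can help to see where two _DATA files first diverge. Below is a small sketch for that. The function name first_difference and its return codes are my own invention for illustration, and it does a plain byte comparison with no knowledge of the JPEG format.

```c
#include <stdio.h>

/* Compare two files byte by byte.
   Returns -1 if they are identical, -2 if either file cannot be opened,
   otherwise the zero-based offset of the first differing byte. If one
   file is a prefix of the other, the offset is the shorter file's length. */
long first_difference (const char *path_a, const char *path_b)
{
    FILE *fa = fopen (path_a, "rb");
    FILE *fb = fopen (path_b, "rb");
    long offset = 0;
    long result = -1;
    int ca, cb;

    if (!fa || !fb)
    {
        if (fa) fclose (fa);
        if (fb) fclose (fb);
        return -2;
    }
    for (;;)
    {
        ca = getc (fa);
        cb = getc (fb);
        if (ca != cb)
        {
            result = offset;    /* bytes differ, or one file ended early */
            break;
        }
        if (ca == EOF)
            break;              /* both files ended together: identical */
        offset++;
    }
    fclose (fa);
    fclose (fb);
    return result;
}
```

Calling first_difference ("a.jpg_DATA", "b.jpg_DATA") on two dumps (file names here are only examples) gives a quick answer to whether the compressed streams match exactly, which is the only case a byte-level filter can catch.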