Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

Space Age Compression

Name: Anonymous 2011-04-27 11:07

I'm working on a new type of compression algorithm and have run into some difficulties.

Basically what I do is this:

Given raw data: D
I compute the MD5 hash: H = MD5(D)
At this point I've compressed by a factor of Size(D) / Size(H) (which for any data larger than 128-bits is significant)

My problem is uncompressing the data.

Currently what I'm doing is as follows: (keep in mind i'm working with bit values here)

G = {0}[trillion]

while (MD5(G) != H) G++;

I know this has the problem of unintentionally hashing the preceding zeros but the main issue I'm running into is runtime.

So far 20-30 bit lengths have run in feasible time but when I tried to run this with a 32-bit guess, it took several minutes to verify.

My MD5 implementation may be inefficient but it's also possible that this algorithm suffers from a runtime of 2^n.

Any tips? Or is this simply an unfeasible compression algorithm?

Name: Alfe 2011-06-17 6:09

Just in case you meant that honestly or anybody else took it for being a great idea:  This won't work.  Uncompressing is impossible.

Reasoning: There are 2^N different bitstrings with N bits.  To compress these (i.e. in a decrompressible way!) you need to use a special property of the information called redundancy.  MD5 does not do this.  In fact, there are 2^(N-128) different bitstrings with 2^N bits which result in the same MD5 hash with 128 bits.  Your brute force attack will simply find any of these (after a VERY long time).  Since MD5 is just a hash algorithm, the bitstrings resulting in the same hash will look very different (that's one of the ideas of a good hash).

In fact, no just-mathematical/logical approach on compressing data can succeed because redundancy is always a domain specific property.  For us, it typically means repetition (e.g. of patterns).  But that must not always be the case.  For compressing the first one billion digits of the number pi the approach of searching for repetitions would not lead to anything.  Instead, compressing the digit string to the algorithm computing the digits would be a successful approach here.

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List