Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

New and revolutionary data comression scheme!

Name: Anonymous 2009-06-13 17:17

Infinite compression?

I've always was interested in how compressed files worked and why the compression factor is so low.
The entropy explanation isn't something i would accept without tinkering with the data. The idea of my compression algorithm(currently under development) is to
use algebraic properties of files to express the data in more concise way.
The thing might sound trivial, but its implementation is not.
Normal compression works by splitting FILE and finding redundant pieces to express them in minimum volume.
Imagine a FILE read and converted to arbitrary precision integer. Now consider all the myriad ways to generate said integer. Sounds difficult? Not so much.
1.lets take a Prime Number,e.g. 2 and raise it to power closest to our integer, e.g. 20000. Note the difference from the (FILE-result).
2.Get the smallest difference with the powers available,
and proceed to next step:
3.If the difference is negative: find 2 raised to the power
of X which would close the gap,by substracting it from #1
If the difference is positive just add 2 with closest power to the difference .
The end result could be something like
2^6+2^5-2^3+2^1-2^0=Integer=FILE
Its can be improved further by using other prime numbers powers with the idea of expression taking the least space.
The same thing can be accomplished with arbitrary length floating point numbers like 2^123712.1282 which would converge faster,but will require some fixing to convert to a stable binary result.
Posted by FrozenVoid at 15:37

Name: Anonymous 2009-06-15 15:05

>>112
Honestly, the problem is using base 2. It's already taking up too much space. Look, the number 8 is one digit but you need 4 bits to store it! (#b1000 in some Scheme dialects).

The first thing you want to do is convert everything to base 32, that is,
[code]0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V[code]
where `Z' represents 35 * 36[sup]0[/sup]. So let's take a number like
10001. This is just `H'. Now things get tricky.
Take a number like 010000111001110010111101111110011100. This parses directly as "GRUNNUR" (except for the leading zero, which is troublesome but I'm working on that), which happens to be an Icelandic word for "foundation." Therefore all we need to do is store a small numeric reference to a set of dictionaries (i.e., store the number [i]n[/i] for the [i]n[/i][sup]th[/sup] dictionary), and then the number of the word in that dictionary. Say, "GRUNNER" is the 50th word. The dictionaries would be open source, of course, so everyone could reference them.

I know what you're thinking, though: we'd need so many dictionaries! That's true, so we'll just reference them by the nearest prime strictly less than it and an offset, and compress even the [i]reference[/i]!

Dictionaries have a lot of words, too, so sometimes it may be that the [i]n[/i][sup]th[/sup] word has [i]n[/i] larger than the word. This is where the algorithm really shines. We apply the compression technique [i]to this number![/i] Now all we need to store is a reference that we've applied this step twice. I call it the ``WHY Combinator'' symbol, or ``W'' (note it is unused in our base 32 conversion).

It's a shame I am only just learning about data compression now, it's so hard to prove these things out that I'll probably never get around to making it work. I just program as a hobby.

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List