
New and revolutionary data compression scheme!

Name: Anonymous 2009-06-13 17:17

Infinite compression?

I've always been interested in how compressed files work and why the compression factor is so low.
The entropy explanation isn't something I would accept without tinkering with the data. The idea of my compression algorithm (currently under development) is to use algebraic properties of files to express the data in a more concise way.
The idea might sound trivial, but its implementation is not.
Normal compression works by splitting the FILE and finding redundant pieces to express them in minimum volume.
Imagine a FILE read and converted to an arbitrary-precision integer. Now consider all the myriad ways to generate said integer. Sounds difficult? Not so much.
1. Take a prime number, e.g. 2, and raise it to the power that lands closest to our integer, e.g. 20000. Note the difference (FILE - result).
2. Pick the power that gives the smallest difference, and proceed to the next step.
3. If the difference is negative, find the power of 2 that comes closest to closing the gap and subtract it from the result of #1. If the difference is positive, add the power of 2 closest to the difference instead. Repeat until the difference is zero.
The end result could be something like
2^6+2^5-2^3+2^1-2^0=Integer=FILE
It can be improved further by using powers of other primes, with the goal of making the expression take the least space.
The same thing can be accomplished with arbitrary-length floating-point exponents like 2^123712.1282, which would converge faster but will require some fixing to convert to a stable binary result.
Posted by FrozenVoid at 15:37
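
The three steps in the post above can be sketched roughly as follows. This is one reading of the description, not the poster's actual code: the names `signed_power_terms` and `evaluate` are made up for illustration, the base is fixed at 2, and ties between two equally close powers are broken toward the smaller one (an assumption).

```python
def signed_power_terms(n):
    """Greedily express n as a sum of +/- powers of two.

    Each step picks the power of two nearest to what is still missing,
    then adds or subtracts it depending on the sign of the difference.
    Returns a list of (sign, exponent) pairs.
    """
    terms = []
    remaining = n
    while remaining != 0:
        magnitude = abs(remaining)
        low = magnitude.bit_length() - 1          # 2**low <= magnitude
        # Choose whichever of 2**low and 2**(low + 1) is closer.
        if (1 << (low + 1)) - magnitude < magnitude - (1 << low):
            exponent = low + 1
        else:
            exponent = low
        sign = 1 if remaining > 0 else -1
        terms.append((sign, exponent))
        remaining -= sign * (1 << exponent)
    return terms


def evaluate(terms):
    """Rebuild the integer from its (sign, exponent) pairs."""
    return sum(sign * (1 << exponent) for sign, exponent in terms)


# 2^6+2^5-2^3+2^1-2^0 from the post is 89; the greedy walk gives
# +2^6 +2^5 -2^3 +2^0 for the same integer.
terms = signed_power_terms(89)
assert evaluate(terms) == 89
```

Note that the output is a list of signs and exponents, and those still have to be written down as bits somewhere. The sketch says nothing about whether that list is smaller than the original file.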

Name: Anonymous 2009-06-15 2:56

>>78
So you want to find a prime factorization of the integer representation of some arbitrary data? The problem is how are you actually going to encode the symbols used? I assume you want the compressed stream to be uniquely decodable.
If you're going to use ASCII, you throw out any optimization you gain except in the really tiny cases (<512b), but that's really the only way this could work, since the numbers themselves can be of arbitrary length.
How are you going to encode anything greater than, say, 1MiB, within a reasonable time frame? How do you deal with the leading zero case?

Name: Anonymous 2009-06-15 3:02

What if the integer representation of the data happens to be the product of two prime numbers? I think you're going to be waiting a really long time to produce any usable results.
Summary of the problems I found:
* No way to represent leading zeros
* Not uniquely-decodable unless some additional information is added (number of terms, length and position of each term - ie. you're always bounded by the entropy of the source)
* The factorization problem places bounds upon how quickly this thing can work. This is virtually unusable for anything such as stream coding (DVD-ROM, GSM, WiFi, etc...), as well as any practical-sized data sets.
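
The leading-zero problem from the list above is easy to reproduce. A minimal sketch, assuming the file is read as a big-endian integer (the thread does not say which byte order is meant) and using made-up helper names:

```python
def file_to_int(data: bytes) -> int:
    """Interpret the whole file as one big-endian integer."""
    return int.from_bytes(data, "big")


def int_to_file(n: int) -> bytes:
    """Turn the integer back into the shortest byte string that holds it."""
    length = (n.bit_length() + 7) // 8
    return n.to_bytes(length, "big")


def pack(data: bytes):
    """One possible fix: carry the original length next to the integer."""
    return len(data), file_to_int(data)


def unpack(packed) -> bytes:
    length, n = packed
    return n.to_bytes(length, "big")


# Two different files collapse onto the same integer ...
assert file_to_int(b"\x00\x00\x0f") == file_to_int(b"\x0f")
# ... so the round trip loses the leading zero bytes.
assert int_to_file(file_to_int(b"\x00\x00\x0f")) != b"\x00\x00\x0f"
# Storing the length restores them, at the cost of extra bits.
assert unpack(pack(b"\x00\x00\x0f")) == b"\x00\x00\x0f"
```

The length field is exactly the kind of additional information the second bullet is talking about.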

Name: Anonymous 2009-06-15 3:04

* Non-deterministic - if your data is the product of two large primes, you're not going to get an encoding within reasonable time.

Name: FrozenVoid 2009-06-15 3:11

>>81
Suppose you had binary data: 1111
which is equal to 15 = 5*3

You would send an ASCII string "5*3" as the compressed data, so the person on the other end could read "5*3" and decode it into 15 = 1111 - the original data sent
____________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-15 3:16

>>83
But the ASCII string "5*3" is actually 24 bits, which is 6x larger than the original thing you started with.

Name: Anonymous 2009-06-15 3:18

>>83
binary 1111 = 4 bits
ASCII "5*3" = 24 bits
that's not very good compression.
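
The bit counts in the two replies above can be checked directly. A tiny sketch, assuming one 8-bit byte per ASCII character; the variable names are only for illustration:

```python
original_bits = "1111"                      # the data from the example
ascii_form = "5*3"                          # the proposed "compressed" text

original_size = len(original_bits)                      # size in bits
ascii_size = len(ascii_form.encode("ascii")) * 8        # 8 bits per character

assert int(original_bits, 2) == 5 * 3
assert original_size == 4
assert ascii_size == 24
assert ascii_size // original_size == 6     # six times larger, not smaller
```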

Name: FrozenVoid 2009-06-15 3:18

>>84
We don't need to use ASCII, we can just use binary to represent the terms. 5 can be represented as 101, and 3 can be represented as 11, put that together, we have 10111 as our compressed text.
____________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-15 3:25

Let me show you another example, to hopefully make it more clear:
Suppose your binary data was 65 = 1000001
This is 2^6+1, so we represent this as (curly braces added to make it more clear):
{10}{110}{1}
 2    6   1

which is 1 bit less than the original.
____________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-15 3:27

>>86
That's still one bit more than the original, plus how do you know that it isn't 2*7?

Name: Anonymous 2009-06-15 3:27

>>87
Don't forget markers to indicate how many bits each number is.

Name: Anonymous 2009-06-15 3:34

>>87
But again, there's no way to represent "101101" and have the decoder be able to know you mean 10^110+1 and not 101^10+1, unless you add additional bits in the compressed text to denote term length and position. You also need to specify which operations are being performed between each term.
So you need to specify:
There are three terms
First term is two bits, second term is three bits (size of last term can be determined from first N-1 terms).
There is exponentiation between the first two terms,
Addition between the last two terms.
So your compression actually looks like:
11 {3 terms}
01 {1st term = 2bits}
11 {2nd term = 3bits}
00 {exponentiation}
01 {addition}
Final: 1101110001101101
And that's leaving out the problem about the representation of the numbers used in the first part, though you could use fixed-size integers for all numbers in the header.
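
A header along these lines can be sketched as below. Everything here is an assumption made for illustration: 2-bit header fields, lengths written in plain binary (so the exact header bits may differ from the hand-written ones above), opcodes 00 for exponentiation and 01 for addition, and operations applied left to right.

```python
FIELD = 2                                  # assumed width of every header field
OPCODES = {"^": "00", "+": "01"}           # opcodes as used in the post


def encode(terms, ops):
    """Header (term count, lengths of all but the last term, opcodes) + payload."""
    payload = [format(t, "b") for t in terms]
    limit = 2 ** FIELD
    if len(terms) >= limit or any(len(p) >= limit for p in payload[:-1]):
        raise ValueError("value does not fit in a header field")
    header = format(len(terms), f"0{FIELD}b")
    header += "".join(format(len(p), f"0{FIELD}b") for p in payload[:-1])
    header += "".join(OPCODES[op] for op in ops)
    return header + "".join(payload)


def decode(stream):
    """Read the header back and split the payload into terms."""
    pos = 0

    def take(n):
        nonlocal pos
        chunk = stream[pos:pos + n]
        pos += n
        return chunk

    count = int(take(FIELD), 2)
    lengths = [int(take(FIELD), 2) for _ in range(count - 1)]
    ops = [take(FIELD) for _ in range(count - 1)]
    terms = [int(take(n), 2) for n in lengths]
    terms.append(int(stream[pos:], 2))     # last term is whatever is left
    return terms, ops


def evaluate(terms, ops):
    """Apply the operations left to right."""
    value = terms[0]
    for op, term in zip(ops, terms[1:]):
        value = value ** term if op == OPCODES["^"] else value + term
    return value


stream = encode([2, 6, 1], ["^", "+"])
assert evaluate(*decode(stream)) == 65
assert len(stream) == 16                   # versus 7 bits for 1000001
```

With the header included the stream decodes unambiguously, but it is now more than twice the size of the 7-bit input it was meant to compress.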

Name: FrozenVoid 2009-06-15 3:40

Another example
BData = 100000000000000000000000000000000000000000000000001 (51 bits)
Arithmetic is 2^50+1
10{EXP}110010{PLUS}1 is the encoding, which is considerably less than the original data, even with the header.
____________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-15 3:43

>>91
try this one: 101010101000101010011100111100000001010101000010111010
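
The challenge above points at a counting limit that holds for any lossless, uniquely decodable scheme, whatever arithmetic it uses: there are fewer short bit strings than long ones. A minimal sketch of that count, with made-up function names:

```python
def count_inputs(n):
    """Number of distinct n-bit inputs."""
    return 2 ** n


def count_shorter_strings(n):
    """Number of distinct bit strings of length 0 .. n-1."""
    return sum(2 ** k for k in range(n))


def max_fraction_saving(n, k):
    """Upper bound on the share of n-bit inputs that can shrink by k or more bits.

    Each such input needs its own output of length <= n - k, and there are
    only 2**(n - k + 1) - 1 strings that short.
    """
    return (2 ** (n - k + 1) - 1) / 2 ** n


# There is always one fewer shorter string than there are n-bit inputs,
# so at least one input cannot get shorter.
assert all(count_shorter_strings(n) == count_inputs(n) - 1 for n in range(1, 64))

# Fewer than 1 in 128 of all 51-bit inputs can save a full byte.
assert max_fraction_saving(51, 8) < 2 ** -7
```

Inputs like 2^50+1 are among the lucky few. A string with no obvious structure most likely is not.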

Name: Anonymous 2009-06-15 3:49

>>91
Do a full encoding, include how you know where the digit ends and where the EXP/PLUS part is.

Name: Anonymous 2009-06-15 4:01

>>93
recv: 101100101
we know from >>91 that the exp is after the second bit and the plus is after the eighth bit, so we go:
10{EXP} - read 2
110010{PLUS} - read 50, calculate 2^50, store result
1[eof] - read 1, calculate +1 to previous result
final result = 100000000000000000000000000000000000000000000000001
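
The walk-through above amounts to a decoder that is handed the two split positions from outside the stream. A short sketch under that assumption; `decode_with_known_splits` and its parameters are hypothetical names:

```python
def decode_with_known_splits(bits, exp_at, plus_at):
    """Decode base^exponent + addend, given where the fields end.

    exp_at and plus_at are NOT read from `bits`; the caller has to know them.
    """
    base = int(bits[:exp_at], 2)
    exponent = int(bits[exp_at:plus_at], 2)
    addend = int(bits[plus_at:], 2)
    return base ** exponent + addend


received = "101100101"

# Splits after the second and eighth bit, as in the post: 2^50 + 1.
value = decode_with_known_splits(received, 2, 8)
assert value == 2 ** 50 + 1
assert format(value, "b") == "1" + "0" * 49 + "1"

# The same nine bits with a different second split: 2^12 + 5.
assert decode_with_known_splits(received, 2, 6) == 2 ** 12 + 5
```

The two positions are side information, so they count toward the size of the message just as much as the nine payload bits do.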

Name: Anonymous 2009-06-15 4:29

Stop replying to his posts, you gonna get trolled exponentially.

Name: Anonymous 2009-06-15 4:30

>>94
And if you didn't have >>91, as in an actual usage of this algorithm, explain how you would know 101100101 is 2^50-1 instead of 2^12-5.

Name: FrozenVoid 2009-06-15 4:34

>>96
I think you should check your assumptions: the current reality of the situation is that we do indeed know >>91, and so we can easily see that the expression is 10^110010+1
DUCY (do you see why)?
____________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-15 4:46

>>97
Tell me what 101101101 decodes to:
a) 2^54-1
b) 2^13-5
c) 2^6-13
d) 22^6-1
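
The four options above are only a sample. A quick sketch that lists every way to cut the nine bits into three non-empty fields (base, exponent, last term), assuming nothing else is known about where the fields end; `parses` is a made-up name:

```python
def parses(bits):
    """All (base, exponent, last_term) readings of an undelimited bit string."""
    readings = []
    for i in range(1, len(bits) - 1):          # end of the base field
        for j in range(i + 1, len(bits)):      # end of the exponent field
            readings.append((int(bits[:i], 2),
                             int(bits[i:j], 2),
                             int(bits[j:], 2)))
    return readings


candidates = parses("101101101")

# The four readings offered in the post are all valid cuts ...
for reading in [(2, 54, 1), (2, 13, 5), (2, 6, 13), (22, 6, 1)]:
    assert reading in candidates

# ... out of 28 possible ones for a 9-bit string.
assert len(candidates) == 28
```

Without extra bits that mark the field boundaries, a decoder has no way to choose between them.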

Name: Anonymous 2009-06-15 4:52

I'm somewhat depressed that this is easily the most active thread on /prog/. It's the same faggot trolling himself over and over, or people are actually so starved for actual programming content that they're stooping to replying to FrozenVoid threads, or the final surprise option: Nobody that remembers him is still here.

I shudder to think about any of these options.

Name: FrozenVoid 2009-06-15 4:57

>>98
Trick question, you didn't post where exponentiation is. Also why do you keep using -? It's addition we're doing at the end.

>>99
To quote a friend of mine:
``We're all living in the gutters, though some of us are looking to the stars"
____________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est
