
New and revolutionary data compression scheme!

Name: Anonymous 2009-06-13 17:17

Infinite compression?

I've always been interested in how compressed files work and why the compression factor is so low.
The entropy explanation isn't something I would accept without tinkering with the data. The idea of my compression algorithm (currently under development) is to
use algebraic properties of files to express the data in a more concise way.
The thing might sound trivial, but its implementation is not.
Normal compression works by splitting the FILE and finding redundant pieces to express in minimum volume.
Imagine a FILE read and converted to an arbitrary-precision integer. Now consider all the myriad ways to generate said integer. Sounds difficult? Not so much.
1. Let's take a prime number, e.g. 2, and raise it to the power closest to our integer, e.g. 20000. Note the difference (FILE - result).
2. Get the smallest difference with the powers available, and proceed to the next step:
3. If the difference is negative: find 2 raised to the power of X which would close the gap, by subtracting it from #1.
If the difference is positive, just add 2 with the power closest to the difference.
The end result could be something like
2^6+2^5-2^3+2^1-2^0=Integer=FILE
It can be improved further by using powers of other prime numbers, with the idea of the expression taking the least space.
The same thing can be accomplished with arbitrary-length floating point numbers like 2^123712.1282, which would converge faster, but will require some fixing to convert to a stable binary result.
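The greedy loop in steps 1-3 can be sketched in Python. This is my own reading of the description (the function name and the tie-breaking between neighboring powers are assumptions), not a finished implementation:

```python
def decompose(n):
    """Greedily express n as a sum/difference of powers of 2, per steps 1-3."""
    terms = []
    r = n
    while r != 0:
        k = max(abs(r).bit_length() - 1, 0)
        # pick whichever of 2^k and 2^(k+1) is closest to |r|
        if abs(r) - 2**k > 2**(k + 1) - abs(r):
            k += 1
        s = 1 if r > 0 else -1
        terms.append((s, k))
        r -= s * 2**k
    return terms

# e.g. 15 comes out as 2^4 - 2^0, in the spirit of 2^6+2^5-2^3+2^1-2^0 = FILE
assert sum(s * 2**k for s, k in decompose(15)) == 15
```

The loop always terminates, since each step at least halves the remainder; the catch is that every sign and exponent still has to be written down somehow.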
Posted by FrozenVoid at 15:37

Name: Anonymous 2009-06-15 3:02

What if the integer representation of the data happens to be the product of two prime numbers? I think you're going to be waiting a really long time to produce any usable results.
Summary of the problems I found:
* No way to represent leading zeros
* Not uniquely decodable unless some additional information is added (number of terms, length and position of each term - i.e. you're always bounded by the entropy of the source)
* The factorization problem places bounds upon how quickly this thing can work. This is virtually unusable for anything such as stream coding (DVD-ROM, GSM, WiFi, etc...), as well as any practical-sized data sets.
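The entropy bound in the second point follows from a two-line counting argument, sketched here in Python (the choice of n = 4 is arbitrary):

```python
# Pigeonhole: there are 2^n strings of n bits, but only 2^n - 1 strings of
# strictly fewer bits (counting the empty string), so no lossless scheme
# can map every n-bit input to something shorter.
n = 4
n_bit_strings = 2**n
strictly_shorter = sum(2**k for k in range(n))  # 2^0 + 2^1 + ... + 2^(n-1)
assert strictly_shorter == 2**n - 1
assert strictly_shorter < n_bit_strings
```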

Name: Anonymous 2009-06-15 3:04

* Non-deterministic - if your data is the product of two large primes, you're not going to get an encoding within reasonable time.

Name: FrozenVoid 2009-06-15 3:11

>>81
Suppose you had binary data: 1111
which is equal to 15 = 5*3

You would send the ASCII string "5*3" as the compressed data, so the person on the other end could read "5*3" and decode it into 15 = 1111 - the original data sent
____________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-15 3:16

>>83
But the ASCII string "5*3" is actually 24 bits, which is 6x larger than the original thing you started with.

Name: Anonymous 2009-06-15 3:18

>>83
binary 1111 = 4 bits
ASCII "5*3" = 24 bits
that's not very good compression.

Name: FrozenVoid 2009-06-15 3:18

>>84
We don't need to use ASCII; we can just use binary to represent the terms. 5 can be represented as 101, and 3 can be represented as 11. Put that together and we have 10111 as our compressed text.
____________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est
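The trouble with 10111 can be shown mechanically. A sketch in Python (the helper name is mine) that enumerates every way to cut the string into two binary factors:

```python
def all_splits(bits):
    """Every way to cut a bitstring into two binary numbers and their product."""
    return [(int(bits[:i], 2), int(bits[i:], 2),
             int(bits[:i], 2) * int(bits[i:], 2))
            for i in range(1, len(bits))]

# 10111 yields (1,7,7), (2,7,14), (5,3,15), (11,1,11) - the decoder has no
# way to know that the 5*3 = 15 cut is the intended one.
```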

Name: Anonymous 2009-06-15 3:25

Let me show you another example, to hopefully make it more clear:
Suppose your binary data was 65 = 1000001
This is 2^6+1, so we represent this as (curly braces added to make it more clear):
{10}{110}{1}
 2    6   1

which is 1 bit less than the original.
____________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-15 3:27

>>86
That's still one bit more than the original, plus how do you know that it isn't 2*7?

Name: Anonymous 2009-06-15 3:27

>>87
Don't forget markers to indicate how many bits each number is.

Name: Anonymous 2009-06-15 3:34

>>87
But again, there's no way to represent "101101" and have the decoder be able to know you mean 10^110+1 and not 101^10+1, unless you add additional bits in the compressed text to denote term length and position. You also need to specify which operations are being performed between each term.
So you need to specify:
There are three terms
First term is two bits, second term is three bits (size of last term can be determined from first N-1 terms).
There is exponentiation between the first two terms,
Addition between the last two terms.
So your compression actually looks like:
11 {3 terms}
01 {1st term = 2bits}
11 {2nd term = 3bits}
00 {exponentiation}
01 {addition}
Final: 1101110001101101
And that's leaving out the problem about the representation of the numbers used in the first part, though you could use fixed-size integers for all numbers in the header.
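Tallying that layout in Python (field widths and the exact encoding of the length fields are my assumptions, so the header bits differ slightly from the post's; the total is the point):

```python
def encode(terms, ops):
    """Self-describing message roughly following the layout above:
    2-bit term count, 2-bit length per term (last term's length omitted),
    2-bit opcode per operation, then the terms themselves. Hypothetical."""
    header = format(len(terms), "02b")
    header += "".join(format(len(t), "02b") for t in terms[:-1])
    header += "".join(ops)
    return header + "".join(terms)

msg = encode(["10", "110", "1"], ["00", "01"])  # 2^6 + 1
assert len(msg) == 16  # 16 bits to carry the 7-bit original 1000001
```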

Name: FrozenVoid 2009-06-15 3:40

Another example
BData = 100000000000000000000000000000000000000000000000001 (51 bits)
Arithmetic is 2^50+1
10{EXP}110010{PLUS}1 is the encoding, which is considerably less than the original data, even with the header.
____________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-15 3:43

>>91
try this one: 101010101000101010011100111100000001010101000010111010

Name: Anonymous 2009-06-15 3:49

>>91
Do a full encoding, include how you know where the digit ends and where the EXP/PLUS part is.

Name: Anonymous 2009-06-15 4:01

>>93
recv: 101100101
we know from >>91 that the exp is after the second bit and the plus is after the eighth bit, so we go:
10{EXP} - read 2
110010{PLUS} - read 50, calculate 2^50, store result
1[eof] - read 1, calculate +1 to previous result
final result = 100000000000000000000000000000000000000000000000001
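That walkthrough as a few lines of Python. The split points 2 and 8 are hard-coded, which is exactly the side information the decoder would not actually have:

```python
bits = "101100101"
base = int(bits[:2], 2)    # "10"     -> 2
exp  = int(bits[2:8], 2)   # "110010" -> 50
add  = int(bits[8:], 2)    # "1"      -> 1
value = base**exp + add
assert value == 2**50 + 1
assert format(value, "b") == "1" + "0" * 49 + "1"  # the 51-bit original
```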

Name: Anonymous 2009-06-15 4:29

Stop replying to his posts, you're gonna get trolled exponentially.

Name: Anonymous 2009-06-15 4:30

>>94
And if you didn't have >>91, as in an actual usage of this algorithm, explain how you would know 101100101 is 2^50-1 instead of 2^12-5.

Name: FrozenVoid 2009-06-15 4:34

>>96
I think you should check your assumptions; the current reality of the situation is that we do indeed know >>91, and so we can easily see that the expression is 10^110010+1
DUCY (do you see why)?
____________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-15 4:46

>>97
Tell me what 101101101 decodes to:
a) 2^54-1
b) 2^13-5
c) 2^6-13
d) 22^6-1

Name: Anonymous 2009-06-15 4:52

I'm somewhat depressed that this is easily the most active thread on /prog/. Either it's the same faggot trolling himself over and over, or people are actually so starved for actual programming content that they're stooping to replying to FrozenVoid threads, or the final surprise option: Nobody that remembers him is still here.

I shudder to think about any of these options.

Name: FrozenVoid 2009-06-15 4:57

>>98
Trick question: you didn't post where the exponentiation is. Also, why do you keep using -? It's addition we're doing at the end.

>>99
To quote a friend of mine:
``We're all living in the gutters, though some of us are looking to the stars''
____________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-15 5:02

>>100
I didn't post where the exponentiation is because you need to encode that information somehow, DUCY (do you see why)?

Name: FrozenVoid 2009-06-15 5:04

>>101
Obvious troll is obvious.
Stealing my acronyms - check.
Saging my thread - check.
Counter-productive smart alec response - check and check.
Bumping for the night.
____________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-15 5:05

>>99
> Nobody that remembers him is still here.
I doubt it.  /prog/ regulars seem to have no turnover whatsoever.

Name: Anonymous 2009-06-15 5:07

>>99
> people are actually so starved for actual programming content that they're stooping to replying to a fake FrozenVoid
ftfy

Name: Anonymous 2009-06-15 5:12

>>102
Call me a troll because you have no other valid response - check

  ∧_∧   ミ   ⌒
 (´∀`/⌒◯  (    )
  | 八  r 丿<  ))   ) This thread stinks!
 (_)(_)__)  (   ) )

Name: Anonymous 2009-06-15 5:19

>>102
The sage feature was never meant to serve as an implied insult or general disagreement! Why people started using it that way is beyond me.

Name: Anonymous 2009-06-15 5:45

>>104
Posting reddit acronyms
Eye Hibbitte

Name: Anonymous 2009-06-15 5:54

>>94
> we know from >>91
Are you claiming that your decompression program will know to fetch http://dis.4chan.org/read/prog/1244927864/91 in order to work?

Name: @FULLFORCE 2009-06-15 7:30

HEY GUISE, WHAT'S GOING ON?

Name: Anonymous 2009-06-15 9:49

>>99
> I'm somewhat depressed that this is easily the most active thread on /prog/.
You have no one to blame but you'reself.

Name: Anonymous 2009-06-15 10:34

What does this evaluate to? 1010101010

Name: Anonymous 2009-06-15 12:24

The only way I see this compression scheme working and being feasible in polynomial time is when the data is sufficiently large and sparse.
The header could consist of the original file length (and perhaps the number used for decompression, but let's assume that this is assumed to be 2).
Then the compressed file would look like {header}{pos}{pos}{pos}.... The length (in bits) of a single {pos} would be ceiling $ logBase 2 originalLen.
Essentially, this scheme would specify where the 1 bits are, so a file 000000100000001000010000000000000000000000000000001 would be compressed as header + 4 positions.
The scheme could also specify whether the positions specify 0s or 1s. Using a number p other than 2 is possible too, but in that case the packet length will increase to represent the state of the digit (ceiling $ logBase 2 $ originalLen * (p - 1)).
Additionally, if p was, say, 8, but the only digits appearing (aside from 0, being the base digit) were, say, [1, 2, 5, 7] (notice length digits == 4), the number of states would be 4, so the length of one {pos} would increase by 2 bits, not 3.
That's just expanding on >>91, though.
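A sketch of that position-list scheme for p = 2 in Python (function names are mine; the header here is just the original length):

```python
import math

def compress_sparse(bits):
    """Record the positions of the 1 bits, each in ceil(log2(len)) bits."""
    n = len(bits)
    w = max(1, math.ceil(math.log2(n)))
    positions = [i for i, b in enumerate(bits) if b == "1"]
    return n, "".join(format(i, f"0{w}b") for i in positions)

def decompress_sparse(n, payload):
    """Rebuild the original bitstring from the length and position list."""
    w = max(1, math.ceil(math.log2(n)))
    out = ["0"] * n
    for j in range(0, len(payload), w):
        out[int(payload[j:j + w], 2)] = "1"
    return "".join(out)

data = "000000100000001000010000000000000000000000000000001"
n, payload = compress_sparse(data)
assert decompress_sparse(n, payload) == data
assert len(payload) < len(data)  # 4 positions beat the raw bits, since data is sparse
```

Dense input, of course, makes the position list longer than the original, so this only pays off in the sparse case described above.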

Name: Anonymous 2009-06-15 12:51

> The only way I see this compression scheme working
stopped reading right there

Name: Anonymous 2009-06-15 12:52

>>113
A shame, you could make fun of me more after you read my post.

Name: FrozenVoid 2009-06-15 13:24

>>76 You can't steal anything. The algorithm(s) isn't complete yet.
>>102,100,97,91,86,83,78 Nice try.

________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-15 13:49

>>78
LOL WUT? A property of random data is that it can't be compressed. WHBCT.

Name: Anonymous 2009-06-15 14:08

>>114
Sorry, standard trolling reply template. I did read the whole thing, of course.

Name: Anonymous 2009-06-15 14:36

compress my anus

Name: Anonymous 2009-06-15 15:05

>>112
Honestly, the problem is using base 2. It's already taking up too much space. Look, the number 8 is one digit but you need 4 bits to store it! (#b1000 in some Scheme dialects).

The first thing you want to do is convert everything to base 32, that is,
[code]0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V[/code]
where `V' represents 31 * 32[sup]0[/sup]. So let's take a number like
10001. This is just `H'. Now things get tricky.
Take a number like 010000111001110010111101111110011100. This parses directly as "GRUNNUR" (except for the leading zero, which is troublesome but I'm working on that), which happens to be an Icelandic word for "foundation." Therefore all we need to do is store a small numeric reference to a set of dictionaries (i.e., store the number [i]n[/i] for the [i]n[/i][sup]th[/sup] dictionary), and then the number of the word in that dictionary. Say, "GRUNNUR" is the 50th word. The dictionaries would be open source, of course, so everyone could reference them.

I know what you're thinking, though: we'd need so many dictionaries! That's true, so we'll just reference them by the nearest prime strictly less than it and an offset, and compress even the [i]reference[/i]!

Dictionaries have a lot of words, too, so sometimes it may be that the [i]n[/i][sup]th[/sup] word has [i]n[/i] larger than the word. This is where the algorithm really shines. We apply the compression technique [i]to this number![/i] Now all we need to store is a reference that we've applied this step twice. I call it the ``WHY Combinator'' symbol, or ``W'' (note it is unused in our base 32 conversion).

It's a shame I am only just learning about data compression now, it's so hard to prove these things out that I'll probably never get around to making it work. I just program as a hobby.

Name: Anonymous 2009-06-15 15:12

>>117
Ah, cool ;-)
