
New and revolutionary data compression scheme!

Name: Anonymous 2009-06-13 17:17

Infinite compression?

I've always been interested in how compressed files work and why the compression factor is so low.
The entropy explanation isn't something I would accept without tinkering with the data. The idea of my compression algorithm (currently under development) is to
use algebraic properties of files to express the data in a more concise way.
The thing might sound trivial, but its implementation is not.
Normal compression works by splitting the FILE and finding redundant pieces to express them in minimum volume.
Imagine a FILE read and converted to an arbitrary-precision integer. Now consider all the myriad ways to generate said integer. Sounds difficult? Not so much.
1. Let's take a prime number, e.g. 2, and raise it to the power closest to our integer, e.g. 20000. Note the difference (FILE - result).
2. Get the smallest difference with the powers available, and proceed to the next step:
3. If the difference is negative, find 2 raised to the power of X which would close the gap, by subtracting it from #1.
If the difference is positive, just add 2 raised to the power closest to the difference.
The end result could be something like
2^6+2^5-2^3+2^1-2^0 = Integer = FILE
It can be improved further by using powers of other prime numbers, with the idea that the expression takes the least space.
The same thing can be accomplished with arbitrary-length floating point numbers like 2^123712.1282, which would converge faster but will require some fixing to convert to a stable binary result.
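The steps above can be sketched in Python (a hypothetical illustration, not FV's actual code). Note what it reveals: the greedy expansion emits roughly one signed term per few bits of the input, so writing out the exponents takes no less space than the raw binary.

```python
def decompose(n: int) -> list[tuple[int, int]]:
    """Greedily express n as a signed sum of powers of 2, per steps 1-3 above."""
    terms = []
    remainder = n
    while remainder != 0:
        mag = abs(remainder)
        k = mag.bit_length() - 1                # 2**k is the largest power <= |remainder|
        if 2 ** (k + 1) - mag < mag - 2 ** k:   # pick whichever power is closest
            k += 1
        sign = 1 if remainder > 0 else -1
        terms.append((sign, k))
        remainder -= sign * 2 ** k              # step 2: recurse on the difference
    return terms

# e.g. 89 decomposes into a handful of signed powers of 2:
assert sum(s * 2 ** k for s, k in decompose(89)) == 89
```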
Posted by FrozenVoid at 15:37

Name: Anonymous 2009-06-15 15:15

>>119
Sorry, I can't read it. The BBCODE FAIL makes it too unreadable for me.

Name: Anonymous 2009-06-15 15:19

>>119
good luck finding sdjjjkdhkajhxuycuqm in a dictionary

Name: Anonymous 2009-06-15 15:28

>>122
Well, we can make our own dictionaries too, besides normal language dictionaries. Also, you could split the file up at different places; you don't have to have a long string of random characters. You can have a short sequence of them: say, three-letter sequences not in the dictionaries get their own dictionary.

Name: Anonymous 2009-06-15 15:49

Infinite compression is possible, it just requires infinite computing time.

Name: Anonymous 2009-06-15 15:54

>>124
Infinite compression is possible in O(1) time. It's just the decompression that's slightly harder.

Name: Anonymous 2009-06-15 16:04

Infinite compression may be analogous to deletion, but provided the original (uncompressed) data was procedurally generated then decompression is a simple matter of regeneration; the only drawback being the requirement for decompression algorithms specific to the data being restored.
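This idea in miniature, as a hypothetical Python sketch: when the data came from a known generator, the generator itself serves as the compressed form, at the cost of being specific to that one data set.

```python
def regenerate() -> bytes:
    """Data-specific 'decompressor': a few lines that rebuild 1 MiB of output."""
    return bytes(i & 0xFF for i in range(2 ** 20))

original = bytes(i & 0xFF for i in range(2 ** 20))  # the procedurally generated data
assert regenerate() == original  # a 3-line program stands in for 1 MiB
```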

Name: Anonymous 2009-06-15 16:13

>>126
Depends on the data. In the worst case, data compressed with
compress :: [a] -> [a]
compress _ = []

can be decompressed by trying all the possible combinations of bits, but then the original file is required to check if a combination is correct.
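A Python rendering of the joke above (hypothetical sketch): the "decompressor" enumerates every byte string, but only a copy of the original can confirm which candidate is correct.

```python
from itertools import product

def compress(data: bytes) -> bytes:
    return b""  # the worst-case compressor above, ported from the Haskell

def decompress(original: bytes, max_len: int = 2):
    """Try all byte combinations up to max_len; verifying needs the original itself."""
    for n in range(max_len + 1):
        for combo in product(range(256), repeat=n):
            candidate = bytes(combo)
            if candidate == original:  # the check that defeats the whole scheme
                return candidate
    return None

assert compress(b"hi") == b"" and decompress(b"hi") == b"hi"
```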

Name: Anonymous 2009-06-15 16:19

Ok, I've got an idea for a compression algorithm:

1. Take the md5 of the file
2. Using a web crawler, search the web for a file with the same md5
3. Save that file's URL

To decompress:
1. Download the file

Name: Anonymous 2009-06-15 17:12

>>128
Maybe there could be a big, MD5-indexed file storage site. You can submit any file, and then retrieve it again by its hash sum.

Name: Anonymous 2009-06-15 17:57

>>129
> You can submit any file, and then retrieve it again by its [MD5 hash].

128 bits per file? Bloated storage. We still need to compress these values somehow. Maybe store the CRC of the MD5 hash instead...

Name: Anonymous 2009-06-15 18:00

>>129
PIGEON HOLES DICKHEAD

Name: Anonymous 2009-06-15 18:03

>>128
Jeff Atwood quality!

Name: Anonymous 2009-06-15 18:50

>>131
LOL, if a collision happens, then whatever more recently uploaded file with that hash will be retrieved. :) If you use a better hash function, the chances of finding a collision are practically zero.

Name: Anonymous 2009-06-15 18:53

>>133
My hash function is INPUT XOR 0x55 -> OUTPUT and I have not yet found a collision.

Name: Anonymous 2009-06-15 18:57

Excellent trolling of an unworkable idea. Bravo, FV!

Name: Anonymous 2009-06-15 22:47

>>134
LOL, that's not a hash function, since the output isn't fixed-width.

Name: FrozenVoid 2009-06-16 0:56

hey guys, I've given it up. but I'm working on a new compression algorithm based on anus haxing.
this also reminds me that my anus is itching very badly.


__________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-16 1:01

>>136
Maybe he meant he xor'd the first byte with 0x55, then xor'd that with the next, then that with the next, like foldr xor 0x55 input. I think that's a better idea anyway.

Name: Anonymous 2009-06-16 1:27

>>138
There ya go, man! Keep as cool as ya can. Face piles of trials with smiles. It riles them to believe that you perceive the web they weave. And keep on thinking free...

Name: Anonymous 2009-06-16 1:29

>>138
In that case, files with the same bytes in different orders would have the same "hash sum."
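A quick Python check of this (a hypothetical port of the foldr xor 0x55 expression): XOR is commutative and associative, so reordering the bytes can't change the result.

```python
from functools import reduce

def hash55(data: bytes) -> int:
    """foldr xor 0x55 input, ported to Python."""
    return reduce(lambda acc, b: acc ^ b, data, 0x55)

# reordering the bytes leaves the "hash" unchanged -- an instant collision:
assert hash55(b"ab") == hash55(b"ba")
```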

Name: Anonymous 2009-06-16 3:24

>>140
Don't worry, no one will ever find a collision because no one is stupid enough to use it.

Name: Patrick 2009-06-16 3:33

>>141
No one will ever find me!

Name: Anonymous 2009-06-16 3:35

I just lost the ginger. :(

Name: FrozenVoid 2009-06-19 4:08

Bad news: after struggling with C and arbitrary precision, I gave up on it. C isn't fit for this.
Unless I magically learn something like Haskell (or anything that has arbitrary-precision floats and is reasonably fast),
it's not going to work.


Name: FrozenVoid 2009-06-19 4:11

Hmm, if /prog/ can help with writing it in Haskell, I'll do it.
I have GHC installed but no clue how to write anything.




Name: Anonymous 2009-06-19 4:13

>>144
> arbitrary floats
I lol'd. A number of times.

Name: FrozenVoid 2009-06-19 4:17

>>146
It's short for "arbitrary-precision floating point numbers", aka xxx.yyy where xxx and yyy have arbitrary length.


Name: Anonymous 2009-06-19 4:18

>>144
> it's not going to work.
You finally realized that, it's about time.

Name: FrozenVoid 2009-06-19 4:22

>>148
It's going to work, but it requires programming skills which I don't have yet.
But you can help...


Name: Anonymous 2009-06-19 4:48

hey hey, i've got a great compression idea.
like, what you do is you like hash your file, right? but you don't hash it, you use a reversible hash algorithm.
so you hash it, and then you unhash it on the other side.
considering that hashes are usually considerably smaller than their input, i think this will change the world as we know it.
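A pigeonhole counter-sketch in Python (hypothetical toy hash): any "hash" shorter than its input maps more inputs than it has outputs, so some pair must collide, and no reversible version can exist.

```python
def h(data: bytes) -> int:
    return sum(data) & 0xFF  # any 1-byte "hash" of 2-byte inputs will do

inputs = [bytes([i, 0]) for i in range(256)] + [bytes([0, 1])]  # 257 distinct inputs
outputs = [h(x) for x in inputs]
# pigeonhole: 257 inputs into at most 256 outputs forces a collision
assert len(inputs) == 257 and len(set(outputs)) <= 256
```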

Name: FrozenVoid 2009-06-19 4:51

I've updated my post @ http://frozenvoid.blogspot.com/2009/04/polynomial-compression.html
with current pseudocode. So if anyone wants to help with Haskell/C/C++ implementation just comment.


__________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-19 5:03

>>150
Probably FV posting as Anonymous, but here it goes:

Hash: A mathematical one-way, irreversible algorithm generating a fixed-length string from another string of any length. ...

A hash is lossy by definition: it takes a (possibly large) data set and applies some lossy/irreversible algorithm to it which converts it into a fixed-size value. That's the definition of a hash. If you could make it reversible, it wouldn't be a hash, and most likely you would need a size as big as the input, thus making it useless for compression purposes.

Real lossless compression algorithms can't compress already compressed/random data any further (by much; the output could even need more space than the input).
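That incompressibility claim is easy to check empirically; a small Python sketch using zlib on random bytes:

```python
import os
import zlib

data = os.urandom(4096)           # random input: already at maximum entropy
once = zlib.compress(data, 9)
twice = zlib.compress(once, 9)    # "compressing" the compressed output
assert len(once) >= len(data)     # no savings on random data, just header overhead
assert len(twice) >= len(once)    # and a second pass only grows it
```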

Lossy compression algorithms for media (video/audio) work by discarding data which might not be noticeable to everyone's sensory organs.

IHBT

Name: FrozenVoid 2009-06-19 5:08

>>152, >>150 is by Anonymous.
I know what hashing does, and the fact of hash collisions can disprove any such claims.
Hashing erases most of the data it operates on, leaving a pseudo-random result, as if it were a seed for an RNG.

Name: Anonymous 2009-06-19 5:17

>>153
> I know what hashing does and the fact of hash collisions can disprove any such claims.
The fact of hash collisions can disprove any claims that you know what hashing does? That's unusually reasonable of you.

Name: FrozenVoid 2009-06-19 5:22

>>154
Simply by having a single hash collision, the file "decompressed" from the hash would have to be 2 or more files;
such a collision is an indicator that the hash isn't unique for the file.


Name: Anonymous 2009-06-19 5:22

>>152
You wrote that yourself and it shows. The opening statement is wholly incorrect. While the problem of determining what input generated a hash is undecidable, it is important to note that this has no implications in practice. It is possible to create a list of all inputs that hash to a given output, and more importantly to create such a list ordered from smallest input to largest. Thus, someone with enough computing power and knowledge of the filesize/memory limits on the person who did the hashing can, in finite time, generate a list of all inputs that generate such a hash. They can then, again in finite time (this is where the undecidability comes in), interpret which input is most likely to have come from a genuine human source.
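The "list of all inputs" idea only works at toy scale; a hypothetical Python sketch with a made-up 8-bit hash, where multiple preimages turn up almost immediately:

```python
from itertools import product

def tiny_hash(data: bytes) -> int:
    """Toy 8-bit hash; a real digest makes this enumeration astronomically large."""
    h = 0
    for b in data:
        h = (h * 31 + b) & 0xFF
    return h

def preimages(target: int, max_len: int):
    """Yield every input up to max_len bytes hashing to target, smallest first."""
    for n in range(max_len + 1):
        for combo in product(range(256), repeat=n):
            data = bytes(combo)
            if tiny_hash(data) == target:
                yield data

# b"hi" is far from the only preimage of its own hash:
found = list(preimages(tiny_hash(b"hi"), 2))
assert b"hi" in found and len(found) > 1
```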

Name: FrozenVoid 2009-06-19 5:27

>>156
All these masturbatory fantasies about "infinite computation time" have no place in reality.
The number of files you can construct from a single hash is practically infinite.
Even if we limit hash decoding to some subset (e.g. filesize = x), it becomes infeasible after about several kilobytes, due to hashing taking longer than the expected lifetime of the universe.


Name: FrozenVoid 2009-06-19 5:36

e.g. filesize = 3000 bytes, numbers = (2^8)^3000 = 2^24000; at a speed of 2^100 hashes per second
this would take 2^24000/(2^100 * 3600*24*365) years, which is huge compared to the 14 billion years the universe has existed at all.
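Python's bignums can sanity-check the estimate above (keeping the thread's own generous assumption of 2^100 hashes per second):

```python
seconds_per_year = 3600 * 24 * 365
years = 2 ** 24000 // (2 ** 100 * seconds_per_year)
age_of_universe = 14_000_000_000
assert years > age_of_universe ** 500    # not merely huge: beyond astronomical
assert 2 ** 23875 <= years < 2 ** 23876  # years is itself a ~23876-bit number
```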


Name: Anonymous 2009-06-19 6:19

>>157
Please don't talk about things you don't understand. Thanks; hopefully you won't scare away any real responses.

Name: FrozenVoid 2009-06-19 6:20

>>159
What are they scared of? It's obvious brute-force hashing is infeasible.
