I've always been interested in how compressed files work and why the compression factor is so low.
The entropy explanation isn't something I would accept without tinkering with the data. The idea of my compression algorithm (currently under development) is to
use algebraic properties of files to express the data in a more concise way.
The thing might sound trivial, but its implementation is not.
Normal compression works by splitting a FILE and finding redundant pieces, to express them in the minimum volume.
Imagine a FILE read and converted to an arbitrary-precision integer. Now consider all the myriad ways to generate said integer. Sounds difficult? Not so much.
1. Let's take a prime number, e.g. 2, and raise it to the power closest to our integer, e.g. 20000. Note the difference (FILE - result).
2. Get the smallest difference among the powers available,
and proceed to the next step:
3. If the difference is negative: find 2 raised to the power
of X which would close the gap, by subtracting it from #1.
If the difference is positive, just add 2 raised to the power closest to the difference.
The end result could be something like
2^6+2^5-2^3+2^1-2^0=Integer=FILE
It can be improved further by using powers of other prime numbers, with the idea that the expression should take the least space.
The same thing can be accomplished with arbitrary-length floating-point powers like 2^123712.1282, which would converge faster, but will require some fixing to convert to a stable binary result.
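The steps above can be sketched directly. A minimal sketch in modern JavaScript (BigInt supplies the arbitrary precision; all function names here are illustrative, not from the post), restricted to base 2:

```javascript
// Sketch of the scheme above: repeatedly take the power of 2 nearest the
// remainder, record its sign, and recurse on the difference.

function ilog2(n) {            // floor(log2 n) for n > 0n
  let e = 0n;
  while (1n << (e + 1n) <= n) e++;
  return e;
}

function nearestExp(n) {       // exponent of the power of 2 nearest n (n > 0n)
  const k = ilog2(n);
  return (1n << (k + 1n)) - n < n - (1n << k) ? k + 1n : k;
}

function decompose(n) {        // n === sum of sign * 2^exp over the result
  if (n === 0n) return [];
  if (n < 0n) return decompose(-n).map(([s, e]) => [-s, e]);
  const e = nearestExp(n);
  return [[1n, e], ...decompose(n - (1n << e))];
}

function recompose(terms) {
  return terms.reduce((acc, [s, e]) => acc + s * (1n << e), 0n);
}
```

For the worked example in the post (2^6+2^5-2^3+2^1-2^0, which evaluates to 89), `decompose(89n)` yields the equivalent but shorter signed decomposition 2^6+2^5-2^3+2^0; the greedy nearest-power rule guarantees each remainder shrinks, so the recursion terminates.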
Posted by FrozenVoid at 15:37
Name:
Anonymous2009-06-13 17:21
I forgot to mention that you can use this algorithm recursively (LISP-style) to compress any amount of data to a single number.
>>8 >>9 >>10
To the best of my memory, before I came to the saving grace of Jesus Christ, I did not believe the Bible was true. I doubted whether God, Satan, heaven, or hell even existed. I believed that we were born, lived so many years, and then died. I had my own business and thought that I had succeeded by my own wits.
One evening, my wife and I heard some documentation that these were the last days before Jesus Christ would actually return. Not wanting to hear it, I almost walked out. Something kept me there, and I listened but was not convinced; however, I decided to do some research to find out if the Bible was really true. Indeed, if I could find one contradiction or anything that was not true, then I could disregard it. I believed this would not take long. This led me into much research. I learned nearly one-third of the Bible is, directly or indirectly, related to prophecy, which includes about 10,000 prophecies. One thing needed was to determine when the Bible was actually written. Thus, a study of biblical history, various translations, and archaeology was necessary. The Dead Sea Scrolls, which were found in Israel, contained parts of the Old Testament, including prophecies of the coming of Jesus. It has been proven that these were written before Christ came. Thousands of clay tablets and archaeological sites also confirm many accounts in the Bible.
I took time off and began studying the prophecies. My wife would spend much time at the library. She obtained documentation for me from reference books, which I would check against the Scriptures to see if the prophecies took place. One week went by and then a month. Every prophecy that we were able to get information on proved to be accurate. I was astonished, but still not convinced. Later, there were people who would show me what appeared to be contradictions in the Bible. These were not contradictions, but only a lack of research on the part of those that said these things. Stubborn, that's me. Even after four months of intensive study, proving prophecy after prophecy was true, I was still skeptical. Four months turned into six. I became more determined. It wasn't possible that the sixty-six books of the Bible, written by many people over hundreds of years, would not have some errors, I thought. Thousands of prophecies and every one perfect? No, impossible! If I would admit that, then I would also have to admit there was a God. I was not prepared to do that—yet, I wanted to know the truth. More months passed. Finally, I had to admit after spending almost countless hours of research—I was wrong. I may have been the biggest skeptic in the world, but now I know—the Bible is true and is the perfect Word of God. Anyone willing to take the time I did and do the same research could only come to the same conclusion, if they are honest with themselves. I became afraid that I would perish. I surrendered my life to Jesus Christ, the only begotten Son of God, as a result of His love, compassion, mercy and grace.
I know that there is none other name under heaven given among men whereby we MUST be saved (EXCEPT JESUS)-ref Acts 4:12. I REPENTED of my sins and received Jesus Christ as my only hope of salvation by FAITH-ref Eph 2:8-10. It is written, EXCEPT YE BE CONVERTED, AND BECOME AS LITTLE CHILDREN, YE SHALL NOT ENTER THE KINGDOM OF HEAVEN-Mt 18:3. You can also call on Jesus NOW to be YOUR Lord and Savior.
>>11
"Signs" and "evidence" ever so rarely lead to true conversion. Even when they do, they are easily shaken when confronted with conflicting evidence. It's a big reason traditional "Christians" have become so good at discouraging critical thought, or repeating ridiculous arguments.
It's people like this that are the cancer that is killing religion.
Name:
Anonymous2009-06-13 22:21
Is it anything like the algorithmic language scheme?
Name:
Anonymous2009-06-14 3:15
So, you're basically factoring the file as an integer, and then storing only the factors? That won't work, since storing the factors in the smallest form (binary!) will take just as much space as it would to just store the original file! Your "compression" scheme is actually just a complicated and processor-intensive way to re-organize the bits of the file.
>>15
Not really, because all the data is still there, except that "15" is represented as "5*3". So long as you know what operations to perform, you can compute the simplest form.
Name:
FrozenVoid2009-06-14 4:24
I'm working on it; the scheme described in the blog isn't infinite per se (the potential compression ratio is).
It just creates an equation (not necessarily involving the powers of 2) which gives an answer, which then converts to an integer,
which is saved as a binary file. Since I don't know C/C++ well, this implementation
would take quite a bit of time.
The key is finding numbers algorithmically which fit the integer equation:
e.g. 7^2472471 would convert to some part of the equation, which then combines with something like
(int)81.1293^38381.1177 and produces a result closer to the file contents.
>>14
The equation isn't "factoring" the integer.
It's finding the fewest keys and performing math operations to reach the result.
e.g (x^y+(j^b*k^p)-m^z/b^a).toInteger()
>>19
It's too bad your tripcode is banned, cause now I'll never know if it's really you or just a great troll.
Name:
FrozenVoid2009-06-14 4:43
>>20
I can use some other tripcodes, but mods would ban them as well.
Plus, tripcodes on this board gain unhealthy attention from the masses.
P.S. You would gain nothing from impersonating me, except ridicule.
_________________________________ http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est
Name:
FrozenVoid2009-06-14 4:50
The thing is, programming it in C is so frustrating, I probably won't finish it.
(JavaScript is too slow (esp. on anything over 100kb) and its interface with the filesystem is horrible, though it has arbitrary-precision libraries.)
>>23 I know what you're trolling about; you just misunderstand me.
I know the limits of JavaScript. It's still my favorite language, make no mistake.
It's, however, not fit for such data-intensive operations, unlike C, and its file-handling capability is limited by the browser.
import qualified Data.ByteString as BS
import Data.Bits ((.&.))

bsToInt :: BS.ByteString -> Integer
bsToInt = BS.foldl' (\acc w -> acc * 256 + fromIntegral w) 0

intToBs :: Integer -> BS.ByteString
intToBs i = BS.pack $ (reverse . word8s) i
  where word8s 0 = []
        word8s n = fromIntegral (n .&. 255) : word8s (div n 256)

fvFactor :: Integer -> Integer -> [Integer]
fvFactor 0 _ = []
fvFactor n e
  | 2^e > n = (e - 1) : fvFactor (n - 2^(e - 1)) 0
  | otherwise = fvFactor n (e + 1)

main = do
  conts <- BS.getContents
  putStrLn $ show $ bsToInt conts
This program will convert its input into a large integer. fvFactor will factor n from e with powers of two, which basically just tells you how many 1 bits there are in the integer.
For example, let's take this file "This is just a test.\n". Its integer is 123362224183149454760125818890661635634932860857866.
>>30
Haskell is 10 times harder than C (and completely unintuitive for me). I can't use it.
Your program isn't doing any work: it's just splitting the file into a sum of powers of 2.
The file taken is too small, btw. The operation is intended to be used on larger (e.g. 1 MB) files.
Try instead finding the float power of 2 which is closest to your integer,
e.g. the 2^xxx.yyy which is closest to your integer.
>>30
Can you make a Haskell program which will find the float power for the first 100 primes and find which one is closest
to the integer? e.g. (int)[primeX]^[double float powY]?
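What this post asks for is essentially a logarithm: the real exponent y with p^y = n is y = ln(n)/ln(p). A hedged sketch in JavaScript (names are mine), which also exposes the catch: a double carries only ~53 bits of mantissa, so p^y can reproduce at most the top ~53 bits of the file, however the exponent is chosen:

```javascript
// For each of the first k primes p, compute the real exponent y with
// p**y ≈ n, then round p**y back to see how much of n is recovered.
// Works only while n fits in a double; that limitation is the point.

function firstPrimes(k) {
  const ps = [];
  for (let c = 2; ps.length < k; c++) {
    if (ps.every(p => c % p !== 0)) ps.push(c);  // trial division
  }
  return ps;
}

function closestPowers(k, n) {   // n: an ordinary (double) number
  return firstPrimes(k).map(p => {
    const y = Math.log(n) / Math.log(p);
    return { prime: p, exponent: y, rounded: Math.round(p ** y) };
  });
}
```

The exponent y needs as many bits to pin down n exactly as n itself has, so the "float power" buys no compression; it only relocates the information.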
>>38
It's not related. I can say: take the first 721321 bytes of equation X's result and save them as part 1, then combine with the first 23478234 bytes of equation Y's result saved as part 2.
>>42 I believe in the pigeonhole principle as much as I believe in 0.999.. = 1.
In this case you claim something like: an equation that's less than 1000 bytes cannot express ALL files.
That's not my goal: I just need to express the range from 1-byte to 4GB files.
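The pigeonhole objection being waved away here is just counting; a sketch (function names are mine, BigInt used for exactness):

```javascript
// Counting argument: bit strings shorter than k bits number
// 2^0 + 2^1 + ... + 2^(k-1) = 2^k - 1, but there are 2^k files of exactly
// k bits, so at least one k-bit file has no shorter description under
// ANY scheme, equations included.

function shorterDescriptions(k) {
  let total = 0n;
  for (let i = 0n; i < k; i++) total += 1n << i;
  return total;                  // equals 2^k - 1
}

function filesOfLength(k) {
  return 1n << k;
}
```

The count applies just as much to the 1-byte-to-4GB range as to "all files": for every length k in that range there are more files than shorter descriptions.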
My recommendation to the OP would be to post your ideas on comp.theory or go see somebody in academia, and see what they think. I think your crackpot ideas would make for some nice entertainment for everyone :-)
Name:
FrozenVoid2009-06-14 6:35
>>47 I don't use newsgroups, and my blog is a much better place for it.
The claim that the idea is 'crackpot' is like telling me that my equation cannot result in some integer (which would be easily fixed by adding another equation which closes the gap with some math operation).
____________________________________________ http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est
>>47
I used to read comp.compression some years ago, and every few months there was some nut posting about infinite compression. In fact, it may have been FV.
Name:
FrozenVoid2009-06-14 6:52
I'm off to watch some anime, so here's the draft pseudocode for compression:
1. Build an array of 100 (or 10000, whatever) primes from the sequence of primes.
2. Get a file, e.g. http://marknelson.us/attachments/million-digit-challenge/AMillionRandomDigits.bin
(the above is statistically random binary noise).
3. Convert the file into an INTEGER.
4. for(i=0;i<ArrayOfPrimes.length;i++){
ArrayOfPrimes[i]^DoubleFloatPowerY = Z -> convert to Integer -> compare to 3.
//find the prime number & power which is closest (in terms of absolute value, not larger or smaller) to 3.INTEGER
}
5. Calculate the difference between 4. and 3.; this number is the new INTEGER.
6. Apply the process from 4. to this INTEGER, until the result is:
3.INTEGER = x^y - z^n + k^v, etc.
(In other versions, multipliers/divisors can be picked instead at 5.)
Has anyone already explained the obvious improvement on the original schema?
1.lets take a Prime Number,e.g. 2 and raise it to power closest to our integer, e.g. 20000. Note the difference from the (FILE-result).
It's wasteful to use the _nearest_ power, as it means we have to store the sign of the difference too. So let's use just the greatest power of two that is lower than the current number.
The end result could be something like
2^6 + 2^5 + 2^3 + 2^1 + 2^0=Integer=FILE
Which means I can use only one bit to indicate the presence or absence of each power. This can be further improved by compressing _these_ bits using the same schema, or maybe RLE or something.
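This "improvement" can be checked directly: always subtracting the greatest power of two not exceeding the remainder picks out exactly the positions of the 1 bits, so the presence/absence bitmap it produces is the original file, bit for bit. A small sketch (name is mine):

```javascript
// The exponents chosen by the greatest-power-not-exceeding rule are
// precisely the positions of the 1 bits of n, i.e. the scheme reproduces
// the file's own binary representation.

function onesPositions(n) {      // n: non-negative BigInt
  const out = [];
  for (let e = 0n; n > 0n; e++, n >>= 1n) {
    if (n & 1n) out.push(e);
  }
  return out;
}
```

For example 107 = 1101011 in binary, so the exponent set is {0, 1, 3, 5, 6}: storing one presence bit per power is exactly storing the binary digits again.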
>>17
Yes, but it's so retarded that nobody would consider it; it would be like hiding files in plain sight. I think it would be pretty effective.
Name:
Anonymous2009-06-14 18:17
Each byte is comprised of eight bits, right?
And usually (being bits) they can be expressed as powers of 2, right?
So a byte can easily be compressed into three bits (8 combinations) as the powers of 2 it represents.
So that means an (8/3) = 266% compression ratio.
Why has nobody thought of this before?
I actually thought this was pretty clever; it's too bad you posted as FrozenVoid though, so everyone knew it was a troll post. Otherwise you would have tricked plenty of the /prog/tards.
I actually thought this was pretty clever
No one is that stupid
Name:
Anonymous2009-06-14 18:55
2^6+2^5-2^3+2^1-2^0=Integer=FILE
FILE = 89?
Name:
Anonymous2009-06-14 19:03
Infinite compression?
I stopped reading there. And then I immediately haxed >>1's anus with exponential force.
Name:
Anonymous2009-06-14 19:13
>>51
Why use powers of primes? Why not just find the nearest prime and offset?
Name:
Anonymous2009-06-14 22:28
>>61
Yep, sorry, I skimmed it; obviously that's a giveaway. But the method described, of using an equation to model the data in a compressed form, is very analogous to how a lot of compression is done, and thus worthy of a double take.
Name:
FrozenVoid2009-06-15 0:17
>>66
The principle is that any file is also a number.
I'm working on a way to express this number more concisely than binary.
There are many ways to create the equation; here is the simplest type.
Here is an example in JavaScript (unfortunately I can't work out an example with arbitrary-precision floats):
function gethexstr(bytes,base,exp){return pow(base,exp).toString(16).substr(0,bytes)}
function pow(x,y){return Math.pow(x,y)};
gethexstr(62,29.78123,81.172); will return '2b9000ca928614000000000000000000000000000000000000000000000000'
Such strings are combined, and you get a file in the end.
>>68 It's more concise to use, like all of my library functions.
I can't be bothered to type out the full function names, as with:
function tag(x,y){if(!y){return document.getElementsByTagName(x)}else{return x.getElementsByTagName(y)}};
tag('div') vs document.getElementsByTagName('div')
Have you even tried to work out an example of this by hand? I've tried, and I can't get it to work. Don't try to write in code what you can't express by hand.
Name:
Anonymous2009-06-15 2:29
I'm not really sure I follow the compression scheme. Since you're expressing the file as operations on numbers, you still need some way to represent those numbers (i.e. binary), in which case you're not really gaining much efficiency. Please correct me if I'm wrong, though.
>>73
Ah, so you're trying to find an algebraic expression for an arbitrary-length number, is this it?
Name:
Anonymous2009-06-15 2:41
Given some integer, the OP wants to discover some algebraic expression which will evaluate to that integer, in finite time, such that the overall encoding of the expression is less than the original source encoding.
Name:
Anonymous2009-06-15 2:44
>>1
I've already filed a patent just a couple of days ago covering this compression scheme. Better luck next time, kid.
Name:
Anonymous2009-06-15 2:47
What if you had a file consisting of 1000 zero bytes, followed by 0x01, followed by 1000 zero bytes? How do you represent the first 1000 zeros?
Name:
FrozenVoid2009-06-15 2:49
>>77 Leading zeros cannot be properly expressed in my compression. You'll need to use some other compression scheme (Huffman, LZ, etc.) if you want it to handle such cases. My compression works optimally on purely random data.
>>78
Random data will have at least one leading zero 50% of the time.
Name:
Anonymous2009-06-15 2:56
>>78
So you want to find a prime factorization of the integer representation of some arbitrary data? The problem is how are you actually going to encode the symbols used? I assume you want the compressed stream to be uniquely decodable.
If you're going to use ASCII, you throw out any optimization you gain except in the really tiny cases (<512b), but that's really the only way this could work, since the numbers themselves can be of arbitrary length.
How are you going to encode anything greater than, say, 1MiB, within a reasonable time frame? How do you deal with the leading zero case?
Name:
Anonymous2009-06-15 3:02
What if the integer representation of the data happens to be the product of two prime numbers? I think you're going to be waiting a really long time to produce any usable results.
Summary of the problems I found:
* No way to represent leading zeros
* Not uniquely-decodable unless some additional information is added (number of terms, length and position of each term - ie. you're always bounded by the entropy of the source)
* The factorization problem places bounds upon how quickly this thing can work. This is virtually unusable for anything such as stream coding (DVD-ROM, GSM, WiFi, etc...), as well as any practical-sized data sets.
Name:
Anonymous2009-06-15 3:04
* Non-deterministic - if your data is the product of two large primes, you're not going to get an encoding within reasonable time.
Name:
FrozenVoid2009-06-15 3:11
>>81
Suppose you had binary data: 1111
which is equal to 15 = 5*3
You would send the ASCII string "5*3" as the compressed data, so the person on the other end could read "5*3" and decode it into 15 = 1111 - the original data sent.
____________________________________________ http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est
Name:
Anonymous2009-06-15 3:16
>>83
But the ASCII string "5*3" is actually 24 bits, which is 6x larger than the original thing you started with.
Name:
Anonymous2009-06-15 3:18
>>83
binary 1111 = 4 bits
ASCII "5*3" = 24 bits
that's not very good compression.
Name:
FrozenVoid2009-06-15 3:18
>>84
We don't need to use ASCII, we can just use binary to represent the terms. 5 can be represented as 101, and 3 can be represented as 11, put that together, we have 10111 as our compressed text.
____________________________________________ http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est
Name:
Anonymous2009-06-15 3:25
Let me show you another example, to hopefully make it more clear:
Suppose your binary data was 65 = 1000001
This is 2^6+1, so we represent this as (curly braces added to make it more clear):
{10}{110}{1}  (i.e. 2, 6, 1)
which is 1 bit less than the original.
____________________________________________ http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est
Name:
Anonymous2009-06-15 3:27
>>86
That's still one bit more than the original, plus how do you know that it isn't 2*7?
Name:
Anonymous2009-06-15 3:27
>>87
Don't forget markers to indicate how many bits each number is.
Name:
Anonymous2009-06-15 3:34
>>87
But again, there's no way to represent "101101" and have the decoder be able to know you mean 10^110+1 and not 101^10+1, unless you add additional bits in the compressed text to denote term length and position. You also need to specify which operations are being performed between each term.
So you need to specify:
There are three terms
First term is two bits, second term is three bits (size of last term can be determined from first N-1 terms).
There is exponentiation between the first two terms,
Addition between the last two terms.
So your compression actually looks like:
11 {3 terms}
01 {1st term = 2bits}
11 {2nd term = 2bits}
00 {exponentiation}
01 {addition)
Final: 1101110001101101
And that's leaving out the problem about the representation of the numbers used in the first part, though you could use fixed-size integers for all numbers in the header.
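The bookkeeping described above can be made concrete. A sketch of one consistent rendering of that frame (all field widths here are my own illustrative choices; the post's own header fields don't quite match its terms), which shows the header swamping small inputs:

```javascript
// Render the frame: term count, widths of all but the last term, operator
// codes ("00" = ^, "01" = +), then the terms themselves. The 2-bit fields
// cap counts and widths at 3, which is enough for this toy example.

const toBits = n => n.toString(2);                   // minimal binary, MSB first
const fixed  = (w, n) => toBits(n).padStart(w, "0"); // fixed-width field

function encodeFrame(terms, ops) {
  return fixed(2, terms.length)
    + terms.slice(0, -1).map(t => fixed(2, toBits(t).length)).join("")
    + ops.join("")
    + terms.map(toBits).join("");
}
```

For the 65 = 2^6+1 example, `encodeFrame([2, 6, 1], ["00", "01"])` comes out at 16 bits for a 7-bit input: the self-delimiting header alone exceeds the data, which is exactly the objection being made.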
Name:
FrozenVoid2009-06-15 3:40
Another example
BData = 100000000000000000000000000000000000000000000000001 (51 bits)
Arithmetic is 2^50+1
10{EXP}110010{PLUS}1 is the encoding, which is considerably less than the original data, even with the header.
____________________________________________ http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est
>>91
try this one: 101010101000101010011100111100000001010101000010111010
Name:
Anonymous2009-06-15 3:49
>>91
Do a full encoding, include how you know where the digit ends and where the EXP/PLUS part is.
Name:
Anonymous2009-06-15 4:01
>>93
recv: 101100101
we know from >>91 that the exp is after the second bit and the plus is after the eighth bit, so we go:
10{EXP} - read 2
110010(PLUS) - read 50, calculate 2^50, store result
1[eof] - read 1, calculate +1 to previous result
final result = 100000000000000000000000000000000000000000000000001
Stop replying to his posts, you gonna get trolled exponentially.
Name:
Anonymous2009-06-15 4:30
>>94
And if you didn't have >>91, as in an actual usage of this algorithm, explain how you would know 101100101 is 2^50+1 instead of 2^12-5.
Name:
FrozenVoid2009-06-15 4:34
>>96
I think you should check your assumptions; the current reality of the situation is that we do indeed know >>91, and so we can easily see that the expression is 10^110010+1.
DUCY (do you see why)?
____________________________________________ http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est
Name:
Anonymous2009-06-15 4:46
>>97
Tell me what 101101101 decodes to:
a) 2^54-1
b) 2^13-5
c) 2^6-13
d) 22^6-1
I'm somewhat depressed that this is easily the most active thread on /prog/. It's the same faggot trolling himself over and over, or people are actually so starved for actual programming content that they're stooping to replying to FrozenVoid threads, or the final surprise option: Nobody that remembers him is still here.
I shudder to think about any of these options.
Name:
FrozenVoid2009-06-15 4:57
>>98
Trick question: you didn't post where the exponentiation is. Also, why do you keep using -? It's addition we're doing at the end.
>>99
To quote a friend of mine:
``We're all living in the gutters, though some of us are looking to the stars"
____________________________________________ http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est
>>100
I didn't post where the exponentiation is because you need to encode that information somehow, DUCY (do you see why)?
Name:
FrozenVoid2009-06-15 5:04
>>101
Obvious troll is obvious.
Stealing my acronyms - check.
Saging my thread - check.
Counter-productive smart alec response - check and check.
Bumping for the night.
____________________________________________ http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est
>>99 I'm somewhat depressed that this is easily the most active thread on /prog/.
You have no one to blame but yourself.
Name:
Anonymous2009-06-15 10:34
What does this evaluate to? 1010101010
Name:
Anonymous2009-06-15 12:24
The only way I see this compression scheme working and being feasible in polynomial time is when the data is sufficiently large and sparse.
The header could consist of the original file length (and perhaps the number used for decompression, but let's assume that this is assumed to be 2).
Then the compressed file would look like {header}{pos}{pos}{pos}.... The length (in bits) of a single {pos} would be ceiling $ logBase 2 originalLen.
Essentially, this scheme would specify where the 1 bits are, so a file 000000100000001000010000000000000000000000000000001 would be compressed as header + 4 positions.
The scheme could also specify whether the positions specify 0s or 1s. Using a number p other than 2 is possible too, but in that case the packet length will increase to represent the state of the digit (ceiling $ logBase 2 $ originalLen * (p - 1)).
Additionally, if p was, say, 8, but the only digits appearing (aside from 0, being the base digit) were, say, [1, 2, 5, 7] (notice length digits == 4), the number of states would be 4, so the length of one {pos} would increase by 2 bits, not 3.
That's just expanding on >>91, though.
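The positional scheme sketched above is easy to size up; a sketch (function names mine) computing the compressed size:

```javascript
// Size of the positional scheme: a header carrying the original length,
// plus one fixed-width position per 1 bit. It only beats raw storage
// when the data is genuinely sparse.

const posBits = len => Math.ceil(Math.log2(len));  // bits to address a position

function compressedBits(headerBits, len, ones) {
  return headerBits + ones * posBits(len);
}
```

For the 51-bit example above (4 one-bits), a 32-bit length header already gives 32 + 4*6 = 56 bits, larger than the input; the break-even point is roughly fewer than len / log2(len) one-bits after the header is paid for.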
Name:
Anonymous2009-06-15 12:51
The only way I see this compression scheme working
stopped reading right there
Name:
Anonymous2009-06-15 12:52
>>113
A shame, you could make fun of me more after you read my post.
>>112
Honestly, the problem is using base 2. It's already taking up too much space. Look, the number 8 is one digit but you need 4 bits to store it! (#b1000 in some Scheme dialects).
The first thing you want to do is convert everything to base 32, that is, [code]0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V[/code]
where `V' represents 31 * 32[sup]0[/sup]. So let's take a number like
10001. This is just `H'. Now things get tricky.
Take a number like 010000111001110010111101111110011100. This parses directly as "GRUNNUR" (except for the leading zero, which is troublesome but I'm working on that), which happens to be an Icelandic word for "foundation." Therefore all we need to do is store a small numeric reference to a set of dictionaries (i.e., store the number [i]n[/i] for the [i]n[/i][sup]th[/sup] dictionary), and then the number of the word in that dictionary. Say, "GRUNNER" is the 50th word. The dictionaries would be open source, of course, so everyone could reference them.
I know what you're thinking, though: we'd need so many dictionaries! That's true, so we'll just reference them by the nearest prime strictly less than it and an offset, and compress even the [i]reference[/i]!
Dictionaries have a lot of words, too, so sometimes it may be that the [i]n[/i][sup]th[/sup] word has [i]n[/i] larger than the word. This is where the algorithm really shines. We apply the compression technique [i]to this number![/i] Now all we need to store is a reference that we've applied this step twice. I call it the ``WHY Combinator'' symbol, or ``W'' (note it is unused in our base 32 conversion).
It's a shame I am only just learning about data compression now, it's so hard to prove these things out that I'll probably never get around to making it work. I just program as a hobby.
>>122
Well we can make our own dictionaries, too, besides normal language dictionaries. Also, you could split the file up at different places, you don't have to have a long string of random characters, you can have a short sequence of them, say, three letter sequences not in the dictionaries get their own dictionary.
Name:
Anonymous2009-06-15 15:49
Infinite compression is possible, it just requires infinite computing time.
Name:
Anonymous2009-06-15 15:54
>>124
Infinite compression is possible in O(1) time. It's just the decompression that's slightly harder.
Name:
Anonymous2009-06-15 16:04
Infinite compression may be analogous to deletion, but provided the original (uncompressed) data was procedurally generated then decompression is a simple matter of regeneration; the only drawback being the requirement for decompression algorithms specific to the data being restored.
Name:
Anonymous2009-06-15 16:13
>>126
Depends on the data. In the worst case, data compressed with compress :: [a] -> [a]
compress _ = []
can be decompressed by trying all the possible combinations of bits, but then the original file is required to check if a combination is correct.
Name:
Anonymous2009-06-15 16:19
Ok, I've got an idea for a compression algorithm:
1. Take the md5 of the file
2. Using a web crawler, search the web for a file with the same md5
3. Save that file's URL
To decompress:
1. Download the file
Name:
Anonymous2009-06-15 17:12
>>128
Maybe there could be a big, MD5-indexed file storage site. You can submit any file, and then retrieve it again by its hash sum.
Name:
Anonymous2009-06-15 17:57
>>129 You can submit any file, and then retrieve it again by its [MD5 hash].
128 bits per file? Bloated storage. We still need to compress these values somehow. Maybe store the CRC of the MD5 hash instead...
>>131
LOL, if a collision happens, then whichever file with that hash was uploaded more recently will be retrieved. :) If you use a better hash function, the chances of finding a collision are practically zero.
Name:
Anonymous2009-06-15 18:53
>>133
My hash function is INPUT XOR 0x55 -> OUTPUT and I have not yet found a collision.
Name:
Anonymous2009-06-15 18:57
Excellent trolling of an unworkable idea. Bravo, FV!
Name:
Anonymous2009-06-15 22:47
>>134
LOL, that's not a hash function, since the output isn't fixed-width.
Name:
FrozenVoid2009-06-16 0:56
hey guys, I've given it up. but I'm working on a new compression algorithm based on anus haxing.
this also reminds me that my anus is itching very badly.
>>136
Maybe he meant he xor'd the first byte with 0x55, then xor'd that with the next, then that with the next, like foldr xor 0x55 input. I think that's a better idea anyway.
Name:
Anonymous2009-06-16 1:27
>>138
There ya go, man! Keep as cool as ya can. Face piles of trials with smiles. It riles them to believe that you perceive the web they weave. And keep on thinking free...
Name:
Anonymous2009-06-16 1:29
>>138
In that case, files with the same bytes in different orders would have the same "hash sum."
Name:
Anonymous2009-06-16 3:24
>>140
Don't worry, no one will ever find a collision because no one is stupid enough to use it.
Bad news: after struggling with C and arbitrary precision, I gave up on it. C isn't fit for this.
Unless I magically learn something like Haskell (or anything that has arbitrary-precision floats and is reasonably fast),
it's not going to work.
hey hey, i've got a great compression idea.
like, what you do is you like hash your file, right? but you don't hash it, you use a reversible hash algorithm.
so you hash it, and then you unhash it on the other side.
considering that hashes are usually considerably smaller than their input i think this will change the world as we know it.
>>150
Probably FV posting as Anonymous, but here it goes:
Hash: a mathematical one-way, irreversible algorithm generating a fixed-length string from another string of any length. ...
A hash is lossy by definition: it takes a (possibly large) data set and applies some lossy/irreversible algorithm to it which converts it into a fixed-size value. That's the definition of a hash. If you could make them reversible, it wouldn't be a hash, and most likely you would need a size as big as the input, thus making it useless for compression purposes.
Real lossless compression algorithms can't compress already-compressed/random data any further (by much; it could even need more space than the input).
Lossy compression algorithms for media (video/audio) work by discarding data which might not be noticeable to everyone's sensory organs.
IHBT
Name:
FrozenVoid2009-06-19 5:08
>>152: >>150 is by Anonymous.
I know what hashing does, and the fact of hash collisions can disprove any such claims.
Hashing erases most of the data it operates on, leaving a pseudo-random result, as if it were a seed for an RNG.
_______________________________ http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est
Name:
Anonymous2009-06-19 5:17
>>153 I know what hashing does and the fact of hash collisions can disprove any such claims.
The fact of hash collisions can disprove any claims that you know what hashing does? That's unusually reasonable of you.
Name:
FrozenVoid2009-06-19 5:22
>>154
Simply by having a single hash collision, the file "decompressed" from the hash would have to be two or more files;
such a collision is an indicator that the hash isn't unique to the file.
>>152
You wrote that yourself and it shows. The opening statement is wholly incorrect. While the problem of determining what input generated a hash is undecidable, it is important to note that this has no implications in practice. It is possible to create a list of all inputs that hash to a given output, and more importantly to create such a list ordered from smallest input to largest. Thus, someone with enough computing power and knowledge of the filesize/memory limits on the person who did the hashing can, in finite time, generate a list of all inputs that generate such a hash. They can then, again in finite time (this is where the undecidability comes in), interpret which input is most likely to have come from a genuine human source.
Name:
FrozenVoid2009-06-19 5:27
>>156
All these masturbatory fantasies about "infinite computation time" have no place in reality.
The number of files you can construct from a single hash is practically infinite.
Even if we limit hash decoding to some subset (e.g. filesize = x), it becomes unfeasible after about several kilobytes, due to hashing taking longer than the expected lifetime of the universe.
e.g. filesize = 3000 bytes: numbers = (2^8)^3000 = 2^24000; at a speed of 2^100 hashes per second,
this would take 2^24000/(2^100 * 3600*24*365) years, which is huge compared to the 14 billion years the universe has existed at all.
>>157
Please don't talk about things you don't understand. Thanks; hopefully you won't scare away any real responses.
Name:
FrozenVoid2009-06-19 6:20
>>159
What are they scared of? It's obvious brute-force hashing is infeasible.
Name:
Anonymous2009-06-19 6:26
>>160
Please don't talk about things you don't understand. Thanks; hopefully you won't scare away any real responses.
Name:
FrozenVoid2009-06-19 6:37
>>161 Repeating yourself isn't a convincing argument.
>>164
no, it's analogous to "I would never believe that idiot FrozenVoid over my friendly neighbourhood Anonymous" aka discrimination against the mentally handicapped.
Name:
Anonymous2009-06-19 6:53
>>165
But FrozenVoid is the greatest mind of this century and you should believe me. Because I said so.
>>170
ahem. You refer to it as "voice of the masses" but ignore that anyone could be anonymous.
wat?
this is a troll, right? a non-FV troll i mean
Name:
Anonymous2009-06-19 7:11
>>171
What is the difference between FrozenVoid and Anonymous?
Name:
Anonymous2009-06-19 7:16
>>172
FrozenVoid (you) likes to draw attention to himself by using an obnoxious name and signature.
It's akin to walking down a public street without wearing pants; everybody will look at you, but not for positive reasons.
Name:
Anonymous2009-06-19 7:18
>>173
In that case, my friend, it is you who isn't wearing pants.
Everyone in real life has a name. All those people on the street have IDs or passports, driver's licenses, etc.
>>174
We are not currently walking down a street.
We are currently talking on world4ch, an anonymous programming BBS. Offline I have a name. On other websites I have a name.
Of course that's pretty damn obvious; you're just playing dumb. IHBT
>>176
i can't tell whether you intentionally miss the point or whether you just have ADHD or something. oh wait, i forgot, you don't bother actually reading other people's posts
Name:
FrozenVoid2009-06-19 7:34
>>177
Your interpretation is too literal.
When you say "akin to walking down a public street" you implicitly describe the worldview of this "public street",
which is, strangely, exactly what you think of as the "Anonymous worldview", i.e. submission and conformity to the masses.
The 'Anonymous Society' isn't real. It's an illusion people take for granted when they lack information: the "stranger", "passerby", "somebody" are all normal, identifiable people.
>>178
you're an idiot. like i said before, you completely missed the point.
what the hell is your obsession with "conformity"? what are you, 14?
4chan has a culture of anonymity, just as public streets have a culture of wearing clothes. if you purposely try to go against that culture you will be shunned and disliked.
guess what. 99% of forums on the internet require you to make an account, login, and post with a name; this place just happens to be different, it doesn't require that bullshit. people here prefer to post without a name and talk to other nameless people. you should either learn to respect that desire or leave - the internet is large enough that i'm sure you can find some place where people welcome your faggotry with open arms.
Name:
FrozenVoid2009-06-19 7:53
>>179
Aren't we forgetting something? "this place just happens to be different", but note:
the place where people "prefer to post without a name" doesn't restrict posting with a name and a tripcode.
It's as anonymous as you want it to be. Thus your nagging that people shouldn't have a name, just because the majority doesn't have one, is pure conformity. "4chan has a culture of anonymity", aka everyone "should learn to respect that desire" for conformity.
>>180
ugh.
nobody on /prog/ uses the tripcode functions. most people here dislike that kind of thing.
go to /r9k/, it's 4chan's haven for people like you.
if you were a TRUE anti-conformist you would hit your dick with a hammer. you know why? BECAUSE YOU CAN! nobody is stopping you! INFACT you're CONFORMING by NOT hitting your dick with a hammer.
quick! do it now before you catch the conformity virus!
I'M NOT JOKING, ANTI-CONFORMIST.
HIT YOUR DICK WITH A HAMMER NOW OR ELSE YOU ARE A CONFORMIST
>>186
acting according to certain accepted standards; "i don't think it's a very good idea to hit your dick with a hammer".
hey hey, FV, riddle me this: where does common sense come from if not society?
doh ho ho, common sense is learned from the people around you. common sense IS conformity. my common sense dictates that i should wear clothes when i go outside, but some african tribesman's common sense is that clothes restrict his movement and make him slower at running.
if i was born in the same tribe, would i still think it's inappropriate to walk around naked? no, i would think it is common sense to run around naked.
you just don't get it, do you?
Name:
FrozenVoid2009-06-19 8:13
Common sense isn't conformity. That's like comparing cars with oranges.
Conformity is at best a conservative guiding force in society; common sense is personal wisdom, which can be non-conformist as well as conformist, depending on the person. Someone who is heavily involved in social circles will absorb knowledge far more conformist than someone who builds their 'common sense' in solitude.
When people talk about 'common sense' they refer to their own version of such 'sense' and describe why your version is defective (aka "Common sense isn't so common").
>>188
no, you are wrong.
now i'm going to do what you commonly do:
>Common sense (or, when used attributively as an adjective, commonsense, common-sense, or commonsensical), based on a strict construction of the term, consists of what people in common would agree on: that which they "sense" (in common) as their common natural understanding. http://en.wikipedia.org/wiki/Common_sense
the "common" part of "common sense" refers to the fact that it is shared among many people.
"Common sense isn't so common" is just a pun.
facepalm.txt.
now go away and never come back please.
you make /prog/ unpleasant.
>>189
Common != conformity. Is this so hard to imagine? E.g. water is a common resource,
but drinking water doesn't make you a conformist.
"you make /prog/ unpleasant." Truth hurts. In my edition it's even stinging.
Truth hurts. In my edition it's even stinging.
i have no idea what this is supposed to mean.
did i hurt your feelings that bad? sorry :'(
here, have a hug.
but seriously now, leave.
i'm going to stop responding to you now. you obviously don't know how to admit when you're wrong - but i suppose that's just because teenagers are never wrong, ammirite?
Name:
FrozenVoid2009-06-19 8:30
>>192
I have no intent to admit I'm "wrong" when I'm right, just to please you and your conformist buddies.
You have to prove that I'm wrong, instead of declaring me "wrong", for it to have any effect.
>>193
In a couple of years you're going to look back on this and either feel deeply ashamed or lol at your stupidity, depending on your personality.
Name:
FrozenVoid2009-06-19 8:39
>>194
Ah, you expect me to integrate into your little /borg/ hive? Highly unlikely.
As for old threads, i don't read them, unless i'm searching for something.
Oh God. I see stupid argumentation and note a number of missing posts. This can only mean that you haven't successfully ignored /prog/'s greatest contributor!
>>202
A metaphor that makes sense? I must be on the wrong board. >>201
I doubt he read it. Scanning his eyes from left to right, and flicking pages, maybe.
Name:
Anonymous2009-06-19 12:29
>>202
An infinite never-ending sound loop that will eventually generate a sound so loud it blows up the universe?
>>204
More like routing the output of a sentience matrix into itself using the power supply as a bridge, creating a feedback loop that enhances intelligence to the point of total omniscient sublimation.
Update: MPI and GMP refuse to link with my builds, so I downloaded MAPM, which apparently works.
(I have GCC (install corrupted and refuses to work), DMC (works OK), TCC (fails sometimes with function errors, but very fast and produces small .exes).)
The problem is converting files to decimal (which runs at about 100 bytes/second). Here is the current code (inpbuf is where the file is stored):
M_APM decstr=m_apm_init();printf("Converting number in inpbuf to decimal:\n");
//store multiplier,store current byte,current byte multiply,current decstr copy
M_APM pow2store=m_apm_init();M_APM pow2base=m_apm_init();M_APM bytexpow=m_apm_init();M_APM decstrcopy=m_apm_init();
//byte multiplier (2^8)^(pos from end:0=1)
M_APM pow2exp=m_apm_init();m_apm_set_long(pow2exp,2L);
unsigned long int curbyte;unsigned long int pos;
//count down to and including pos==0, so the first (most significant) byte is included
for(pos=filesize;pos-->0;){
curbyte=(unsigned long int)inpbuf[pos];
if(pos%1000==0){printf("Processed %lu of %lu bytes\n",(unsigned long)(filesize-pos),(unsigned long)filesize);}
//current byte as int
m_apm_set_long(pow2base,(long)curbyte);
//multiplier for current byte: (2^8)^(filesize-1-pos)=2^(8*(filesize-1-pos))
m_apm_integer_pow_nr(pow2store,pow2exp,8*(int)(filesize-1-pos));
m_apm_multiply(bytexpow,pow2base,pow2store);
m_apm_copy(decstrcopy,decstr);
m_apm_add(decstr,decstrcopy,bytexpow);
}
printf("Inpbuf successfully converted to decimal.\n");
m_apm_free(pow2store);m_apm_free(pow2base);m_apm_free(bytexpow);m_apm_free(decstrcopy);m_apm_free(pow2exp);
//256^filesize has at most about 2.41*filesize decimal digits
char *testbuf=malloc(filesize*3+2);m_apm_to_integer_string(testbuf,decstr);
fwrite(testbuf,1,strlen(testbuf),output);
>>211
There you are mistaken. I bothered with a reply in hopes that you would cease posting in future due to your newly acquired knowledge. Sadly, I think it's fallen on deaf ears.
if i had to choose between giving up a hand, my vision, or my hearing i would probably choose my hearing.
i wouldn't be able to stand being blind.
just saying
Name:
FrozenVoid2009-06-21 8:54
>>212 See, you have an agenda. Your "gives a shit" claim contradicts "I'd like to take the time", but you miss the point of it, pretending to be a remote observer and not an active participant.
Just for comparison, this is the code I use to convert to hex (0x??), which is far more elegant.
char *hexdata=malloc(filesize*2+3);hexdata[0]='0';long pos;hexdata[1]='x';char tmp_0f;char tmp_f0;
for(pos=0;pos<filesize;pos++){
tmp_f0=(inpbuf[pos]&0xF0)>>4;tmp_0f=(inpbuf[pos]&0x0F);
//+2 skips the "0x" prefix; 48='0', 55='A'-10
hexdata[pos*2+2]=tmp_f0<0x0a?tmp_f0+48:tmp_f0+55;
hexdata[(pos*2)+3]=tmp_0f<0x0a?tmp_0f+48:tmp_0f+55;}
hexdata[filesize*2+2]='\0';
Jesus Fucking Christ, your unindented, non-monospace code looks bloody disgusting; doesn't it irk you?
Name:
FrozenVoid2009-06-21 9:47
>>218 my normal style is:
char *hexdata=malloc(filesize*2+3);hexdata[0]='0';long pos;hexdata[1]='x';char tmp_0f;char tmp_f0;
for(pos=0;pos<filesize;pos++){tmp_f0=(inpbuf[pos]&0xF0)>>4;tmp_0f=(inpbuf[pos]&0x0F);
hexdata[pos*2+2]=tmp_f0<0x0a?tmp_f0+48:tmp_f0+55;hexdata[(pos*2)+3]=tmp_0f<0x0a?tmp_0f+48:tmp_0f+55;}
Name:
FrozenVoid2009-06-21 9:51
Without comments I can view most programs in one page and instantly see all variables, etc.
For debugging it's fairly easy to split offending lines into 2-3 segments and recheck.
>>220
Good luck tracking down an AI determinism bug in a large-scale game, where you didn't write any of the code and the person who did wrote it using your unique ``style''.
Name:
FrozenVoid2009-06-21 10:07
>>221 There is astyle for those who prefer to waste their screen real estate.
My code isn't scrambled. It predictably ends on delimiters.
>>222
Are you telling me you could find an AI determinism bug if the engine was written like this?
Name:
Anonymous2009-06-21 10:18
>>219
Your ``normal style'' is absolutely unreadable. Please note that other people might want to read your code for some twisted reason, so you should at least try to make it so that my eyes don't bleed when I see your code.
Name:
FrozenVoid2009-06-21 10:18
>>223
If I knew what this "AI determinism" meant (in this context I guess it's something like "NPC goes crazy")
and had a minimal reference for the functions employed in the code, it would be easy.
Also, all this formatting can be changed with Astyle in minutes. No need to fuss over a minor point.
>>229
Not FIOC-like style, just any readable style.
Name:
FrozenVoid2009-06-21 10:53
>>230 'Readable' is very subjective. When you read text files, you don't complain that every page is "not properly indented", or do you? You just read them like books.
Name:
Anonymous2009-06-21 10:54
>>230
No, you are demanding FIOC style exactly. The "variety" of styles you pretend to accept differ only in curly-brace placement; erase the curly braces altogether and you have FIOC, one and only.
That is to say, clearly posting this was not a good idea.
Name:
FrozenVoid2009-06-22 3:35
>>235 It wasn't. But it was nevertheless amusing.
In other news: a new version of my algorithm will be published today which lowers base search times drastically (and is very compact).
1073741819
What a long, messy number. Should be compressed.
Name:
FrozenVoid2009-06-22 10:44
GMP: gives this: POLINK: error: Unresolved external symbol '__imp___iob'.
MPI: crashes when used to initialize a second mp_int variable; exception -1073741819.
MAPM: works, but gives corrupted results in output, and conversion to decimal is very slow.
MIRACL: this I'll try next.
Name:
FrozenVoid2009-06-22 11:05
MIRACL refuses to compile with Pelles C.
D:\Program Files\PellesC\Include\miracl.h(249): error #2001: Syntax error: expected ';' but found 'mr_large'.
Progress is looking good! I'd guess five or six more years before revelation sinks in, except you'll give up due to "language quirks" long before then.
Hello FrozenVoid
I'm interested in using your compression scheme as the basis for my thesis on data compression and source coding. I would appreciate it if you could post some statistics regarding compression rates and compression times, as well as benchmarks comparing your scheme to other algorithms such as Huffman, LZ, Arithmetic, SFE, etc.
Thank you
>>256
"Huffman, LZ, Arithmetic, SFE, etc." belong to another class of programs (I'd like to call it dictionary compression).
They compress redundant data. I 'compress' numbers (I'd call it algebraic composition).
There is no reason I couldn't drastically change e.g. my core routines to extract roots/logs instead of searching for powers,
as long as they result in integers equal to files. I simply design equations.
Hey guys. I found revolutionary data compression scheme too! It happened two days ago when I JUST GOT ENLIGHTENED OH MAN! So listen what I thought of:
Everyone knows that DATA is represented by 0s and 1s yeah?
YEAH
SO
Why don't we recode DATA with smaller 0s and 1s, so many bits fit in one slot!?
FUCK YEA!!!
∧_∧
(-Ò∀Ó)
Name:
Anonymous2009-06-23 19:38
>>267
That is just the basics of how compression works; you will need to actually program it, though. I hope this is clear.
∧_∧
(-Ò∀Ó)
Name:
Anonymous2009-06-23 20:11
>>268
Nope, I can't. I have an ASD_OUT_OF_DATAZ0R eror or smoething ┐(´ー`)┌
Name:
Anonymous2009-06-23 20:14
>>269
But I'm considering learning Hawaii++ oh wel
zomg we need a trinary coprozessor. srsly a VHDL implementation might work
Consider we have a LUT of all primes for 2 times our word size.
Always use the corresponding word with the LUT.
One half nibble to indicate:
00....base
01....mul
10....exp
11....double word indicator
with an 18-bit word size we would have 204 LUT operations and a 16-bit header (for leading zeros and appliance instructions) inside a 4-kbit chunk. The problem is the LUT would be 288 GB large. But you could store the single-size LUT in RAM and the double in an SSD.
Might not work; you'll probably need data which contains over 100 primes, but you could always crack RSA with this thing.
YES I BUMPED IT
Name:
Anonymous2010-09-18 20:11
Remember that infinite compression guy? Did he ever publish anything about his alleged algorithm?
Name:
Anonymous2010-09-18 20:18
idk, I stopped looking once I saw ``floating point'' and ``primes''