Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

Infinite Compression techniques

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-09 9:48

I present a system which would be capable of unlimited compression of any data.
Theory:
Every number can be mapped 1:1 to positive unsigned integer(representing the number itself)
Every integer can be represented as range of floating point numbers
i.e. 3 is range from 3.000... to 3.999...
Now if we multiply the original number by 10: 3*10=30
the range is also multiplied, 30.000... to 39.999... all of these numbers divided by 10 give 3 as integer.
Suppose we can alter original number by shifting the range up or down by supplying an extra factor
3+1 or 3-1, with these 3.000...-3.999... ranges become 4.000...-4.999.. and 2.000...-2.999... respectively
The compression is as follows. The original number is multiplied by a huge scale to create number
which is at least twice longer in file length, giving very large floating point range.
Now we multiply this range by adding a 64bit scale modifier(applied to original number) which shifts the range up and down so the space of the range is now 2^64 times bigger than original.
The compression is search for Any number in that huge range which can be represented more compactly
when one of these is found(for some function like e^A) A is recorded along with scale modifier.
Since the range is enormous there are certainly some numbers which can be represented in short form as
function(x)=number_in_range.
The decoding is as follows,function(x) is runs and results number is divided by scale, then a scale modifier is applied
to get the original number, which is converted to file.

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-10 7:06

>>39
30.999../10=3.09=3(truncate)
30.000../10=3.00=3(truncate)

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-10 7:10

>>40
The idea is that K can be modified during search(K=power of 2,K=fib(x0,K=e^1000) so the Result can be found closer to Y and Y can be fit closer to window by modifying y(its synergistic process)

Name: Anonymous 2011-11-10 9:08

>>1
Since the range is enormous there are certainly some numbers which can be represented in short form as function(x)=number_in_range.

How does one find said function?
I get your idea, it's called generalization (also used in AI) but the problem is the same as with AI, the more complex the function, the longer it takes to find, even if the resulting function is only 20 bytes in size, probability says that it will still take approximately 256^20 iterations to find.

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-10 9:32

>>43
That why i don't like running this algorithm, the search time in the search space is huge and the only matches are for small files, growing exponentially longer for longer files. i.e. each search run to test(new formula/search function/Kmod/Scalemod) will take hours.
I'm certainly missing some optimizations there but its very limited since there is need to search huge amount of functions calls against a shifting window, inherently requirement to get correct results is to get inside the window or to shift window towards results(both take many operations(window shifting is faster on average, but requires to generate new y values for windows shifting instead of converging into window), leaving nothing to do but watch as search space is searched a region at the time).

Name: Anonymous 2011-11-10 9:46

>>44
Why can't you use smaller chunks like 8 bytes?

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-10 9:51

>>45
1.metadata for a chunk(the algo values for scalemod,y,k) always require some minimum space ~>40 bytes
2.8 bytes is too small for search space and any matches there will be proportionally bigger than matches in 1024 byte(i.e. the value matches will takes about or more then 8 bytes too,ruining compression, plus since the search space is small there will be less potential windows(requiring much more window shifting and scalemod)).
Its like ZIP compression on 3 byte chunks(it wont work).

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-10 22:47

There is also interesting variant a with a single Y calculation or precomputed Y and window shifting using dynamic K formula(K-KMod*N+a)

Name: Anonymous 2011-11-10 23:09

http://en.wikipedia.org/wiki/Berry_paradox

You can't really compress data for which there is no algorithm shorter than the data itself which generates the data (and only it).

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 1:42

>>48
The problem isn't that.With the current search algorithms is too slow to be finished in a thousands of years.
When i get something faster i could easily check a chunk of data and decided if it is compressible by a single function, or check multiple functions. That means the compression is limited by the amount of time it takes to check all the search space near multiplied Z((Max window)*(Max scalemod) distance, which increases search space in both directions).
I'll probably try something along lines of >>47 to save computation time

Name: Anonymous 2011-11-11 2:07

infinite compression would require infinite decompression

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 2:25

>>50
The process is assymetric.
You just get (func(Y)/(k-Kmod))+-scalemod and thats the original number.

Name: Anonymous 2011-11-11 3:31

/* inf_compress: compress data into a single word */
unsigned int
inf_compress(char *data, int len)
{
    return 0;
}

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 3:43

>>52
Thats just a useless function that shouldn't even be run and has zero relevance to the thread.

Name: Anonymous 2011-11-11 3:50


unsigned char inf_compress(char *data, int len) {
    return (unsigned char) FROZEN_VOID;
}

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 4:34

Well since i don't have anything useful to compress and i'm not going to develop software for retards, i'm deleting the project.

Name: Anonymous 2011-11-11 5:51

>>55
and i'm not going to develop software for retards
Careful here, young man, you are close to immanentizing the Barber's Paradox, and who knows what might happen then!

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 6:29

>>56
I hope some company will develop a closedsource version thats fast enough and get millions for paid cloud/file storage,etc though i will not use such services anyway:hard drive space is enough for any uncompressed files i have.

Name: Anonymous 2011-11-11 7:58

>>55
not going to develop software for retards

`>implying you ever actually wrote anything useful to begin with

i'm deleting the project

WAAAAAAHH NOBODY LIKES MY PROJECT Y U GUISE SO EVIL BAWWWWWWWWWW

hint: your software is shit, the way you write code is ridicolous at best, you have never heard of the word 'intendation', you pointlessly obfuscate even simple statements, and you are, ontop of that, a raging autistic furry

Name: Anonymous 2011-11-11 8:14

>>57
I hope some company will develop a closedsource version thats fast enough and get millions for paid cloud/file storage,etc

That is highly fucking unlikely, and it's even unlikelier that they're going to use your autism-powered libyiffmytailhole

now go back to your hugbox

Name: Anonymous 2011-11-11 9:00

>>57
No need to worry, it won't happen because it's mathematically unfeasible. Even given 'infinite' time and space for compression, there is data which is just plain uncompressible (takes more space than the program generating it).

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 9:27

>>60
There are alot of numbers in the window of (scalemod*K) which is of thousands of bytes long(its much larger than a chunk of data being compressed, though its location far up on the number line).
by telling me its "mathematically unfeasible" you mean no number in those windows can be represented via a function.
I.e. there huge deserts in the number line which cannot be reached by any formula

Name: Anonymous 2011-11-11 10:12

>>61
There are 2 major problems with your idea, first is that the multiplier may very well take as much space as your data (or around that size), which already defeats the purpose. If that problem is ignored, there will be data which cannot be generated by the function and have the function's encoding be smaller than the data, this being even in the case where you're looking for a wide range. Instead of limiting the search to simple mathematical functions, I tell you to look at the best possible encoding (which is why I linked you to Algorithmic information theory and Kolmogorov complexity, which is uncomputable, yet I still allowed it in that example by saying 'infinite space and time'). Of course, the encoding(of the code of the function) is irrelevant due to the Church-turing thesis as you will always be able to translate it from one language to another, so you should fix it to some particular language/encoding, but once that is fixed, there will always exist large amounts of uncompressible data (algorithmically random). It may be that you would never actually want to compress such data of course (for example, if our universe was locally deterministic, which is unlikely given what we know from QM). However, even if we ignore those algorithmically random sequences and your multiplier problem and just consider the search for that particular function (finding the best one is not possible due to KC being uncomputable, but surely one can find better representations given enough time and space). Not that such an exhaustive search is feasible in this particular universe we live in (resources too limited), but theoretically the idea is feasible (ignoring our current local limitations).

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 10:57

>first is that the multiplier may very well take as much space as your data
there 2 parts, and they don't take as much space
K is the window multiplier(its 2^power about 8 bytes max to store power)
Kmod may takes some space(this does not mean it should be even more than 1/10 the filesize,a function generated Kmod would be also 8 bytes) Kmod is dependent on how much search space you want to cover
Scalemod can be up to half filesize(to provide as wide windows for search as you like)
The maximum it would take 60% of initial data if compressed successfully.
Also if we drop Kmod, K+Scalemod is about half the filesize;
without Kmod search space is:
(Z-S)*K <> (Z+S)*K  (though its unlikely to yield any matches; its one window only(even 128bit Kmod adds huge amount of windows)
with Kmod:
(Z-S)*(K-Kmod)<>(Z+S)*(K-Kmod) search to (Z-S)*(K+Kmod) <>(Z+S)*(K+Kmod)

Name: Anonymous 2011-11-11 11:58

Complete bullshit, you don't have any programs imaginary or not, that even work and you expect us to believe this autistic fantasy.
Show code, any real code which you wrote instead of mathemathical masterbation.

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 12:04

>>64
I only have a small prototype which is limited to 1 function
and its very very slow. You can have it since i'm not interested in this anymore:
//====Headers
//the code is of bit of mess, this is unchunked, single file at onc e version which takes alot of time to search
#define VERSION "Infinity Compressor 1.01 by FrozenVoid\n"
//..\gcc -O2  main.c -lmpfr -lgmp  -o a.exe
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <float.h>
#include <time.h>
#include <gmp.h>   //using libgmp.a
#include <mpfr.h>
//======Types
#define u1 unsigned char
#define u2 unsigned short
#define u4 unsigned int
#define u8 unsigned long long
#define s1 signed char
#define s2 signed short
#define s4 signed int
#define s8 signed long long
#define f2 short float
#define f4 float
#define f8 double
#define f10 long double
//====Bithacks
#define setb(o,p) (o|=(1<<p))      //byte| 1<< pos  
#define clrb(o,p) (o&=(~(1<<p)))  // byte | 11101111
#define togb(o,p) (o^=(1<<p))
#define chkb(o,p) ((o>>p)&1)
#define bitchar(bitchar_bit)    (48|bitchar_bit) // 48|0 or 1
//====Func GCC
static __inline__ u8 rdtsc(void){u8 x;
 __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));     return x;}

u8 fsize(u1* filename){
FILE* tfile=fopen(filename,"rb");
if(!tfile){perror(filename);return 0;}
fseek(tfile,0,SEEK_END);
u8 sz=(u8)(ftell(tfile));fclose(tfile);return sz;}

u1* getcontent(u1*filename){//file to buffer
u8 size=fsize(filename);//
if(!size){perror("Zero length content"); return NULL;}
u1* res=malloc(size);
if(!res){perror("Malloc failure");return NULL;}
FILE* inpfile=fopen(filename,"rb");
if(!inpfile){perror("File cannot be read");return NULL;}
u8 findex=(u8)fread(res,1,size,inpfile);fclose(inpfile);
if(findex!=size){perror("size mismatch");return NULL;}
return res;}


void readcontent(u1* filename,u1* storage){
if(!filename){perror("Invalid filename"); return;}
u8 inpsize=fsize(filename);
if(!inpsize){perror("File empty");return;}
FILE* in=fopen(filename,"rb");
if(!in){perror(filename);return;}
storage=realloc(storage,inpsize);
if(!storage){perror("Malloc failure");return;}
u8 findex=(u8)fread(storage,1,inpsize,in);
fclose(in);
if(findex!=inpsize){perror("Read size mismatch");return;}}

void writecontent(u1* filename,u1* content,u8 size){
FILE* out=fopen(filename,"wb");
if(!out){perror(filename);return;}
u8 findex=(u8)fwrite(content,1,size,out);fclose(out);
if(findex!=size) perror("Write size mismatch");}

void buf2mpz(u1* string,mpz_t num,u8 size){
if(!string){perror("NULL string");return;}
if(!num){perror("Invalid mpz_t");return;}
if(!size){perror("Zero-length string");return;}
mpz_import(num,size, 1,1, -1, 0,string);
}


#define loop(loopx,loopy) for(loopx=0;loopx<loopy;loopx++)

#define fextension(string) strrchr(string,'.')?:".???"
#define isbit(xint,ypos)    ((xint)&(1<<ypos)) 
#define frop(x,filename) FILE* x=fopen(filename,"rb")
#define fwop(x,filename) FILE* x=fopen(filename,"wb")

#define outx(x,fil) fwrite(&x,1,sizeof(x),fil);
#define mpz(x) mpz_t x;mpz_init(x);
#define mpf(x) mpfr_t x;mpfr_init(x);
//static array
#define mpfr(x) MPFR_DECL_INIT(x, 53)  //returns float
#define DEBUG 1 //debug
#define DEBUG2 0 //symbolics
#define DEBUG3 0 //verbose
#define POWERPREC 512 //more=slower, more precise
#define fdisp(x) puts("");mpf_out_str(stdout,10, 0,x);
//===========MAIN======================
#define setch(o,ch,pos)  (o|=ch<<(pos<<1))  // pos 0-3,ch=0-3
#define getbb(o,pos) ((o>>(pos<<1))&3) //get 001100>>2 &3
#define getb0(b) (b&3)
#define getb1(b) ((b>>2)&3)
#define getb2(b) ((b>>4)&3)
#define getb3(b) ((b>>6)&3)

 //(1<<(pos<<1))
//========MAIN===
//code is from another compressor i wrote
void main(int argc,char**argv){
if(!argv[1]){printf("Syntax:cmp inputfile [-d=decode]");exit(1);}
//-----DECODE--TODO-----
if(argv[2] && !strcmp(argv[2],"-d")){//decode Data

}
//------INPUT--------
#define PR0 MPFR_RNDN
printf(VERSION);
/*
  >>  MPFR_RNDN: round to nearest (roundTiesToEven in IEEE 754-2008),
    MPFR_RNDZ: round toward zero (roundTowardZero in IEEE 754-2008),
    MPFR_RNDU: round toward plus infinity (roundTowardPositive in IEEE 754-2008),
    MPFR_RNDD: round toward minus infinity (roundTowardNegative in IEEE 754-2008),
    MPFR_RNDA: round away from zero.
*/
FILE* output=fopen("Result.cmp","wb");
u8 inpsize=fsize(argv[1]);
fwrite(&inpsize,8,1,output);//write inpsize to avoid misdecoding;
u8 bitlen=inpsize*8;
s8 Kmod=0; //window modifier
s4 check,checklow,checkhi,checkold;
if(!inpsize){perror("input size invalid");return;};
u1* input=getcontent(argv[1]);
//convert to mpz
mpfr_set_default_prec( bitlen);
mpz(Z);buf2mpz(input,Z,inpsize);
mpf(median);//Z->median
mpfr_set_z(median,Z,PR0);
mpf(scalemodmax);//window bounds(1/2 filesize to be safe)
mpfr_set_ui(scalemodmax,1,PR0);
mpfr_mul_2ui(scalemodmax,scalemodmax,inpsize*4,PR0);//scalemods max=half filelength in bits
mpf(K);//window base multiplier=2^inpsize8: 4 times;
mpfr_set_ui(K,1,PR0);//4 times muls
mpfr_mul_2ui(K,K, bitlen,PR0);
mpfr_mul_2ui(K,K, bitlen,PR0);
mpfr_mul_2ui(K,K, bitlen,PR0);
mpfr_mul_2ui(K,K, bitlen,PR0);

//base =1+ 1e-100
mpfr_t base;mpfr_init2(base,POWERPREC);
mpfr_set_str(base,"1e-100",10,PR0);
mpfr_add_ui(base,base,1,PR0);
 check=mpfr_cmp_ui (base,1);
//create initial bounds
mpf(Zlow);mpfr_sub(Zlow,median,scalemodmax,PR0);
mpf(Zhi);mpfr_add(Zhi,median,scalemodmax,PR0);
//multiply bounds to get initial,non-kmod window
mpf(Low);mpfr_mul(Low,Zlow,K,PR0);
mpf(Hi);mpfr_mul(Hi,Zhi,K,PR0);
//search for closest base match to window
mpf(Result);//Power is 512 bits;result stores Base^Power
mpfr_t Power;mpfr_init2(Power,POWERPREC);
mpfr_t Powermin;mpfr_init2(Powermin,POWERPREC);
mpfr_t Powermax;mpfr_init2(Powermax,POWERPREC);
mpfr_t Powerold;mpfr_init2(Powerold,POWERPREC);

mpfr_set_str(Power,"2",10,PR0);
mpfr_set_str(Powermin,"1",10,PR0);
mpfr_set_str(Powerold,"0",10,PR0);
//genrate match above Hi
if(DEBUG)puts("\nCalculating initial bounds");
preloop://
mpfr_pow(Result,base,Power,PR0);
check=mpfr_cmp(Result,Hi);
if(check<1){if(DEBUG2)printf("*");
mpfr_mul_ui(Power,Power,1024,PR0);
goto preloop;};//if max found set powermax
mpfr_set (Powermax,Power,PR0);
if(DEBUG)puts("\nCalculating closest match to window");
mainloop:;//search for closest match
//power=avg(Powermax,Powermin)
mpfr_add (Power,Powermax,Powermin,PR0);
mpfr_div_2ui(Power,Power,1,PR0);
mpfr_pow(Result,base,Power,PR0);
checkold=mpfr_cmp(Power,Powerold);
checkhi=mpfr_cmp(Result,Hi);
checklow=mpfr_cmp(Result,Low);
if(checkold==0){if(DEBUG2)printf("=");
if(DEBUG)printf("\nInitiating window search: Result is %s window",checkhi>0?"above":"below");
mpfr_out_str(output,16,0,Power,PR0);
goto windowsearch;//power stall:search window to Result
}
if(checkhi>0){if(DEBUG2)printf("+");
mpfr_set(Powerold,Power,PR0);
mpfr_swap(Powermax,Power);
goto mainloop;}
if(checklow<0){if(DEBUG2)printf("-");
mpfr_set(Powerold,Power,PR0);
mpfr_swap(Powermin,Power);
goto mainloop;}
//inside initial window
if(DEBUG)printf("Found match inside initial window");
mpfr_out_str(output,16,0,Power,PR0);//write power(hex)

goto ex1;//success :Inside initial window?
//implement correct window shifting code
windowsearch://to get Result into window, shift window
//search direction: simple add+-1
checkhi=mpfr_cmp(Result,Hi);
checklow=mpfr_cmp(Result,Low);
if(checkhi>0){//Result is up, shift window up
Kmod++;//Kmod Up, Hi/Low+=Scalemodmax
if(DEBUG2){printf("+");}
mpfr_add(Hi,Hi,scalemodmax,PR0);
mpfr_add(Low,Low,scalemodmax,PR0);
goto windowsearch;
}
if(checklow<0){//Result is down ,shift window down
Kmod--;//Kmod down
if(DEBUG2){printf("-");}
mpfr_sub(Hi,Hi,scalemodmax,PR0);
mpfr_sub(Low,Low,scalemodmax,PR0);
goto windowsearch;
}
//found Kmod: apply Kmod to
if(DEBUG)printf("Found Kmod for window");
ex1:;//output result :Scalemod & exit
if(DEBUG)printf("Calculating final scalemod");
mpf(newK);mpfr_set_si(newK,Kmod,PR0);
mpfr_add(newK,K,newK,PR0);//newK=K-+Kmod
fwrite(&Kmod,8,1,output);//write Kmod
mpfr_div(Result,Result,newK,PR0);//get into initial window
check=mpfr_cmp(Result,median);//if R>M scalemod neg
mpf(scalemod);mpfr_sub(scalemod,median,Result,PR0);
mpfr_out_str(output,16,0,scalemod,PR0);//write scaledmod(hex)
//====================================
; }

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 12:17

The window search is not not as finished as the other parts (it for signed long long  Kmod, not a arbitrary prec Kmod) so you may want to improve that first(its just substracts/adds scalemodmax).

Name: Anonymous 2011-11-11 12:32

>>65
Why don't you use v0 anymore?

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 12:37

>>67
That code is before #define v0 void was introduced

Name: Anonymous 2011-11-11 12:39

>>68
Ah, okay. I was worried there for a minute. It looks inconsistent and ugly this way.

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 12:54

The GCC version of rdtsc is much uglier than any void types.

Name: Anonymous 2011-11-11 12:58

>>70
What does it do?

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 13:01

return timestamp counter, compare with clean DMC version
inline unsigned long long rdtsc(){__asm{RDTSC}}

Name: Anonymous 2011-11-11 13:05

>>72
Why don't you just use clock() then?
Also, that version looks strange, since it doesn't return anything.
Compare with MSVC instead: __rdtsc()

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 13:08

>>73
clock() is very slow. rdtsc is the only thing that can calculate a dozen CPU cycle delays(if not interrupted thread).

Name: Anonymous 2011-11-11 13:28

>>74
If you're timing so often that a few cycles matter you're doing it wrong.

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 13:32

>>75
How else would you calculate a opcode costs in a small loop or a single operation?
You seriously expect to calculate speed of single random array access or couple of math ops with function calls to clock()?

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 13:33

I don't rewrite code(making the code a huge pointless loop) to time it, i just insert rdtsc() calls at start and finish.

Name: Anonymous 2011-11-11 13:47

>>76
Yes, do it a million times in a loop.

Name: FrozenVoid !!mJCwdV5J0Xy2A21 2011-11-11 13:49

Name: Anonymous 2011-11-11 13:52

>>79
And you're implying that gives you the correct number of cycles?
You know, if you want such precision, just look at the asm and count the cycles per instruction manually.
Or use GDB.

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List