Image Scaling

Name: Anonymous 2012-08-22 16:07

/prog/, do you have a fast and good image scaling algorithm of the form:
void resample(uint8_t *Dst, int DstW, int DstH, uint8_t *Src, int SrcW, int SrcH);

???

Name: Anonymous 2012-08-22 16:15

StretchBlt for Weendauze. What are Dst and Src, RGBA arrays?

Name: Anonymous 2012-08-22 16:47

>>1                                    `
>2012
>8-bit color

ISHYGDDT

Name: Anonymous 2012-08-22 16:52

>>3
That's probably one uint8_t per color component.

Also, I have found that StretchBlt isn't hardware accelerated, but StretchDIBits is, though they still suck at performance. Thinking about a small test between GDI functions and some handwritten algorithms.

Name: Anonymous 2012-08-22 17:27

Okay. I found a function, but it uses floats, which are very slow.

void resample(uint32_t *Dst, int DstW, int DstH, uint32_t *Src, int SrcW, int SrcH) {
  int X, Y;
  float ScaleW = (float)SrcW/(float)DstW;
  float ScaleH = (float)SrcH/(float)DstH;
  for (Y = 0; Y < DstH; Y++) {
    for (X = 0; X < DstW; X++) {
      uint32_t C = Src[(int)(Y*ScaleH)*SrcW + (int)(X*ScaleW)];
      *Dst++ = C<<8;
    }
  }
}

Name: Anonymous 2012-08-22 18:07

>>5
What moron told you floats are slow? Multiplication requires a fixed point multiplier (just like integer multiplication) for the mantissa and runs in parallel with two int adders for the exponent.
r=(M_a*2^(E_a-bias))*(M_b*2^(E_b-bias))<=>r=M_a*M_b*2^((E_a+E_b-bias)-bias)
It can easily be done in one clock cycle.

Name: Anonymous 2012-08-22 18:09

>>5
If you're sure the ENTERPRISE BOTTLENECK is lack of FPU, you can do this:

void resample(uint32_t *Dst, int DstW, int DstH, uint32_t *Src, int SrcW, int SrcH) {
  int X, Y;
  int ScaleW = SrcW * 128 / DstW;
  int ScaleH = SrcH * 128 / DstH;
  for (Y = 0; Y < DstH; Y++) {
    for (X = 0; X < DstW; X++) {
      uint32_t C = Src[((Y*ScaleH) >> 7)*SrcW + ((X*ScaleW) >> 7)];
      *Dst++ = C<<8;
    }
  }
}

Name: Anonymous 2012-08-22 18:11

Oh, and checking for overflow (at * 128) is left as an exercise for the reader.

Name: Anonymous 2012-08-22 18:20

>>8
Use SrcW << 7 instead of *128 for consistency
Overflow check: if ((unsigned)SrcW & (255u << (sizeof(int)*CHAR_BIT-8)))
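
A minimal sketch of that exercise, assuming the same 7-bit fixed-point step as in >>7. The helper name scale_step_q7 is made up for illustration and is not from the thread; it compares against INT_MAX >> 7 instead of masking bits, which amounts to the same test for non-negative widths.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns the 7-bit fixed-point step SrcW*128/DstW,
   or -1 when the sizes are invalid or SrcW << 7 would overflow int. */
static int scale_step_q7(int SrcW, int DstW) {
    if (SrcW < 0 || DstW <= 0) return -1;   /* also avoids division by zero */
    if (SrcW > (INT_MAX >> 7)) return -1;   /* SrcW << 7 would not fit in int */
    int step = (SrcW << 7) / DstW;
    assert(step >= 0);
    return step;
}
```

The same check would be needed for SrcH, and the per-pixel products inside the loop can still overflow for very large images.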

Name: Anonymous 2012-08-22 18:42

>>6
First you have to convert the ints to floats for the multiplication, then convert the result back to int for memory addressing. The float format is very different from the int format, so the conversion will take forever.

Name: Anonymous 2012-08-22 18:42

>>6
time told me that it's about 60% slower.

Name: Anonymous 2012-08-22 18:50

>>3
Back to le imagereddits, ``please"!

Name: Anonymous 2012-08-23 4:29

>>10
Clearly you have no fucking clue what you're talking about.

>>11
Either your CPU is pipelined to the delay of one int adder (which is a ridiculously low delay), or your C compiler sucks.

Name: Anonymous 2012-08-23 6:01

You're forgetting gamma correction once again, Oniichan!

Name: Anonymous 2012-08-23 6:16

Are you saying that the image scaling doesn't scale?

Name: Anonymous 2012-08-23 7:49

Uhhh, I think the issue is not the float multiply but the conversion from float to int for addressing, which needs to be a floor function (in this case most likely floorf()) for correct results. In fixed point you'd get the floor() for free with the truncation (something like value >> FIX_BITS). Also, some of the muls in
uint32_t C = Src[((Y*ScaleH) >> 7)*SrcW + ((X*ScaleW) >> 7)];
can be omitted by using stepping values (the stepping is linear in this case).
Maybe I'll write the function later today when I'm not as lazy.
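
A rough sketch of the truncation point, assuming non-negative coordinates (where a plain (int) cast already equals the floor, so floorf() is not called here). The function names and the choice of FIX_BITS = 16 are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define FIX_BITS 16  /* assumed fraction width */

/* Source column for destination column x, via float.
   The (int) cast truncates toward zero, which is a floor for x >= 0. */
static int src_index_float(int x, int sw, int dw) {
    assert(x >= 0 && sw > 0 && dw > 0);
    float scale = (float)sw / (float)dw;
    return (int)((float)x * scale);
}

/* Same column in fixed point: the right shift drops the fraction bits,
   so the floor comes for free and no float/int conversion is needed. */
static int src_index_fixed(int x, int sw, int dw) {
    assert(x >= 0 && sw > 0 && dw > 0);
    uint32_t step = ((uint32_t)sw << FIX_BITS) / (uint32_t)dw;
    return (int)(((uint32_t)x * step) >> FIX_BITS);
}
```

For a ratio that is exactly representable (such as 320 to 640) the two agree on every column. For other ratios the rounding of the step can make them differ by one pixel near a boundary.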

Name: Anonymous 2012-08-23 8:44

>>6
floats aren't particularly slow, but converting to int and back is.
cvtss2si typically takes anywhere from 3 to 10 cycles with a throughput of 1/1 and takes up a store and add pipe.

Name: Anonymous 2012-08-23 8:50


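/* Nearest-neighbour scale of 32-bit pixels using 16.16 fixed-point stepping.
   Assumes dw and dh are non-zero and sw, sh are below 65536. */
static void scale(void *dst, size_t dw, size_t dh,
        const void *src, size_t sw, size_t sh)
{
    /* source step per destination pixel, in 16.16 fixed point */
    const uint32_t ix = ((uint32_t)sw << 16) / dw;
    const uint32_t iy = ((uint32_t)sh << 16) / dh;
    uint32_t *dstp = (uint32_t *)dst;
    uint32_t dx, dy, sx, sy;

    for (dy=0, sy=0; dy<dh; ++dy, sy+=iy) {
        /* sy>>16 is the integer source row for this destination row */
        const uint32_t *srcp = (const uint32_t *)src + (sy>>16) * sw;
        for (dx=0, sx=0; dx<dw; ++dx, sx+=ix) {
            dstp[dx] = srcp[sx>>16];
        }
        dstp += dw;
    }
}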

This is about as fast as you get with a general C function.
If you really need to squeeze out more performance, a better way is to generate machine code on the fly: you build one scaling row by emitting its load and store instructions (all an image scale is, is a series of load src, store dst, in some pattern).
For example, for a 2x scale you would need to generate:

load src, store dst, store dst, load src, store dst, store dst and so on.
And you only need to calculate one such row, then you simply repeatedly call this generated function and you only need to interpolate along the height.
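
Actually emitting machine code is platform-specific, so here is a portable stand-in rather than a code generator: the horizontal pattern is computed once into a table of source columns and then replayed for every row. The name scale_with_table, the int return value and the 65536 size limit are assumptions for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Nearest-neighbour scale that computes the column pattern once.
   Returns 0 on success, -1 on zero sizes or allocation failure. */
static int scale_with_table(uint32_t *dst, size_t dw, size_t dh,
                            const uint32_t *src, size_t sw, size_t sh)
{
    if (dw == 0 || dh == 0 || sw == 0 || sh == 0) return -1;
    /* keeps every 16.16 value within 32 bits */
    assert(sw < 65536 && sh < 65536 && dw < 65536 && dh < 65536);

    size_t *col = malloc(dw * sizeof *col);
    if (!col) return -1;

    const uint32_t ix = ((uint32_t)sw << 16) / (uint32_t)dw;
    const uint32_t iy = ((uint32_t)sh << 16) / (uint32_t)dh;

    /* the "generated row": which source column feeds each destination column */
    uint32_t sx = 0;
    for (size_t dx = 0; dx < dw; ++dx, sx += ix)
        col[dx] = sx >> 16;

    /* replay the same pattern for every row, stepping only along the height */
    uint32_t sy = 0;
    for (size_t dy = 0; dy < dh; ++dy, sy += iy) {
        const uint32_t *srcp = src + (size_t)(sy >> 16) * sw;
        for (size_t dx = 0; dx < dw; ++dx)
            dst[dx] = srcp[col[dx]];
        dst += dw;
    }

    free(col);
    return 0;
}
```

Whether the table beats the plain stepping loop depends on the machine; it trades one shift and add per pixel for an extra load.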

Name: Anonymous 2012-08-23 10:01

>>18
Why not just *dstp++ = srcp[sx>>16]<<8?

*dstp++ is faster than dstp[dx++]

Name: Anonymous 2012-08-23 10:12

>>18
Ok, thanks for the code, now I don't have to provide a sample (I'm >>16). The code is probably better than what I would have supplied anyways...
But: didn't you forget something? if(dw == 0 || dh == 0) return;

Name: Anonymous 2012-08-23 10:25

>>20
But: didn't you forget something? if(dw == 0 || dh == 0) return;
PNG doesn't allow zero-sized images (for this exact reason), so we are safe.

Name: Anonymous 2012-08-23 10:34

>>18 is pig disgusting gamma-incorrect point sampling.

Name: 22 2012-08-23 10:35

Actually it's gamma-correct but it's still pig disgusting point sampling.
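
Point sampling only copies pixels, so no values are mixed and gamma never enters into it. It starts to matter once a scaler averages neighbours. A toy sketch for one 8-bit channel, loudly assuming a crude gamma of 2.0 (square to decode, square root to encode) instead of the real sRGB curve; all names here are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Deliberately naive integer square root for values up to 255*255. */
static uint8_t isqrt16(uint32_t v) {
    uint32_t r = 0;
    assert(v <= 255u * 255u);
    while ((r + 1) * (r + 1) <= v)
        r++;
    return (uint8_t)r;
}

/* Average of the stored (gamma-encoded) values: the gamma-incorrect way. */
static uint8_t blend_naive(uint8_t a, uint8_t b) {
    return (uint8_t)(((uint32_t)a + b) / 2);
}

/* Decode with gamma 2.0, average in linear light, encode again. */
static uint8_t blend_gamma2(uint8_t a, uint8_t b) {
    uint32_t lin = ((uint32_t)a * a + (uint32_t)b * b) / 2;
    return isqrt16(lin);
}
```

Blending black (0) with white (255) gives 127 the naive way and 180 under this gamma 2.0 assumption, which is why naively filtered images tend to come out too dark.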

Name: Anonymous 2012-08-23 10:41

>>19
It's gonna compile to the same code.

Name: Anonymous 2012-08-23 11:22

>>24
proof?

Name: Anonymous 2012-08-23 11:27

>>22
Well. I need to scale a tiny 320x240 image to fullscreen. It does the job.

Name: Anonymous 2012-08-23 11:29

>>25
Try it yourself.
Indexing a pointer with the same variable as the loop is a trivial optimization any old compiler can do.
The inner loop will be compiled as a pointer counting up towards an end pointer, basically: while (dstp < dst_end)
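
For anyone who wants to try it, the two forms of the inner loop look roughly like this. The function names are made up, and nothing here claims what a particular compiler emits; compile both with optimization and compare the assembly (for example with gcc -S).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inner loop as written in >>18: index the destination with the counter. */
static void row_indexed(uint32_t *dstp, const uint32_t *srcp,
                        size_t dw, uint32_t ix) {
    assert(dstp && srcp);
    uint32_t sx = 0;
    for (size_t dx = 0; dx < dw; ++dx, sx += ix)
        dstp[dx] = srcp[sx >> 16];
}

/* Same loop written by hand as a pointer running up to an end pointer. */
static void row_pointer(uint32_t *dstp, const uint32_t *srcp,
                        size_t dw, uint32_t ix) {
    assert(dstp && srcp);
    uint32_t *dst_end = dstp + dw;
    uint32_t sx = 0;
    while (dstp < dst_end) {
        *dstp++ = srcp[sx >> 16];
        sx += ix;
    }
}
```

Both write the same pixels; the only open question is the generated code.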

Name: Anonymous 2012-08-23 11:33

>>27
The inner loop will be compiled as a pointer counting up towards an end pointer, basically: while (dstp < dst_end)
Only if the compiler is very smart. IIRC, Visual Studio 6.0 didn't do this optimization, so I have always done it manually since.

Name: Anonymous 2012-08-23 12:14

>>10
It can be done by hand rather rapidly using bit operators.
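
A sketch of what doing it by hand can look like, assuming IEEE 754 single precision and a non-negative, finite input below 2^32. The function name is made up, and whether this is any faster than the hardware conversion is a separate question that this code does not answer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Truncate a non-negative finite float to an integer with bit operations. */
static uint32_t float_to_u32_trunc(float f) {
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);              /* reinterpret without aliasing issues */
    assert((bits >> 31) == 0);                   /* sign bit must be clear */

    int32_t exp = (int32_t)((bits >> 23) & 0xFF) - 127;  /* remove the exponent bias */
    uint32_t mant = (bits & 0x7FFFFFu) | 0x800000u;      /* restore the implicit leading 1 */

    if (exp < 0) return 0;                       /* value below 1, including zero */
    assert(exp < 32);                            /* result must fit in 32 bits */
    if (exp >= 23) return mant << (exp - 23);    /* no fraction bits left */
    return mant >> (23 - exp);                   /* shift the fraction bits out */
}
```

For example, 239.5f comes out as 239 and 0.99f as 0.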

Name: Anonymous 2012-08-23 13:16

>>29
proof?

Name: Anonymous 2012-08-23 17:01

>>28
To my knowledge GCC has featured this optimization since forever

Name: Anonymous 2012-08-23 17:26

>>28              `
>2011
>micro-optimization

ishygddt

Name: Anonymous 2012-08-23 17:41

>>32
What the hell does ``ishygddt" mean, friend?

Name: Anonymous 2012-08-23 17:43

>>32
By the way, why are ``2011" and ``micro-optimization" green? And if you were going to quote something, you do it with "> ", not with ">".

But I doubt you would want to quote ``2011" and ``micro-optimization", >>28-san didn't mention those words.

I'm truly confused.

Name: Anonymous 2012-08-23 17:43

>>33
What the hell does ``ishygddt" mean, friend?
2012
I seriously hope you guys don't do this

Name: Anonymous 2012-08-23 17:45

>>35
No, really, I don't understand and I'd like to know.

Name: Anonymous 2012-08-23 17:53

>>36
Fine. Let me spell it out for you.

ishygddt
I seriously hope you guys don't do this.

Name: Anonymous 2012-08-23 18:00

>>37
I can't believe you're lazy enough to contract that sentence into a badly done ``acronym" that doesn't even use uppercase letters.

And how do you expect me to do a ``2011"?

Name: Anonymous 2012-08-23 18:03

>>38
It's already 2011 and you're still doing this?
ishiggydiggydoo

Name: Anonymous 2012-08-23 19:23

>>39
Shiggy Diggy Doo? Is he related to Scooby and Scrappy?

Name: Anonymous 2012-08-23 20:48

>>39
What?

Please be honest, did this stupid contraction come from Reddit? Never seen it on /prog/ before.

Name: Anonymous 2012-08-23 21:25

Name: Anonymous 2012-08-23 21:28

>>41
It came from the imageboards, namely /v/. /sp/ will claim to have ``started it'' but nobody goes to /sp/ so there's no proof and it would have never reached popularity if it didn't spring up in /v/.

Name: Anonymous 2012-08-23 21:39

>>43
Gosh, so it came from the imageboards.

Why do these people think they're an ``epic meme factory"? This stupid contraction is not even funny.

Name: Anonymous 2012-08-24 0:27

>>44
Because the imageboards are all about xD TUH EPICKQ LOWELEZ, since it's all children and manchildren. However generally their tropes get used here satirically and provocateurally, because we hate most of each other.

Name: Anonymous 2012-08-24 2:57

>>45

nyan

Name: Anonymous 2012-08-24 9:41

Dang, now this thread is ass.
