I'm writing a program (C++) which will depend heavily on bitmaps... and it needs to be ultrafast.
I thought of implementing:
- 2d arrays: bitmap[y][x]=0xdeadbeef;
- 1d arrays: bitmap[y_+x]=0xdeadbeef; where y_ = y*w, recalculated as rarely as possible
- vectors.
The point is it needs to be fast. I will often need to traverse every cell of the bitmap in the usual order (y=0..h, x=0..w). Having that in mind, which of the options above is the best?
Look at what your compiler does with various optimization settings. If it does poorly, use pointers.
Name:
Anonymous 2010-01-20 7:50
Completely dependent on the compiler. If it needs speed, just make a quick ASM implementation; however, the compiler probably knows ASM far better than you do. The simplest solution is always to get a better CPU.
Don't optimize until you've shown there's a problem.
Name:
Anonymous 2010-01-20 10:30
>>1
How does C++ internally handle multidimensional arrays? What I mean is, with a 1D array that you want to mimic a 2D array you merely have a row and a column index and do the following:
array[(row * rowlength) + column]
With a 2D array, you access the array with two separate indices ( array[row][column] ). Does the structure of a 2D array in memory look different than the structure of a 1D array?
Does the structure of a 2D array in memory look different than the structure of a 1D array?
No. The blocks are contiguous regardless. A two-dimensional array is basically syntactic sugar for that indexing.
mingw produces exactly the same code on win32 for 2d and 1d arrays.
Did some tests and implemented four methods of walking through 2d arrays:
1. foreach y { foreach x { arr[y][x] = x + y } }
2. foreach y, y2 += w each step { foreach x { arr[y2+x] = x + y } }
3. pointer = arr; foreach y { foreach x { *pointer = x + y, ++pointer } }
4. pointer = arr; y = 0, x = 0, l = w*h, for i = 0 to l { *pointer = x + y; pointer ++; x ++; if ( x == w ) { y ++; x = 0; } }
...and each of the above runs for about 1-2 sec (this varies each run), so apparently the style is not that important..
Just thought of runtime code generation: create a var with opcodes containing the per-cell instructions repeated x*y times, without any comparison instructions... then properly change eip to &var.. heh. Useless though.
Name:
Anonymous 2010-01-25 3:21
Well, if you're going to loop through pixels, USE A 1D ARRAY.
'cause you can loop through it linearly, at the cost of a counter increment. For a 2d array, you're also paying for a multiply (if you are using 2d indexing, that is).
But if speed is really that important, you might want to check out SSE2 and MMX, or even interleave both (2*SSE, 1*MMX) to get a fuckton of speed out of your operations.
Name:
Anonymous 2010-01-25 4:18
>>11
One multiplication per row is not going to be a speed issue.
Name:
Anonymous 2010-01-25 4:24
>>10
That's what I did for this obscure data mining project I was working on several months ago. Sped up a critical algorithm's inner loop by around 700%.
Name:
Anonymous 2010-01-25 9:45
>>9
Stagnate the state of the art? Accessing elements of an array is state of the art?
>>1
Option #1 is a filing cabinet.
Option #2 is a chessboard.
Go for #2.
>>7
Not necessarily. Each row of the array must be contiguous in memory, but the rows themselves do not have to be contiguous.
Name:
Anonymous 2010-01-27 14:31
>>13
Which would, in real situations, be slower than the original solution.
Let's analyze your situation:
Generating the assembly code takes quite a bit of time.
The assembly code per cell is probably bigger than the original data, so it not only eats a lot of bandwidth but also a LOT of time due to cache thrashing etc.
Guess the good ol' for loop using SSE and MMX is faster.