
Three Big Lies of Software Development

Name: Anonymous 2011-08-03 1:47

(Lie #1) Software is a platform

I blame the universities for this one. Academics like to remove as many variables from a problem as possible and try to solve things under ``ideal'' or completely general conditions. It's like the old physicist jokes that go ``We have made several simplifying assumptions... first, let each horse be a perfect rolling sphere...''

The reality is that software is not a platform. You can't idealize the hardware. And the constants that ``Big-O notation'' so conveniently ignores are often the parts that actually matter in practice (memory access cost, for example). You can't judge code in a vacuum. Hardware impacts data design. Data design impacts code choices. If you forget that, you may have something that works, but you won't know whether it works well on the platform you're targeting, with the data you actually have.
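A tiny illustration of the ``ignored constants'' point (hypothetical, not from the original post): both of these sums are O(n), but one walks memory sequentially while the other chases pointers, and on real hardware that constant-factor difference can dominate.

```cpp
#include <cstddef>
#include <vector>

// Both functions are O(n). The contiguous traversal streams through
// memory in order (cache-friendly); the linked traversal may touch a
// different cache line for every node it visits.
long sum_contiguous(const std::vector<int>& v) {
    long total = 0;
    for (int x : v) total += x;
    return total;
}

struct node { int value; node* next; };

long sum_linked(const node* head) {
    long total = 0;
    for (const node* n = head; n; n = n->next) total += n->value;
    return total;
}
```

Same asymptotic complexity, same result; Big-O alone tells you nothing about which one your memory subsystem will hate.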

(Lie #2) Code should be designed around a model of the world

There is no value in code being some kind of model or map of an imaginary world. I don't know why this one is so compelling for some programmers, but it is extremely popular. If there's a rocket in the game project, for example, rest assured that there is a ``Rocket'' class (assuming the code is written in C++ or some other language that emphasizes OOP -- even in C the programmer will often create a rocket_t and an OO-like interface of free functions) which contains the data for exactly one rocket and does ``rockety'' stuff. With no regard at all for what data transformation is really being done, or for the layout of the data. Or, for that matter, for the basic fact that where there's one of something, there's probably more than one.

Though there are a lot of performance penalties for this kind of design, the most significant one is that it doesn't scale. At all. One hundred rockets costs one hundred times as much as one rocket. And it's extremely likely it costs even more than that! Even to a non-programmer, that shouldn't make any sense. Economy of scale. If you have more of something, it should get cheaper, not more expensive. And the way to do that is to design the data properly and group things by similar transformations.
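The per-object style being criticized can be sketched like this (a hypothetical minimal example, not code from the original post): each rocket owns its own data and updates itself in isolation, so a hundred rockets cost a hundred independent updates with no opportunity to share work across the set.

```cpp
// Hypothetical sketch of the ``model of the world'' style: one object
// per rocket, one update per object. Nothing here knows that there
// will be hundreds of rockets all doing the same transformation.
struct Rocket {
    float position;
    float velocity;
    void update(float dt) { position += velocity * dt; }
};
```

Contrast this with grouping all rockets' data together and writing one transformation over the whole set, as the later posts in this thread show.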

(Lie #3) Code is more important than data

This is the biggest lie of all. Programmers have spent untold man-years writing about code -- how to write it faster, better, prettier, etc. -- and at the end of the day, it's not that significant. Code is ephemeral and has no real intrinsic value. The algorithms certainly do. But the code itself isn't worth all this time (and shelf space! -- have you seen how many books there are on UML diagrams?). The code, the performance and the features all hinge on one thing: the data. Bad data equals a slow and crappy application. Writing good software means first and foremost understanding the data.

The above is reposted from Mike Acton of Insomniac Games

Name: Anonymous 2011-08-03 3:00

>>4
>Big O notation is just one type of analysis you can use if you want to. It picks up important things and leaves out things that may matter in another sense, because taking those into consideration would complicate the analysis. The fact that it is an abstraction should be obvious to anyone who understands what it is.
You missed the point. The point was that ``software is a platform'' is a lie. You got distracted.

>I don't understand the problem with creating a rocket class. Lots of games have random pieces of code that are used by only certain types of things in the game, and OO is a pretty good method for organizing all of that. I mean, where else would it all go, and how would it all get called without using some form of polymorphism?
That's because you're following what everyone else does, because you learned to program the OOP way. It's not your fault that you don't know any better. It boils down to how most languages implement OOP: class objects are laid out in array-of-structures order, which simply does not respect CPU cache utilization. What the original author is pushing for is structure-of-arrays order, where each field is stored in its own homogeneous array covering all objects of that type. This is where OOP breaks down; what you're really doing instead is a hybrid of functional and flow-based programming, where you compose transformations over sets of data. This approach is known as data-oriented design.

Here's what a rocket implementation in DOD might look like in a stripped-down subset of C++. You might notice I'm violating principles of encapsulation and whatnot; I warn you, don't subscribe to such nonsense:

struct rockets
{
    size_t count;
    int* type;
    int* stage;
    float4* position;
    float4* velocity;
    float4* acceleration;
    float4* force;
    float* mass;
    float* fuel;
    mesh_handle* mesh;
    material_handle* material;
    // more fields...

    // ctor/dtor, other functions here

    void update(float time_delta);
};

void rockets::update(float time_delta) {
    for (size_t i = 0; i < count; ++i) {
        float4 delta_impulse = multiply(force[i], time_delta);
        delta_impulse = divide(delta_impulse, mass[i]);

        float4 accel = add(acceleration[i], delta_impulse);
        acceleration[i] = accel;

        float4 delta_velocity = multiply(accel, time_delta);
        float4 vel = add(velocity[i], delta_velocity);
        velocity[i] = vel;

        float4 delta_position = multiply(vel, time_delta);
        position[i] = add(position[i], delta_position);
    }
}


The above update will often be much faster than the equivalent fine-grained AoS OO method because it doesn't pollute cache lines with fuel, type, stage, mesh, material, etc. You don't need polymorphism or virtual functions either: you can pull those out of the transformation/update loops and eliminate branching, switch statements and polymorphic dispatch completely by sorting different types of objects into their own sets of arrays and updating each in sequence (or in parallel, if you apply task-oriented parallelism strategies). It often results in much simpler code too. How much faster is this approach? Try 10-20x. I'm not joking. The biggest bottleneck on modern CPUs and GPUs isn't the instruction pipeline, it's memory latency. A lot of software just causes CPUs to spend most of their time spinning idly while waiting for cache line fetches or flushes.
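The ``sort by type, update each batch straight through'' idea can be sketched like this (hypothetical names, using std::vector instead of raw pointers to keep it self-contained): each kind of object lives in its own arrays and gets one straight-line loop, so there is no branch or virtual dispatch inside the hot path.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: two object types, each in its own SoA batch.
// Instead of one heterogeneous list dispatching through a vtable,
// each batch gets its own tight loop with no per-element branching.
struct missiles { std::vector<float> position, velocity; };
struct shells   { std::vector<float> position, velocity; };

void update_missiles(missiles& m, float dt) {
    for (std::size_t i = 0; i < m.position.size(); ++i)
        m.position[i] += m.velocity[i] * dt;
}

void update_shells(shells& s, float dt) {
    // A different type can have entirely different physics here,
    // without any runtime type check in the missile loop above.
    for (std::size_t i = 0; i < s.position.size(); ++i)
        s.position[i] += s.velocity[i] * dt;
}
```

Each update runs over a homogeneous batch, so the type ``dispatch'' happens once per batch instead of once per object.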

I use similar code in my own projects, except I'm often dealing with thousands or tens of thousands of actors, and many of my for loops are actually invocations of a parallel_for template function which partitions the work into work units and offloads them to other threads/CPU cores through a task scheduler/thread pool.
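A minimal sketch of a parallel_for along the lines the poster describes (hypothetical; a real implementation would reuse a thread pool or task scheduler rather than spawning threads per call): split the index range into contiguous chunks and hand each chunk to its own thread.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical parallel_for: partitions [begin, end) into `chunks`
// contiguous work units and runs each on its own std::thread, then
// joins them all before returning.
template <typename Fn>
void parallel_for(std::size_t begin, std::size_t end, std::size_t chunks, Fn fn) {
    std::vector<std::thread> workers;
    std::size_t n = end - begin;
    std::size_t chunk = (n + chunks - 1) / chunks;  // ceiling division
    for (std::size_t lo = begin; lo < end; lo += chunk) {
        std::size_t hi = lo + chunk < end ? lo + chunk : end;
        workers.emplace_back([=] {
            for (std::size_t i = lo; i < hi; ++i) fn(i);
        });
    }
    for (auto& t : workers) t.join();
}
```

Because each chunk is a contiguous index range over SoA data, every worker streams through its own slice of the arrays without stepping on the others' cache lines.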

>Does this just say to use databases effectively? Then it will take some effort to integrate that into the code... both for producing the data and for reading it.
No, it's about understanding how the hardware works with your data, knowing what data each transformation actually needs, and designing your data structures so that you're doing the minimum amount of work necessary on the CPU or GPU to accomplish your task. The code you need will naturally emerge from the design of your data.
