>>27
Actually, you can do it in Java and C#.
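Java's volatile keyword enforces strict memory ordering: reads and writes of a volatile field can't be reordered around each other, and the JIT translates them into machine code that uses the target platform's memory fence instructions (or locked instructions that act as fences). Atomic compare-and-swap is a separate thing, exposed through the java.util.concurrent.atomic classes.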
.NET has the System.Threading.Interlocked class for atomic operations on integral types as well.
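Unfortunately, both are kind of heavyweight and don't make use of platform-specific optimizations. For example, x86 already has a strong memory model: loads aren't reordered with other loads and stores aren't reordered with other stores, so you don't need a fence or locked instruction on every access, even in multi-threaded applications. The one case you need to worry about is a store followed by a load from a different address: the store can still be sitting in the core's store buffer when the later load completes, so another core can see old data for a moment. That's the only place you actually need a fence.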
C++0x and C1x both have new atomic types as part of their standard libraries that let you specify the memory ordering you want for each operation you perform on them. If you know the assembly language for your target platforms, it's trivial to write these libraries yourself today, without waiting for new compilers that implement the new specs. They don't rely on new language features.
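To make the "pick your memory ordering" part concrete, here's a rough sketch using the C++0x-style std::atomic interface. The names (payload, ready, produce, consume, run_message_passing) are made up for illustration and aren't from any particular library. The writer publishes plain data with a release store and the reader picks it up with an acquire load. On x86 both of those compile down to ordinary mov instructions, which is exactly the kind of platform-specific saving you don't get from a blanket volatile or Interlocked call.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                  // plain, non-atomic data being handed over
std::atomic<bool> ready{false};   // flag that publishes the payload

void produce(int value) {
    payload = value;                               // ordinary store
    ready.store(true, std::memory_order_release);  // earlier writes become visible
                                                   // to whoever acquires the flag
}

int consume() {
    // Spin until the flag is set. The acquire load pairs with the release
    // store above, so once we see true we are guaranteed to see payload.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_message_passing(int value) {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);  // reset before threads start

    int seen = -1;
    std::thread consumer([&seen] { seen = consume(); });
    std::thread producer([value] { produce(value); });
    producer.join();
    consumer.join();  // join makes the write to seen visible here

    assert(seen == value);
    return seen;
}
```

If you swapped both orderings for memory_order_seq_cst the code would still be correct, but on x86 the store would then cost a fence or a locked instruction. Going the other way, memory_order_relaxed on both sides would make the read of payload a data race.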