>>1
Each operating system has its own threading API (for example, POSIX threads on Unix-like systems and the Win32 thread functions on Windows). However, much of the theory you need to understand carries over from one OS to another.
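The C11 revision of the standard added built-in threading (<threads.h>) and atomics (<stdatomic.h>), but support varies: <threads.h> is optional (an implementation may define __STDC_NO_THREADS__ instead), and it is provided by your C library rather than by GCC or Clang themselves (glibc, for instance, only added it in version 2.28).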
It shouldn't be that confusing; it's actually fairly intuitive. The best way to think about a multithreaded application is as if it were a networked client-server or peer-to-peer system, where each thread is a peer. The only difference is that all the "peers" share the same memory and live in the same address space. Threads can access the same memory location simultaneously, so that access has to be synchronized or the program as a whole can end up in a bad state. It's pretty simple if you follow the basic idioms: use a mutex to guard access to shared data, and use condition variables (or events, on Windows) to signal between threads.
The tricky part comes when you try to build your own synchronization primitives out of the atomic operations and memory fence instructions that your target CPU architecture provides. You can almost always get away with the toolkit of synchronization primitives that your threading API already gives you, though.