You could use:
1. One thread that accepts new connections and places them on a connection/request queue.
2. One thread that pops connections from that queue and assigns them to worker threads, when one is available.
3. N worker threads that process requests. When a request needs I/O or CGI scripting, the worker stops and puts the request back on the same queue, this time with status information recording where processing stopped.
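
The three roles above can be sketched roughly as follows. This is only a minimal illustration, not a real server: integer ids stand in for sockets, the names `Request`, `STAGES` and `run_server` are made up for the example, and every stage boundary is treated as a point where the worker would otherwise block on I/O or CGI. The dispatcher hands work over through a second queue, which is one simple way to "assign to a free worker".

```python
import queue
import threading
from dataclasses import dataclass, field

# Hypothetical processing stages; a real server would have its own.
STAGES = ("parse", "io", "respond")

@dataclass
class Request:
    conn_id: int                              # stand-in for a real socket
    stage: int = 0                            # where work stopped last time
    log: list = field(default_factory=list)   # stages completed so far

def run_server(conn_ids, n_workers=4):
    conn_ids = list(conn_ids)
    results = {}
    if not conn_ids:
        return results

    request_q = queue.Queue()   # shared connection/request queue
    work_q = queue.Queue()      # dispatcher -> idle workers
    lock = threading.Lock()
    remaining = [len(conn_ids)] # requests not yet finished
    all_done = threading.Event()

    def acceptor():
        # 1. Accept "connections" and put them on the request queue.
        for cid in conn_ids:
            request_q.put(Request(cid))

    def dispatcher():
        # 2. Pop requests and hand them to whichever worker is free.
        while True:
            req = request_q.get()
            if req is None:     # shutdown sentinel
                break
            work_q.put(req)

    def worker():
        # 3. Run one stage, then either requeue with saved state or finish.
        while True:
            req = work_q.get()
            if req is None:     # shutdown sentinel
                break
            req.log.append(STAGES[req.stage])
            req.stage += 1
            if req.stage < len(STAGES):
                request_q.put(req)   # back on the queue, state preserved
            else:
                with lock:
                    results[req.conn_id] = req.log
                    remaining[0] -= 1
                    if remaining[0] == 0:
                        all_done.set()

    threads = [threading.Thread(target=acceptor),
               threading.Thread(target=dispatcher)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    all_done.wait()             # every request has passed its last stage
    request_q.put(None)         # stop the dispatcher
    for _ in range(n_workers):
        work_q.put(None)        # stop each worker
    for t in threads:
        t.join()
    return results
```

Calling `run_server(range(5), n_workers=3)` returns a dict mapping each id to the list of stages it went through. A request may be handled by a different worker at each stage, which is why the state has to travel with the request rather than live in the thread.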
IOCP (I/O completion ports) is a good model to study. You can read about it here:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365198%28v=vs.85%29.aspx
Your own implementation won't be as efficient as Microsoft's IOCP, though, because IOCP benefits from kernel-level optimizations; Linux is lacking in this area.
For CGI, you may want to look at FastCGI.