optimize ntohl etc. in terms of bswap functions
we can do this without violating the namespace now that they are
macros/inline functions rather than extern functions. the motivation
is that gcc was generating giant, slow, horrible code for the old
functions, and now generates a single byte-swapping instruction.