Fixed a bug in the sin lut code so that the results are always accurate.
Changed fixsingen tool to generate tables correctly for the new implementation.
Added tool to generate sin tables.
Tests indicate that using a lut is still ~2.1% out from sinf so it's very possible that our sin function is more accurate than the libmath sinf function on the computer I'm testing with. In which case the accuracy results are offset by that amount.
The multi-pass approach means that this test should run unmodified on embedded environments due to lower memory usage, this test also more realistically tests for cache behaviour.
The average error seems to vary between 3-8%, it's possible that setting iter to a higher value (and pass to a lower one) could give a more stable value.
Fixed point seems to be slower overall on x86 (50% of float) with good floating point hardware, however with the caching enabled depending on the program the fixed point implementation may be much faster (as float would be if cached).
With caching enabled there is a massive difference between and iter size below and above 4096, this is because the caching mechanism thrashes above this threshold, it probably makes sense to disable caching for these tests by compiling libfixmath with FIXMATH_NO_CACHE.