Intel vs AMD

Two similar computers were tested by running the different FFT implementations of Linrad at different sizes. The Pentium IV is the faster computer, but the difference depends strongly on the FFT implementation.

The Pentium IV computer was equipped as follows:
Motherboard: ASUS P4S533 SIS645, Socket 478, P4, DDR, ATX.
Processor: Intel P4 1.8GHz 400MHz 512kB Northwood socket 478.
Memory: 1024 MB DIMM PC2700 333 MHz.
The Athlon computer was equipped as follows:
The computers were tested while processing two channels of 96 kHz bandwidth. The numbers given in the tables below are the CPU load in percent, as reported by the speed test routines that follow parameter selection screens 1 and 2.
Small FFT with floating point arithmetic

By selecting a bandwidth of 100 Hz for the first FFT with a sin to power 3 window, the fft1 size is set to 4096. The total memory needed to keep the transforms is then 65536 bytes. It should be possible to keep everything within the cache on both computers.

The CPU load is as follows:

FFT version                  Pentium IV   Athlon   Ratio
0  Radix 2 DIF C                   7.32     8.67    1.18
1  Radix 2 DIF asm                 7.21     8.53    1.18
2  Twin radix 2 DIF asm            6.49     8.62    1.33
3  Radix 4 DIT C                   7.03     9.26    1.32
4  Twin radix 4 DIT C              6.41     8.29    1.29
5  Twin radix 4 DIT SIMD           4.35     7.28    1.67

The ratio is the Athlon load divided by the Pentium IV load. The SIMD (Single Instruction Multiple Data) instructions make a significant improvement on the Pentium IV but much less so on the Athlon computer.
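The memory figure quoted above can be reproduced with a little arithmetic. The sketch below is an illustration only: it assumes that each FFT point is a complex value stored as two 4-byte floats and that both channels are held at once. The function name `transform_bytes` is hypothetical and is not taken from Linrad.

```python
# Assumptions (not stated explicitly in the text): each FFT point is a
# complex number stored as two 4-byte floats, and both channels are kept.
BYTES_PER_FLOAT = 4
VALUES_PER_POINT = 2   # real and imaginary part
CHANNELS = 2           # the tests process two channels


def transform_bytes(fft_size, channels=CHANNELS,
                    bytes_per_value=BYTES_PER_FLOAT):
    """Return the bytes needed to hold one set of transforms."""
    return fft_size * channels * VALUES_PER_POINT * bytes_per_value


if __name__ == "__main__":
    small = transform_bytes(4096)
    print(small)                      # 65536
    # Cache sizes as quoted in the text: 512 kB and 256 kB.
    print(small <= 512 * 1024)        # True
    print(small <= 256 * 1024)        # True
```

Under these assumptions the 4096-point case occupies 64 kB, well below both cache sizes mentioned in the text.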
Medium size FFT with floating point arithmetic

By selecting a bandwidth of 25 Hz for the first FFT with a sin squared window, the fft1 size is set to 16384. The total memory needed to keep the transforms is then 262144 bytes. The Pentium IV has a big enough cache, but the Athlon suffers from having only 256 kB of cache. Besides the FFT data, the processor also has to keep sine/cosine tables and a few other things in the cache. It is interesting to note that the twin routines are faster than running the single routines twice, even though the single routines only have to keep 131072 bytes of transform data.

The CPU load is as follows:

FFT version                  Pentium IV   Athlon   Ratio
0  Radix 2 DIF C                  12.33    17.44    1.41
1  Radix 2 DIF asm                12.09    17.23    1.43
2  Twin radix 2 DIF asm           11.53    14.78    1.28
3  Radix 4 DIT C                   9.95    19.70    1.98
4  Twin radix 4 DIT C              8.43    13.96    1.66
5  Twin radix 4 DIT SIMD           5.70    13.09    2.30

The SIMD instructions are efficient on the Pentium IV: they make the twin radix 4 decimation in time routine run 32% faster. On the Athlon the SIMD instructions only give an improvement of 6%. For the medium size floating point FFTs the Pentium IV runs significantly faster, by a factor of up to 2.3 with the SIMD routine.
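The 32% and 6% figures appear to be the relative reduction in CPU load when going from the twin radix 4 DIT C routine to the SIMD routine. A minimal sketch of that arithmetic, assuming this is how the percentages were derived (the helper name `load_reduction_percent` is made up for illustration):

```python
def load_reduction_percent(load_before, load_after):
    """Relative reduction in CPU load, in percent of the original load."""
    return 100.0 * (load_before - load_after) / load_before


# CPU loads taken from the medium size table: version 4 (C) vs version 5 (SIMD).
p4_gain = load_reduction_percent(8.43, 5.70)
athlon_gain = load_reduction_percent(13.96, 13.09)

if __name__ == "__main__":
    print(f"Pentium IV: {p4_gain:.0f}%")   # Pentium IV: 32%
    print(f"Athlon: {athlon_gain:.0f}%")   # Athlon: 6%
```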
Integer arithmetic using MMX instructions

Version 2 of the second FFT, twin radix 4 DIT, was run at several sizes with sine squared windows on the two computers.

Here are the results:

FFT size   Memory (bytes)   Pentium IV   Athlon   Ratio
   32768           262144         6.17      7.6    1.23
   65536           524288         7.43      8.6    1.16
  262144          2097152        12.7      13.6    1.07

For very large transform sizes the cache size is not important, and the Athlon is nearly as fast as the Pentium IV.
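As a consistency check on the table above, the sketch below recomputes the memory per FFT point and the Athlon to Pentium IV load ratio for each row. Every row works out to 8 bytes per point, which would be consistent with two channels of complex 16-bit integers. That interpretation is an assumption, since the text does not state the sample width. The helper `summarize` is hypothetical.

```python
# Rows from the MMX table: (FFT size, memory in bytes, P4 load, Athlon load).
ROWS = [
    (32768, 262144, 6.17, 7.6),
    (65536, 524288, 7.43, 8.6),
    (262144, 2097152, 12.7, 13.6),
]


def summarize(rows):
    """Return (size, bytes per point, Athlon/P4 load ratio) for each row."""
    return [(size, mem // size, round(athlon / p4, 2))
            for size, mem, p4, athlon in rows]


if __name__ == "__main__":
    for size, bytes_per_point, ratio in summarize(ROWS):
        print(size, bytes_per_point, ratio)
```

The ratio falls toward 1 as the transform grows, which matches the observation that the two machines perform almost equally at the largest size.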
