Studies of the input hardware delay in SDR systems.
(Oct 10 2011)

Minimum Delay in Linrad.

With Linrad used for SDR processing one can set parameters for a total delay of 7 ms from antenna to loudspeaker. For details look here.

By replacing the input side with other hardware it is possible to determine the delay caused by the drive routines of different SDR hardware.

Input Delay under Windows XP (32 bit) on a fast computer.

These tests were performed on a D5400XS with two Xeon E5410 processors, 8 CPU cores in total. The operating system was Windows XP SP3, updated through Microsoft Update to the latest state as of Sept 25 2011.


Input hardware                  Output soundcard  Delay          Comments
                                                   (ms)
Softrock+Delta 44 MME               Delta 44        7         Frequent glitches in and out.
Softrock+Delta 44 MME               Lynx Two        8         Infrequent glitches in.
Softrock+Lynx Two MME               Delta 44        7         Infrequent glitches out.
Softrock+Lynx Two MME               Lynx Two       10.5
Softrock+SB Live PCI MME            Lynx Two       35
Softrock+Intel HDA MME              Lynx Two       35
Softrock+SB Live USB MME            Lynx Two       23
AFEDRI SDR MME                      Lynx Two       35
Perseus at 125 kHz                  Lynx Two       14
Perseus at 500 kHz                  Lynx Two       14
SDR-14 at 95.238 kHz                Lynx Two       52
SDR-IP at 125 kHz small UDP         Lynx Two      128
SDR-IP at 125 kHz large UDP         Lynx Two       70
SDR-IP at 500 kHz large UDP         Lynx Two       17
Excalibur 125 kHz                   Lynx Two       35
Excalibur 500 kHz                   Lynx Two       33
Table 1. Observed delays with Windows XP on a fast computer.


It is desirable to have a total delay of 40 ms or less in an SDR receiver that is used for Morse telegraphy where the operator wants to listen between dots and dashes (QSK). High speed operators may require an even shorter delay.

When Linrad is used as a transceiver the delay from key down to loudspeaker output has to be below 40 ms. That means that ordinary soundcards with 35 ms delay are useless. Of the input hardware currently available to me, the Delta 44, Lynx Two, Perseus and SDR-IP (at a high sampling rate) are fast enough.

It is possible that some configuration can be changed to make some of the input hardware faster. The investigation does however show that there are several alternatives with adequate performance for an SDR transceiver.

Input Delay under Windows 7 (32 bit) on a fast computer.

The computer used was the same D5400XS that was used for the test with Windows XP above. For unknown reasons Windows 7 slows down the Linrad thread for narrowband processing. To evaluate the I/O hardware it is necessary to reduce the baseband sampling speed from 48 kHz to 24 kHz. That is done with the First mixer bandwidth reduction parameter, which is set to 2 for 96 kHz input in the Win 7 tests. The baseband thread only uses 10% of one CPU, but trying a 48 kHz baseband sampling speed leads to occurrences of data piling up in the processing buffers, which would have to be managed by a large Output delay margin. The baseband thread must be waiting somewhere. It is not in a Sleep statement; they all sleep the requested time, as one would expect from a system with 8 CPU cores, where there should always be an idle core that can resume work at the end of a wait period. The problem should be avoidable, and finding the cause might improve performance under other operating systems as well.

The processing parameters that affect the delay were set like this for the tests under Windows 7 on a D5400XS computer with 8 Xeon E5410 cores:

First FFT bandwidth = 2000
First FFT window = 9
First mixer bandwidth reduction in powers of two = 2 (for 96 kHz input)
Third FFT window = 8
Min DMA rate = 2000
Max DMA rate = 2000
Third FFT N = 5 (size = 32)
Priority = 2
Timer resolution = 1
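
The arithmetic behind two of these parameters can be sketched in a few lines of Python. The function names are hypothetical and not part of Linrad; the only facts used are the ones stated on this page: a first mixer bandwidth reduction of 2 (in powers of two) turns 96 kHz input into a 24 kHz baseband, and Third FFT N = 5 means a size of 32.

```python
def baseband_rate(input_rate_hz: int, reduction_pow2: int) -> float:
    """Baseband sampling rate after the first mixer bandwidth reduction.

    reduction_pow2 is the parameter 'First mixer bandwidth reduction
    in powers of two' (hypothetical helper, for illustration only).
    """
    return input_rate_hz / (2 ** reduction_pow2)


def fft_size(n: int) -> int:
    """FFT size for a parameter given as a power of two (N = 5 -> 32)."""
    return 2 ** n


# 96 kHz input with reduction 2, as used in the Windows 7 tests.
print(baseband_rate(96000, 2))
# Third FFT N = 5.
print(fft_size(5))
```

The same relation explains why a card limited to 48 kHz needs a reduction of 1 to reach the same 24 kHz baseband rate.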

In all cases the Delta 44 was used for output at 96 kHz with the ASIO driver. Table 2 shows the results.


Input hardware              Output        Delay      Comments
                            margin        (ms)
Softrock+Delta 44 WDM-KS      2            7       Insignificant glitches.
Softrock+Delta 44 MME        15           30       Insignificant glitches.
Excalibur 100 kHz            10           35 
SDR-IP 100 kHz                6           14
Perseus 96 kHz                6           10
Softrock+SB Live USB WDM-KS   6           20
Softrock+HDA Intel WASAPI     6           14
Softrock+HDA Intel MME       15           30
Table 2. Observed delays with Windows 7 on a fast computer.


The delay margins in table 2 are the margins required to avoid underrun errors. With a more clever algorithm for setting the resampling ratio required to match the output speed to the input speed it would be possible to reduce this value in some cases and to get up to about 5 ms faster response.
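
The kind of adjustment meant here can be illustrated with a simple proportional correction of the resampling ratio, based on how full the output buffer is. This is only a sketch of the general idea and not the algorithm used in Linrad; the function name, the gain value and the buffer figures are assumptions made for illustration.

```python
def adjust_resampling_ratio(nominal_ratio: float,
                            buffer_fill: int,
                            target_fill: int,
                            gain: float = 1e-6) -> float:
    """Return a corrected output/input sample-rate ratio.

    When the output buffer holds more samples than the target, data is
    produced faster than the output device consumes it, so slightly fewer
    output samples per input sample are generated. When the buffer runs
    low, the ratio is raised instead. (Hypothetical helper, not Linrad code.)
    """
    error = buffer_fill - target_fill
    return nominal_ratio * (1.0 - gain * error)


# Assumed example: 24 kHz baseband resampled to a 96 kHz output (ratio 4).
for fill in (500, 1000, 1500):
    print(fill, adjust_resampling_ratio(4.0, fill, target_fill=1000))
```

The closer such a loop can hold the buffer to its target, the smaller the margin that has to be reserved against underruns.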

Linrad cannot (yet?) open ASIO devices for simultaneous reading and writing (RDWR). Only one ASIO device can be present, so it has to be either input or output.

Under Windows 7 the MME routines are slow. Soundcards that have WASAPI or WDM-KS drivers should be used whenever a soundcard is used for input. ASIO seems to be the best choice for output.

It is interesting to note the dramatic difference in network latency between Windows XP and Windows 7. The SDR-IP needs Windows 7.

Input Delay under Linux (32 bit) on a fast computer.

The computer used was the same D5400XS which was used for the test with Windows XP above. The Linux version used for these tests was Debian sid with the kernel 3.0.0. The soundcard for the output was a Delta 44 in all cases and all tests were made without Portaudio under X11. Table 3 shows the results.


Input hardware                  Delay    
                                 (ms)
Softrock+Delta 44                 12
Softrock+Ensoniq Audio PCI        30
SDR-IP 100 kHz                    16
Perseus 96 kHz                    16
Softrock+Ensoniq PCI ES1371       30
Softrock+SB Live Value PCI        14
Softrock+SB Live USB              12
Table 3. Observed delays with Linux on a fast computer.


Windows XP on a slow computer.

The computer used for this test was a 650 MHz Pentium III. The processing parameters must be set for modest CPU load and they are quite different from the parameters that can be used on a fast computer. Larger buffers increase the delay. The output delay margin has to be set larger to accommodate the actual processing time. The single CPU cannot do several tasks simultaneously, so the assumption that a buffer is processed without any delay as soon as it is filled is not true at all. (On the 8 core computer one can assume that a buffer is processed in negligible time as soon as it is filled.)

The following parameters were used on the 650 MHz Pentium III:

First FFT bandwidth = 200
First FFT window = 9
First mixer bandwidth reduction in powers of two = 2(4)
Third FFT window = 3
Min DMA rate = 200 (100)
Max DMA rate = 200 (100)
Third FFT N = 5 (size = 32)
Output sampling rate 24000 (96000)
Priority = 2
Timer resolution = 1
ASIO buffer size 64 (256)

The numbers in parentheses are for the Delta 44 used for input as well as for output. Running the output at 96 kHz is a waste of resources on a slow computer, but it is necessary because with a Delta 44 the input and the output must run at the same rate. All tests are with the Delta 44 ASIO for output and the results are listed in table 4. The tests are with 2 RF channels for the Delta 44 and with 96 kHz or 100 kHz for the input sampling rate where possible. Cards that do not support more than 44.1 or 48 kHz are run at their maximum speed, and the first mixer bandwidth reduction is reduced to give the same baseband sampling speed as with the other input hardware.


Input hardware                  Delay    
                                 (ms)
Softrock+Delta 44 MME             55
Softrock+SB 16 ISA MME            55
Softrock+SB 16 ISA WDM-KS         55
Softrock+SB Live! Value           60
SDR-IP                            45
Perseus 96 kHz                    55
Excalibur 100 kHz                180       
Softrock+SB Live USB              60
Table 4. Observed delays with Windows XP on a 650 MHz Pentium III with Linrad-03.26.



With other soundcards/drivers than Delta 44/ASIO it is necessary to set a much larger timer resolution like or more. Linrad-03.27 has a more precise control of the time when the output starts which allows slightly faster response.

Windows 7 on a slow computer.

It is unreasonable to pay Microsoft for a licence just to find out whether Windows 7 can be used on a slow computer.

The evaluation copy has expired so Windows update does not allow upgrades. When trying to run Linrad with the old evaluation installation for which the most recent update was October 2010 the screen looks like figure 1.

Fig. 1. Windows 7, an old evaluation copy, does not work on a Pentium III. It seems to me that the high CPU usage by audiodg.exe is a bug. Maybe Windows Update would provide a correction if I bought a license for the old Pentium III, maybe not.

Linrad-03.26 under Linux on a slow computer.

There are two ways to run Linrad under Linux. One can use svgalib or one can use X11. Unfortunately svgalib is becoming difficult to use because it is no longer maintained. In the past there was a big performance difference between svgalib and X11 as shown here: A CW receiver with a small time delay and a fast waterfall graph. That was in 2006 with an early version of Debian etch.

The slow computer is the same Pentium III at 650 MHz which was used for Microsoft Windows above. All tests in this section are with an Ensoniq ES1371 for input and Delta 44 for output using native ALSA and the following parameters:

First FFT bandwidth = 200
First FFT window = 9
First mixer bandwidth reduction in powers of two = 1
Third FFT window = 3
Min DMA rate = 200
Max DMA rate = 200
Third FFT N = 6 (size = 64)
Output sampling rate 24000
Priority = 1

Three generations of Linux were tested:

Debian lenny. Kernel 2.6.26, February 2009.
Debian squeeze. Kernel 2.6.32, February 2011.
Ubuntu 11.10. Kernel 3.0.0, October 2011.

There is no longer a tool to set the colour depth in X11. One has to edit /etc/X11/xorg.conf and add a line DefaultDepth 8 to set a colour depth of 8 like this:

Section "Screen"
        Identifier      "Default Screen"
        Monitor         "Configured Monitor"
        DefaultDepth 8
EndSection

Table 5 shows the time delay necessary to get an output that is free of glitches when Linrad and top are the only programs running. The table also shows the CPU load from Linrad and Xorg as well as the idle time. The graphics system is an important factor. All the entries in table 5 are with the same set of parameters for spectrum averages and everything else except for the output delay margin (ODM), which is adjusted in each case to avoid output underrun errors.


                                  CPU percentages      Delay    ODM  
Distribution      Graphics         Linrad  Xorg   Idle      (ms)    (ms)
Debian lenny      svgalib            50     -      50        50       8
Debian squeeze    svgalib            50     -      50        50       8
Debian lenny      8 bit X11-SHM      50     5      45       115      80
Debian squeeze    8 bit X11-SHM      50     5      45       140     100
Debian lenny      8 bit X11          55     5      40       115      80 
Debian squeeze    8 bit X11          53     7      40       140     100 
Debian lenny      24 bit X11-SHM     50    20      30       150     120
Debian squeeze    24 bit X11-SHM     52    16      25       210     100
Ubuntu 11.10      24 bit X11-SHM     40    30      30       140      80
Debian lenny      24 bit X11         57    20      22       210     180
Debian squeeze    24 bit X11         59    23      17       240     200
Ubuntu 11.10      24 bit X11         47    39       8       180     140
Table 5. Observed delays in different Linux distributions on a 650 MHz Pentium III running Linrad-03.26


Table 5 shows that svgalib is much better than X11. That is no joy for the future because svgalib cannot be used with modern kernels. Maybe one could set specific boot options to make svgalib work under e.g. Ubuntu 11.10, but that would require Linux skills above normal. I have no idea how to do that, and presumably it depends on what hardware one is using.

I have not been able to find out how to run Ubuntu 11.10 with 8 bit graphics. The reason why more modern Linux distributions show worse performance is that they have more programs running in the background. Those programs occasionally block the CPU and make it necessary to set a large output delay margin.

Priority in Linrad. A new strategy for linrad-03.27 under Linux.

Under Windows the entire Linrad program gets elevated priority. Under Linux however Linrad sets elevated priority for the input and output threads only. That is the reason why the user has to set an output delay margin that is large enough to accommodate the CPU time used by other processes on a system with a single CPU core.
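
How a single thread can ask for elevated priority under Linux can be sketched as follows. This is an illustration in Python using the standard os module, not Linrad's actual code; which scheduling calls Linrad uses is not stated here, so the SCHED_FIFO policy and the function names are assumptions. Real-time policies normally require root or a suitable rtprio limit, so the function reports failure instead of raising an error.

```python
import os
import threading


def try_elevate_priority(level: int = 1) -> bool:
    """Try to give the calling thread real-time (SCHED_FIFO) priority.

    Returns True on success, False if the platform lacks the call or the
    process is not permitted to use it. (Hypothetical helper.)
    """
    try:
        lowest = os.sched_get_priority_min(os.SCHED_FIFO)
        highest = os.sched_get_priority_max(os.SCHED_FIFO)
        level = max(lowest, min(highest, level))
        # On Linux, pid 0 refers to the calling thread.
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(level))
        return True
    except (AttributeError, OSError):
        return False


def worker(results: list) -> None:
    # A time-critical thread would request priority before starting its work.
    results.append(try_elevate_priority())


results = []
thread = threading.Thread(target=worker, args=(results,))
thread.start()
thread.join()
print("priority elevated:", results[0])
```

Because the request is made per thread, a program can choose to elevate only its input and output threads or every processing thread.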

Linrad-03.27 sets elevated priority to all the processing threads. That leads to a significantly improved performance on the Pentium III computer as can be seen in table 6.


                                  CPU percentages      Delay    ODM  
Distribution      Graphics         Linrad  Xorg   Idle      (ms)    (ms)
Debian lenny      svgalib            50     -      50        42       8
Debian squeeze    svgalib            50     -      50        42       8
Debian lenny      8 bit X11-SHM      49     5      43        60      30
Debian squeeze    8 bit X11-SHM      50     4      46        50      20
Debian lenny      8 bit X11          53     5      42        62      35 
Debian squeeze    8 bit X11          52     5      42        50      30 
Debian lenny      24 bit X11-SHM     51    15      33        75      50
Debian squeeze    24 bit X11-SHM     50    15      30        50      20
Ubuntu 11.10      24 bit X11-SHM     42    32      22        38      15
Debian lenny      24 bit X11         53    20      24        75      50
Debian squeeze    24 bit X11         57    18      22        50      20
Ubuntu 11.10      24 bit X11         49    39       9        38      15
Table 6. Observed delays in different Linux distributions on a 650 MHz Pentium III running Linrad-03.27
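
The improvement from Linrad-03.26 to Linrad-03.27 can be quantified directly from the Delay columns of tables 5 and 6. The short Python sketch below does that; the delay values are copied from the two tables and the variable names are only illustrative.

```python
# Delay in ms: (distribution, graphics) -> (Linrad-03.26, Linrad-03.27)
delays = {
    ("Debian lenny", "svgalib"): (50, 42),
    ("Debian squeeze", "svgalib"): (50, 42),
    ("Debian lenny", "8 bit X11-SHM"): (115, 60),
    ("Debian squeeze", "8 bit X11-SHM"): (140, 50),
    ("Debian lenny", "8 bit X11"): (115, 62),
    ("Debian squeeze", "8 bit X11"): (140, 50),
    ("Debian lenny", "24 bit X11-SHM"): (150, 75),
    ("Debian squeeze", "24 bit X11-SHM"): (210, 50),
    ("Ubuntu 11.10", "24 bit X11-SHM"): (140, 38),
    ("Debian lenny", "24 bit X11"): (210, 75),
    ("Debian squeeze", "24 bit X11"): (240, 50),
    ("Ubuntu 11.10", "24 bit X11"): (180, 38),
}

# Reduction in delay for each configuration.
reductions = {key: old - new for key, (old, new) in delays.items()}

# List the configurations, largest improvement first.
for (dist, gfx), saved in sorted(reductions.items(),
                                 key=lambda item: item[1], reverse=True):
    print(f"{dist:16s}{gfx:16s}{saved:4d} ms shorter")
```

With these figures the largest gain is 190 ms (Debian squeeze, 24 bit X11), while svgalib, which was already fast, gains 8 ms.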


Setting an elevated priority on all processing threads only works properly on recent Linux kernels. There is a significant difference between Debian lenny (2.6.26) and Debian sid (3.0.0). By setting very demanding parameters one can see the difference. Under lenny the mouse can become sluggish and there can be overruns and underruns despite the fact that no thread is anywhere near 100% CPU load. It seems threads have to wait for a particular CPU core to become available. Those problems do not occur under sid.

Linrad-03.27 is less sensitive to the limitations in lenny. The problem is the timing of X11, and by never updating the fft1 and fft3 spectra more often than every 50 ms linrad-03.27 runs reasonably well under lenny with obscenely aggressive parameters. An example is shown in figure 2.

Fig. 2. Debian lenny with extreme settings in Linrad for high CPU load.

The CPU load values in top range from 250 to about 25 for Xorg and from 150 to 50 for xlinrad. Other processes also show high values occasionally. The entire linrad directory was copied into a Debian sid installation. The same xlinrad executable was then run under sid with all parameters identical to those used for figure 2. The result is shown in figure 3.

Fig. 3. Debian sid with extreme settings in Linrad for high CPU load.

The CPU loads in top are stable under sid. Linrad shows 82% of one CPU which is 10.25% of eight CPUs in good agreement with the load reported by Linrad. Xorg is at 7%, one order of magnitude lower in sid compared to lenny.

Really old Linux on a slow computer.

A test with linrad-03.26 under Red Hat 5.1 (kernel 2.2.12) using svgalib and 4Front OSS 3.9.8g looks good on the Linrad screen with a computed delay of 45 ms. That is with Ensoniq ES1371 for input and Delta 44 for output. The real delay is however 130 ms. There has to be a large buffer hidden somewhere in the drivers of this old version of 4Front OSS.

Conclusions.

Modern computers with modern operating systems can be run with small delays in graphical environments when this is written in the year 2011. Five years ago this was not true. In the year 2006 one had to use svgalib in terminal mode to get a small delay from antenna to loudspeaker in Linrad. Ten years ago svgalib in terminal mode was also slow.

Soundcard input. Device drivers are different. Some allow small buffers, others use hidden buffers on the input. With some soundcards the input delay can be set to 2 ms or less. Windows 7 is faster than Windows XP but old soundcards may not work correctly (Delta 44). With Windows a soundcard with ASIO, WASAPI or WDM-KS drivers is more likely to be fast than a card that only has MME drivers.

Network input. The input delay is about 10 ms under Windows 7 and Linux. Windows XP adds a very large delay at low sampling rates.

USB input. Some hardware uses fixed buffer sizes for the USB, other hardware has a minimum size that can be set for the USB interface. USB in itself is fast. Perseus and SB Live! external USB with WDM-KS give an input delay well below 10 ms. The input delay for Excalibur is about 30 ms and for the SDR-14 it is about 45 ms and proportionally less at higher speeds.
