If your floating point calculations differ after a trivial change

Sep 06, 2018
 

Today I had a curious bug: After changing the way GPS coordinates were read from a file (not calculated!) all of a sudden lots of unrelated floating point calculations had different results.

I reverted the changes just to be sure and, yes, the results were back to the original values.

I added the new code again, same problem.

I changed the code to first use the new method for reading the file, followed by the old one, overwriting the data. The problem still occurred.

I reduced the new code to just constructing the object that does the reading. The problem still occurred.

I even removed the object construction. The problem still occurred.

Finally I had a closer look at the code in the unit that contains the object declaration. Nothing obvious, but after about an hour of staring at not very complex code it turned out that it used another unit which automatically loads the proj4.dll, which we use for converting geographic coordinates. It doesn’t actually call that dll, it just loads it. And then it dawned on me: There is this thing called the 8087 Control Word, which controls how various floating point operations are done, e.g. rounding and precision. So if the dll changes that control word in its initialization, it will change the calculations done in other parts of the program.
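To illustrate how much the control word matters, here is a minimal sketch (not code from the actual project) that computes the same division twice, once with the precision bits set to extended and once set to double. It assumes a 32-bit Windows target, where Delphi does its floating point math on the x87 FPU. The constant names and the program name are made up for this example.

```delphi
program ControlWordDemo;
{$APPTYPE CONSOLE}

const
  // Bits 8 and 9 of the x87 control word select the precision.
  PrecisionMask     = $0300;
  PrecisionDouble   = $0200; // 53-bit mantissa
  PrecisionExtended = $0300; // 64-bit mantissa

var
  SavedCW: Word;
  Divisor, AtExtended, AtDouble: Extended;
begin
  SavedCW := Get8087CW;
  try
    Divisor := 3.0; // a variable, so the division happens at run time

    // Same expression, evaluated under two different precision settings.
    Set8087CW((SavedCW and not PrecisionMask) or PrecisionExtended);
    AtExtended := 1.0 / Divisor;

    Set8087CW((SavedCW and not PrecisionMask) or PrecisionDouble);
    AtDouble := 1.0 / Divisor;

    Writeln('Results identical: ', AtExtended = AtDouble);
  finally
    // Put the control word back the way we found it.
    Set8087CW(SavedCW);
  end;
end.
```

On a 32-bit build the two results should differ in the last bits, which is exactly the kind of difference that shows up as "unrelated" calculations suddenly changing.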

To prove this, I added code that reads the 8087 Control Word before and after the dll was loaded. The values were different. So that probably was the culprit.
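A check along these lines can be sketched as follows. This is not the code I actually used (my dll is loaded from a resource); it assumes a plain LoadLibrary call, and the file name some.dll is a placeholder for whatever dll you suspect.

```delphi
program CheckControlWord;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  DllName = 'some.dll'; // hypothetical name, substitute the dll you suspect

var
  Before, After: Word;
  Handle: HMODULE;
begin
  // Read the control word immediately before and after the load,
  // so nothing else gets a chance to change it in between.
  Before := Get8087CW;
  Handle := LoadLibrary(DllName);
  After := Get8087CW;

  if Handle = 0 then
    Writeln('Could not load ', DllName)
  else
  begin
    Writeln('Control word before: $', IntToHex(Before, 4));
    Writeln('Control word after:  $', IntToHex(After, 4));
    if Before <> After then
      Writeln('Loading the dll changed the control word.');
    FreeLibrary(Handle);
  end;
end.
```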

The SysUtils unit exports a SafeLoadLibrary function. In addition to calling the LoadLibrary Windows API function, it saves the error mode and the 8087 Control Word before loading the dll and restores these values afterwards. I could not use that function because the unit loads the dll from a resource rather than from a file, using my TdzResourceDllLoader (from dzlib). So I added code to save and restore the control word to that class.

Guess what? The problem went away.

This is the code (copied from SysUtils.SafeLoadLibrary) I used:

var
  FPUControlWord: Word;
begin
  // [...]
  // save the FPU Control Word
  asm
    FNSTCW  FPUControlWord
  end;
  try
    // code to
    // load the dll into memory
    // relocate it
    // call dllmain
  finally
    // restore the FPU Control Word
    asm
      FNCLEX
      FLDCW FPUControlWord
    end;
  end;
end;
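The same idea can also be written without inline assembler, using Get8087CW and Set8087CW from the System unit. That may be handy where asm blocks inside Pascal code are not available, which as far as I know is the case for the 64-bit compiler. The following is only a sketch: TLoadProc, CallWithPreservedControlWord and FakeLoad are names invented for this example, and FakeLoad merely simulates a dll that changes the control word.

```delphi
program PreserveControlWord;
{$APPTYPE CONSOLE}

type
  TLoadProc = procedure; // hypothetical callback doing the actual loading

procedure CallWithPreservedControlWord(LoadProc: TLoadProc);
var
  SavedCW: Word;
begin
  SavedCW := Get8087CW;
  try
    LoadProc;
  finally
    // Set8087CW clears pending FPU exceptions and loads the control word.
    // Note that it also stores the value in System.Default8087CW.
    Set8087CW(SavedCW);
  end;
end;

procedure FakeLoad;
begin
  // Stand-in for a dll that switches to double precision
  // and masks all FPU exceptions during its initialization.
  Set8087CW($027F);
end;

var
  Before: Word;
begin
  Before := Get8087CW;
  CallWithPreservedControlWord(FakeLoad);
  Writeln('Control word preserved: ', Get8087CW = Before);
end.
```

One difference to the asm version: Set8087CW also updates Default8087CW, so if the saved value was not the default to begin with, the default changes too.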