Skipping the UTF-8 BOM with TMemIniFile in Delphi 2007

Recently I came across a problem with INI files: some editors (including recent versions of Windows Notepad) add a byte order mark (BOM) to the files they save. The UTF-8 BOM in particular kept appearing in INI files, which were then read incorrectly by the Delphi 2007 implementation of TMemIniFile (I guess the same applies to all pre-Unicode versions of Delphi). This was a problem especially for programs that use TJvAppIniStorage to stream application settings to disk, because TJvAppIniStorage internally uses TMemIniFile.

My first attempt at a fix was to add code that read the file, removed the BOM and wrote it back before actually using it. This had some unpleasant side effects: some programs that usually start at the same time tried to access the file in parallel and failed. (That is no problem when only reading, but a big problem when writing.)

So I dug deeper and found that modifying TMemIniFile.LoadValues like this fixed the problem:

procedure TMemIniFile.LoadValues;
const
  BOM_LENGTH = 3;
var
  List: TStringList;
  st: TMemoryStream;
  Buffer: array[0..BOM_LENGTH-1] of Byte;
begin
  if (FileName <> '') and FileExists(FileName) then
  begin
    List := nil;
{$MESSAGE hint 'UTF-8 fix for TMemIniFile.LoadValues is active'}
    st := TMemoryStream.Create;
    try
      st.LoadFromFile(FileName);
      st.Position := 0;
      if (st.Read(Buffer, BOM_LENGTH) <> BOM_LENGTH)
        or (Buffer[0] <> $EF) or (Buffer[1] <> $BB) or (Buffer[2] <> $BF) then begin
        // no UTF-8 BOM (or the file is too short to contain one)
        // -> start reading at the beginning again
        st.Position := 0;
      end;
      // otherwise the position is already behind the BOM, so it gets skipped

      List := TStringList.Create;
      List.LoadFromStream(st);
      SetStrings(List);
    finally
      List.Free;
      st.Free;
    end;
  end
  else
    Clear;
end;

Note that this only skips the UTF-8 BOM, but that is the only case I have ever encountered. It is also the only encoding where skipping the BOM is enough to make most files readable, because UTF-8 is identical to ANSI for all 7-bit ASCII characters. Other encodings, such as UTF-16, will break TMemIniFile completely. But even with UTF-8 you will still run into problems with characters that are encoded as more than one byte: an 'ä', for example, is stored as the two bytes $C3 $A4 and will show up as 'Ã¤' with a Western European code page. So this is more of a simple workaround than a bugfix. For a real bugfix, you will have to properly decode the whole file (or use a Unicode-aware version of Delphi, where this problem doesn’t exist).
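
For pre-Unicode Delphi, a minimal sketch of such decoding could look like the program below, assuming Delphi 2007 and the stock IniFiles unit. The helper names ReadIniFileText and LoadIniFromUtf8File and the test file name are hypothetical; Utf8ToAnsi comes from the System unit and SetStrings is a public method of TMemIniFile. The sketch only decodes when a UTF-8 BOM is present, since a file without one is most likely plain ANSI, which is no longer valid UTF-8 as soon as it contains characters like umlauts.

```pascal
program Utf8IniDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  Classes,
  IniFiles; // the stock RTL unit, no modification needed for this sketch

const
  UTF8_BOM = #$EF#$BB#$BF;

// Hypothetical helper: returns the content of FileName as an ANSI string.
// If the file starts with a UTF-8 BOM, the BOM is removed and the rest is
// converted from UTF-8 to the current ANSI code page (characters that do not
// exist in that code page get lost). Files without a BOM are returned
// unchanged, i.e. they are treated as ANSI, just like TMemIniFile does.
function ReadIniFileText(const FileName: string): string;
var
  Stream: TFileStream;
  Size: Integer;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Size := Stream.Size;
    SetLength(Result, Size);
    if Size > 0 then
      Stream.ReadBuffer(Result[1], Size);
  finally
    Stream.Free;
  end;
  if Copy(Result, 1, Length(UTF8_BOM)) = UTF8_BOM then
    Result := Utf8ToAnsi(Copy(Result, Length(UTF8_BOM) + 1, MaxInt));
end;

// Hypothetical helper: replaces the content of Ini with the decoded file.
procedure LoadIniFromUtf8File(Ini: TMemIniFile; const FileName: string);
var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.Text := ReadIniFileText(FileName);
    Ini.SetStrings(List);
  finally
    List.Free;
  end;
end;

const
  TEST_FILE = 'bomtest.ini'; // hypothetical file name

var
  Data: TStringList;
  Ini: TMemIniFile;
begin
  // create a test file that starts with a UTF-8 BOM
  Data := TStringList.Create;
  try
    Data.Text := UTF8_BOM + '[Settings]'#13#10'Name=Test';
    Data.SaveToFile(TEST_FILE);
  finally
    Data.Free;
  end;

  Ini := TMemIniFile.Create(''); // empty file name: nothing is loaded here
  try
    LoadIniFromUtf8File(Ini, TEST_FILE);
    Writeln(Ini.ReadString('Settings', 'Name', '<missing>')); // prints "Test"
  finally
    Ini.Free;
  end;
end.
```

Characters without an equivalent in the current ANSI code page are still lost, and TMemIniFile.UpdateFile would write the file back as ANSI without a BOM, so this only goes part of the way towards a real fix.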

Of course, TMemIniFile is declared in the RTL unit IniFiles, so modifying it is not something to do on a whim. At least in my case it turned out to be no problem, since apparently no other RTL units needed to be recompiled to include the changed IniFiles unit. So the easiest way was to copy IniFiles.pas to my program’s source directory, add it to the project (I prefer doing that because it makes such a modified unit easier to spot) and recompile.
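
To illustrate, here is a sketch of a small console test project (project, function and file names are hypothetical) to which the modified copy has been added, so that the uses clause of the .dpr lists it as IniFiles in 'IniFiles.pas'. It writes an INI file that starts with a UTF-8 BOM and checks whether TMemIniFile finds the value in the first section. With the original RTL unit the check is expected to fail, since the first line then begins with the BOM bytes instead of '['. Whenever the modified unit is compiled, the {$MESSAGE hint ...} in LoadValues should also show up among the compiler messages.

```pascal
program IniBomCheck;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  Classes,
  IniFiles in 'IniFiles.pas'; // the modified copy in the project directory

// Hypothetical check: writes an INI file that starts with a UTF-8 BOM and
// returns True if TMemIniFile finds the value in its first section.
function BomIsSkipped(const FileName: string): Boolean;
var
  Data: TStringList;
  Ini: TMemIniFile;
begin
  Data := TStringList.Create;
  try
    Data.Text := #$EF#$BB#$BF'[Settings]'#13#10'Name=Test';
    Data.SaveToFile(FileName);
  finally
    Data.Free;
  end;
  Ini := TMemIniFile.Create(FileName);
  try
    Result := Ini.ReadString('Settings', 'Name', '') = 'Test';
  finally
    Ini.Free;
  end;
  DeleteFile(FileName);
end;

begin
  if BomIsSkipped('bomcheck.ini') then // hypothetical file name
    Writeln('BOM skipped: the modified IniFiles unit is in use')
  else
    Writeln('BOM not skipped: probably still the original RTL unit');
end.
```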