TStringList vs. THashedStringList vs. TDictionary

 Delphi  Comments Off on TStringList vs. THashedStringList vs. TDictionary
Feb 162020
 

Prompted by the topic Dictionaries, Hashing and Performance in the international Delphi Praxis forum I did some timing to compare the performance of data structures in the Delphi runtime library that can be used to store data indexed by strings:

  • a sorted, case sensitive TStringList (available since Delphi 6)
  • a sorted, case sensitive THashedStringList (available since Delphi 6)
  • a TDictionary<string,Integer> (available since a Delphi 2009)

Just in case you did not know about THashedStringList: It is a TStringList descendant declared in System.IniFiles. It’s used to speed up access to TMemIniFile. (EDIT: As Uwe Raabe pointet out, that’s no longer true. As of Delphi 10.3 (and possibly earlier, I haven’t checked) TMemIniFile also uses TDictionary<string,Integer>.)

The test adds 676 strings (‘AA’ .. ‘ZZ’) to each structure and does that 10000 times (which means that there are quite a few checks for duplicates to be done on adding). Then – again 10000 times – it does a lookup for each of these strings.

Of course that is just a simple test and it is neither a large number of entries nor long strings. I just wanted to get a feel for the performance of these structures.

Here is the main code for TStringList and THashedStringList:

procedure Tf_HashedStringListTest.DoTiming(sl: TStringList);
const
  CYCLES = 10000;
var
  k: integer;
  i: integer;
  j: integer;
  sw: TStopwatch;
  s: string;
  Idx: Integer;
begin
  sl.Sorted := True;
  sl.CaseSensitive := True;
  sl.Duplicates := dupError;
  sw := TStopwatch.StartNew;
  sl.BeginUpdate;
  for k := 1 to CYCLES do begin
    for i := Ord('A') to Ord('Z') do begin
      for j := Ord('A') to Ord('Z') do begin
        s := chr(i) + chr(j);
        if not sl.Find(s, Idx) then
          sl.AddObject(s, Pointer(i * 100 + j));
      end;
    end;
  end;
  sl.EndUpdate;
  sw.Stop;
  m_Output.Lines.Add(sl.Count.ToString + ': Add: ' + sw.Elapsed.ToString);

  sw.Reset;
  sw.Start;
  for k := 1 to CYCLES do begin
    for i := Ord('A') to Ord('Z') do begin
      for j := Ord('A') to Ord('Z') do begin
        s := chr(i) + chr(j);
        sl.IndexOf(s);
      end;
    end;
  end;
  m_Output.Lines.Add(sl.Count.ToString + ': IndexOf: ' + sw.Elapsed.ToString);
end;

And very similar for TDictionary:

procedure Tf_HashedStringListTest.DoTiming(sl: TDictionary<string, integer>);
const
  CYCLES = 10000;
var
  k: integer;
  i: integer;
  j: integer;
  sw: TStopwatch;
  s: string;
  v: integer;
begin
  sw := TStopwatch.StartNew;
  for k := 1 to CYCLES do begin
    for i := Ord('A') to Ord('Z') do begin
      for j := Ord('A') to Ord('Z') do begin
        s := chr(i) + chr(j);
        if not sl.TryGetValue(s, v) then
          sl.Add(s, i * 100 + j);
      end;
    end;
  end;
  sw.Stop;
  m_Output.Lines.Add(sl.Count.ToString + ': Add: ' + sw.Elapsed.ToString);

  sw.Reset;
  sw.Start;
  for k := 1 to CYCLES do begin
    for i := Ord('A') to Ord('Z') do begin
      for j := Ord('A') to Ord('Z') do begin
        s := chr(i) + chr(j);
        sl.Items[s];
      end;
    end;
  end;
  m_Output.Lines.Add(sl.Count.ToString + ': IndexOf: ' + sw.Elapsed.ToString);
end;

The result is not really surprising:

TDictionary is the winner by a large margin, followed by THashedStringList and then TStringList. The two string lists only differ in the IndexOf times, the adding times are very similar.

On my computer, with an AMD Phenom II XE 1090T processor, and compiled with Delphi 10.3 I get the following times:

Structure Time for Add [sec] Time for IndexOf [sec]
TStringList 7.43 7.48
THashedStringList 7.45 4.40
TDictionary 1.05 1.04

EDIT: I just found that changing the code for the THashedStringlist from using Find to using IndexOf reduced the time for adding entries to about the same time as for IndexOf. So both are about 4 seconds. This makes me wonder whether there is a bug in THashedStringList because it does not override AddObject. It simply inherits it from TStringList which for sorted lists calls Find to see if the string is already in the list. In contrast to IndexOf the Find method does not use the hash so it’s as slow as in TStringList. But maybe that is on purpose because the hashes get calculated rather frequently for all entries. I get the impression that THashedStringList is not really well implemented and nobody noticed because it was just good enough.

EDIT2: As it was only used in TMemIniFile to get fast access to the entries without needing them to be sorted, the implementation probably was good enough. My test above doesn’t check the performance of Add for an unsorted THashedStringList which is what TMemIniFile used.

If you like, you can download the full source code of my test program.

 Posted by on 2020-02-16 at 16:05