Calculating Offsets into the Delphi editor buffer

I have already mentioned the AutoTodo wizard for Delphi when I was trying to contact the author, Peter Laman. He hasn’t responded and nobody could give me any contact information. (Peter, if you ever read this, please contact me using my Google+ profile.)

The animated GIF in that post shows how the new AutoTodo expert in GExperts works. Unfortunately it turned out later that, again, I had overlooked a Unicode issue. If the editor buffer contains any characters outside the ASCII range, the offsets for inserting the todos were off by at least one for each of these characters, so the todos were inserted in the middle of the source code rather than at the empty blocks.

unit bla;
// ä <--- beware, there's a Unicode character here!
interface

implementation

procedure blub;
begi  //TODO 5 -otwm -cEmpty Structure : blub (begin/end in procedure)n
end;

end.

The reason, of course, is that starting with Delphi 8 the IDE uses UTF-8 for its editor buffers, so any offsets into these buffers have to take into account characters that take up more than one byte.
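To make the mismatch concrete, here is a minimal console sketch, assuming Delphi 2009 or later (where string is a UnicodeString). It is not part of GExperts; it only shows that the character count of a string and the byte count of its UTF-8 encoding differ as soon as a character like ä is involved. In this example the string has 3 characters but its UTF-8 form has 4 bytes.

```pascal
program Utf8LengthDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  s: string;
  u: UTF8String;
begin
  // 'a', a-umlaut, 'b'; the umlaut is written as a character code so the
  // encoding of this source file does not matter
  s := 'a' + #$00E4 + 'b';
  // the same text as the IDE would hold it in a UTF-8 editor buffer
  u := UTF8Encode(s);
  WriteLn('characters:  ', Length(s));
  WriteLn('UTF-8 bytes: ', Length(u));
end.
```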

The easiest way to do that is not using offsets at all but Line/CharIndex positions as stored in the TOTACharPos record. IOTAEditView provides two methods for converting a TOTACharPos to a buffer offset and vice versa:

type
  IOTAEditView40 = interface(IInterface)
    // ...
    { Converts a linear buffer offset position to a CharPos }
    function PosToCharPos(Pos: Longint): TOTACharPos;
    { Convert a CharPos to a linear buffer offset }
    function CharPosToPos(CharPos: TOTACharPos): Longint;
    // ...
  end;

So, if you want to store any positions, never use the buffer offset but use the CharPos instead.

But what if your algorithm only works with offsets? And if these offsets are not into UTF-8 strings but into Unicode strings? You need a way to calculate the CharPos from your offset and then use CharPosToPos to calculate the buffer offset.

In this case, the algorithm of the AutoTodo wizard, which I used as a base for the GExperts AutoTodo expert, generated a TStringList with the text to insert into the source code. The Objects[] property of each entry stored the character index into the source code string at which that text was to be inserted:

  // get the source code from the current editor window into the
  // string Source and pass it to the AutoTodo handler:
  Patches := TStringList.Create;
  Handler.Execute(Source, Patches);
  // and now what?

I am a lazy bastard™, so the first thing I looked for was existing source code for converting the character index to a line index / character position. I found nothing in the TStringList interface and nothing in the Delphi RTL. A Google search didn’t give me any useful results (I might have used the wrong search terms). Even the Google+ Delphi Developer community refused to support my laziness by pointing me to a ready-made algorithm. So I had to roll my own.

This class takes a StringList (TGXUnicodeStringList is just a regular StringList for most purposes) containing a multi-line string and calculates the offset of the first character of each line. These offsets are stored in the FOffsets array. Once that is done, it can easily and reasonably efficiently calculate the line index and the character position within that line from a character position in the multi-line string stored in the StringList.

type
  TOffsetToCursorPos = class
  private
    FOffsets: array of integer;
  public
    constructor Create(_sl: TGXUnicodeStringList);
    function CalcCursorPos(_Offset: integer): TPoint;
  end;


{ TOffsetToCursorPos }

constructor TOffsetToCursorPos.Create(_sl: TGXUnicodeStringList);
var
  cnt: integer;
  i: Integer;
  CrLfLen: integer;
  Ofs: Integer;
begin
  inherited Create;
{$IFDEF GX_VER190_up}
  CrLfLen := Length(_sl.LineBreak);
{$ELSE}
  // Delphi < 2007 does not have the LineBreak property
  CrLfLen := 2;
{$ENDIF}
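  cnt := _sl.Count;
  SetLength(FOffsets, cnt);
  // FOffsets[i] is the 1-based offset of the first character of line i
  // within the multi-line string
  Ofs := 1;
  for i := 0 to cnt - 1 do begin
    FOffsets[i] := Ofs;
    // skip the line itself and the line break that follows it
    Inc(Ofs, Length(_sl[i]) + CrLfLen);
  end;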
end;

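function TOffsetToCursorPos.CalcCursorPos(_Offset: integer): TPoint;
var
  i: integer;
begin
  // Note: expects at least one line and _Offset >= 1,
  // otherwise Result.Y becomes -1 and the array access below fails
  i := 0;
  // find the first line that starts after _Offset
  while (i < Length(FOffsets)) and (_Offset >= FOffsets[i]) do begin
    Inc(i);
  end;
  // the line before that one contains _Offset (0-based line index)
  Result.Y := i - 1;
  // 1-based character position within that line
  Result.X := _Offset - FOffsets[Result.Y] + 1;
end;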

Not too complicated, but let me tell you: it took me quite a while to get it right and make it compile with all affected Delphi versions.

The while loop in CalcCursorPos could probably be replaced with a binary search because the FOffsets array by definition is sorted.
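As a rough sketch of what that could look like, here is a self-contained console program with a binary search in the lookup. It is not the GExperts code: the class name TOffsetLookup is made up for this example, it takes a plain TStrings instead of TGXUnicodeStringList, and the line break length is passed in as a parameter so it compiles without the LineBreak property. Like the original, it assumes at least one line and an offset of 1 or greater.

```pascal
program OffsetLookupDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes, Types;

type
  // Hypothetical variant of TOffsetToCursorPos using a binary search
  TOffsetLookup = class
  private
    FOffsets: array of Integer;
  public
    constructor Create(_sl: TStrings; _LineBreakLen: Integer);
    function CalcCursorPos(_Offset: Integer): TPoint;
  end;

constructor TOffsetLookup.Create(_sl: TStrings; _LineBreakLen: Integer);
var
  i, Ofs: Integer;
begin
  inherited Create;
  SetLength(FOffsets, _sl.Count);
  Ofs := 1; // offsets are 1-based, like string indexes
  for i := 0 to _sl.Count - 1 do begin
    FOffsets[i] := Ofs;
    Inc(Ofs, Length(_sl[i]) + _LineBreakLen);
  end;
end;

function TOffsetLookup.CalcCursorPos(_Offset: Integer): TPoint;
var
  LoIdx, HiIdx, MidIdx: Integer;
begin
  // Find the last line whose start offset is <= _Offset.
  // Assumes FOffsets is not empty and _Offset >= 1.
  LoIdx := 0;
  HiIdx := High(FOffsets);
  while LoIdx < HiIdx do begin
    // round up so that LoIdx := MidIdx always makes progress
    MidIdx := (LoIdx + HiIdx + 1) div 2;
    if FOffsets[MidIdx] <= _Offset then
      LoIdx := MidIdx
    else
      HiIdx := MidIdx - 1;
  end;
  Result.Y := LoIdx;                         // 0-based line index
  Result.X := _Offset - FOffsets[LoIdx] + 1; // 1-based position in that line
end;

var
  sl: TStringList;
  Lookup: TOffsetLookup;
  p: TPoint;
begin
  sl := TStringList.Create;
  try
    sl.Add('unit bla;'); // starts at offset 1
    sl.Add('// x');      // starts at offset 12 (9 characters + CR/LF)
    sl.Add('interface'); // starts at offset 18 (4 characters + CR/LF)
    Lookup := TOffsetLookup.Create(sl, 2);
    try
      // offset 15 is the 'x' in the second line
      p := Lookup.CalcCursorPos(15);
      WriteLn('line index: ', p.Y, ', column: ', p.X);
    finally
      Lookup.Free;
    end;
  finally
    sl.Free;
  end;
end.
```

For the three lines above the start offsets are 1, 12 and 18, so offset 15 ends up in line index 1 at column 4. For a few hundred lines the linear loop is probably fast enough; the binary search only starts to matter for very large units or many lookups.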

Now, all I had to do was pass the offsets from the patch list to CalcCursorPos and then use CharPosToPos to calculate the buffer offset.
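That last step might look roughly like the following fragment. It is a sketch, not the actual GExperts code, and it only compiles inside an IDE package that has access to the ToolsAPI unit. The function name SourceOffsetToBufferPos is made up for this example. It also rests on two assumptions that you should verify against your Delphi version: that TOTACharPos.Line is 1-based while CharIndex is 0-based, and that CharIndex counts characters in the same way as the column returned by CalcCursorPos.

```pascal
uses
  Types, ToolsAPI;

// Hypothetical helper: converts a 1-based character offset into the source
// string into a linear offset into the (UTF-8) editor buffer of _View.
function SourceOffsetToBufferPos(const _View: IOTAEditView;
  _Calc: TOffsetToCursorPos; _Offset: Integer): Longint;
var
  CursorPos: TPoint;
  CharPos: TOTACharPos;
begin
  CursorPos := _Calc.CalcCursorPos(_Offset);
  // Assumption: Line is 1-based and CharIndex is 0-based, whereas
  // CalcCursorPos returns a 0-based line index (Y) and a 1-based column (X)
  CharPos.Line := CursorPos.Y + 1;
  CharPos.CharIndex := CursorPos.X - 1;
  // let the IDE do the conversion into its own buffer offset
  Result := _View.CharPosToPos(CharPos);
end;
```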

Easy, isn’t it?