Tuesday, 16 September 2008

Speed Benefits of Using The Native Win32 String Type

Filed under: Programming — Jan Goyvaerts @ 17:40

Some Delphi developers seem to be concerned that moving their applications to Delphi 2009 and Unicode will make them slower. These fears are mostly ungrounded. In fact, switching to UnicodeString may actually make your application faster.

The source of the concern is that UnicodeString uses twice as much memory for plain ASCII text as AnsiString does. That is true. But with even the cheapest PCs shipping with 512MB of RAM, you need to be keeping a whole lot of text in memory for this to have a significant impact. And if memory is at a premium, the AnsiString type is still available in Delphi 2009. And UTF8String got a whole new meaning.
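To put the doubling in concrete terms, here is a minimal sketch in Python rather than Delphi. The codec names "cp1252" and "utf-16-le" are my stand-ins for an AnsiString in a Western code page and a UnicodeString; the sample text is arbitrary.

```python
# Illustration only: Python codecs stand in for Delphi's string types.
text = "The quick brown fox jumps over the lazy dog"

ansi_bytes = text.encode("cp1252")      # one byte per character, like AnsiString
utf16_bytes = text.encode("utf-16-le")  # two bytes per character, like UnicodeString

# For plain ASCII text the UTF-16 form is exactly twice as large.
print(len(ansi_bytes), len(utf16_bytes))
```

The factor of two only applies to the character data itself, not to whatever else your application keeps in memory.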

When manipulating strings holding ASCII text in memory, the CPU needs to move around twice as many bytes for UnicodeString as for AnsiString. A synthetic benchmark will show that this takes twice as long. But memory copies are very fast. In real world applications, memory copies for string manipulation are responsible for only a very small percentage of the CPU usage, because they’re so fast to begin with. Making them twice as slow isn’t going to hurt the overall performance of your application.

Much of your application’s real work will involve Win32 API calls, some directly, but most through Delphi’s RTL and VCL. Even a function like AnsiLowerCase() is really just a wrapper around the Win32 API call CharLowerBuff().

UnicodeString uses the UTF-16 encoding. In Windows 2000 and later, this is the native encoding of the Win32 API. Sure, most calls have A and W versions, such as CharLowerBuffA() and CharLowerBuffW(). In Delphi 2007, CharLowerBuff() maps to CharLowerBuffA(). In Delphi 2009, it maps to CharLowerBuffW().

AnsiLowerCase() takes one parameter declared as “string”. That means that it effectively takes an AnsiString in Delphi 2007, and a UnicodeString in Delphi 2009. AnsiLowerCase() is indeed the proper call to make to convert a UnicodeString into lowercase. Plain old LowerCase() only works on ASCII characters in Delphi 2009, as it always has. Though this may seem confusing, it means your applications will continue to work the way they did before when you move to Delphi 2009, and “string” suddenly becomes a UTF-16 creature.
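The difference between the two calls can be sketched as follows. This is Python, not Delphi: ascii_lower() is a hypothetical helper that models LowerCase() by touching only A to Z, and Python's built-in str.lower() stands in for AnsiLowerCase(). It is only an approximation, since str.lower() applies Unicode's default case mappings rather than asking Windows.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only A-Z; every other character passes through unchanged."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)

sample = "ÀÉÎ Delphi ΑΒΓ"

ascii_only = ascii_lower(sample)  # models LowerCase(): only the "D" changes
unicode_aware = sample.lower()    # models AnsiLowerCase(): accented and Greek letters change too

print(ascii_only)
print(unicode_aware)
```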

Under the hood, however, AnsiLowerCase() works quite differently now. And I mean deep under the hood, inside the Win32 API. Delphi 2009 calls CharLowerBuffW(), which directly converts the UTF-16 string to lowercase. Delphi 2007 calls CharLowerBuffA(), which needs to deal with a string in one of the many legacy code pages supported by Windows. Clearly, Microsoft can’t provide an implementation for every possible code page for every Win32 “A” function. Instead, CharLowerBuffA() calls MultiByteToWideChar() to convert your string to UTF-16, then it calls CharLowerBuffW() to do the conversion, and then it calls WideCharToMultiByte() to convert the resulting string back to the code page your application (actually: the calling thread) is working with.
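The round trip described above can be modelled like this. Again it is a sketch in Python, not the actual Win32 implementation: lower_wide() and lower_ansi() are hypothetical names, the "cp1252" code page is an assumption, and Python's codecs play the part of MultiByteToWideChar() and WideCharToMultiByte().

```python
def lower_wide(text: str) -> str:
    """Stand-in for CharLowerBuffW(): the text is already Unicode."""
    return text.lower()

def lower_ansi(data: bytes, code_page: str = "cp1252") -> bytes:
    """Stand-in for CharLowerBuffA() as described in the text."""
    wide = data.decode(code_page)     # like MultiByteToWideChar()
    lowered = lower_wide(wide)        # like CharLowerBuffW()
    return lowered.encode(code_page)  # like WideCharToMultiByte()

ansi_input = "CAFÉ".encode("cp1252")
ansi_output = lower_ansi(ansi_input)
print(ansi_output.decode("cp1252"))
```

The “A” path does everything the “W” path does, plus two conversions. That extra work is what a UnicodeString caller skips.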

This is where the very real speed benefit of moving to UnicodeString comes from: no more round trip Ansi<->Wide conversions. These conversions happen with the “A” version of all Win32 API calls that take a string parameter and/or return a string.

You’ll have to run your own benchmarks on your own applications. But I strongly doubt that the typical Delphi application is going to see a performance decrease. Memory usage will go up a bit, but speed will be flat or slightly faster, certainly not slower. And, no more customer complaints that their foreign scripts show up as question marks in your application!
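The question marks come from converting text to a code page that has no slot for its characters. Here is a small sketch in Python, where the "cp1252" code page and the Russian sample word are my own choices for illustration:

```python
russian = "Привет"

# Cyrillic has no representation in cp1252, so each letter is replaced with "?".
squeezed = russian.encode("cp1252", errors="replace")

# UTF-16 can hold every character, so the text survives a round trip.
round_trip = russian.encode("utf-16-le").decode("utf-16-le")

print(squeezed)
print(round_trip == russian)
```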

1 Comment

  1. This has been the perceived thinking, but unfortunately I seem to recall Marco Cantu doing the benchmarking that you suggest and finding that Delphi 2009 calls to the “native” UTF16 APIs were at best as good as, and at worst quite a bit slower than, the Delphi 7 calls to the ANSI versions of the same APIs:


    But whilst performance through the Windows API is interesting, it is not typically where string performance is important.

    Where string performance becomes an issue – if at all – it is usually in the manipulations and processing of string data internal to your application code. I benchmarked a number of such operations in Delphi 2009 and compared them with Delphi 2007 and Delphi 7 and whilst in general Delphi 2009 achieves Delphi 7 levels of performance, overall it is somewhat less efficient than Delphi 2007, and in one or two specific cases dramatically so:


    Copy() and Pos() routines in particular suffer badly as a result of the transition to Unicode.

    Comment by Jolyon Smith — Friday, 19 September 2008 @ 8:58
