Thursday, 23 October 2008

Delphi 2009 String Performance in a Nutshell

Filed under: Programming — Jan Goyvaerts @ 21:43

By now, enough has been written by me and other bloggers about string performance in Delphi 2009 to make your head spin. This article is primarily intended for developers who never worried much about raw string performance in Delphi 2007 or earlier versions. If your application performs well in Delphi 2007, it will perform just as well in Delphi 2009. All you need to do is follow a few simple rules. As a bonus, your application will fully support Unicode, an important advantage in today’s globalized world.

1. Declare all strings as string. In Delphi 2009, that’s an alias for the new UnicodeString type, which is the native string type of the Delphi 2009 RTL and VCL. UnicodeString maps straight to the native string type of the Win32 API via a simple PChar typecast. PChar is an alias for PWideChar in Delphi 2009.

2. Eliminate all string conversion warnings. Delphi 2009 has new compiler warnings for implicit and explicit string casts. By default, implicit cast warnings are on, and explicit cast warnings are off. If you get any, go back to step 1. Trust me. String conversions are the real performance killer. People whine that UnicodeString needs twice the memory to store English text. Unless you’re downloading the whole Internet, a modern PC has plenty of CPU ticks and memory bandwidth to handle UnicodeString just as quickly as AnsiString. But a needless conversion step, which means an extra memory allocation for the string and a table lookup for each and every character, that’s a lot of extra work. Eliminate those warnings, using explicit string casts only when you know you want to pay the performance penalty.

3. Convert Ansi->Unicode as soon as possible, and Unicode->Ansi as late as possible. Given the vast amount of Ansi text out there, your application will still have to deal with that. When reading Ansi text, e.g. when loading files created by a previous version of your software, store the text into a UnicodeString immediately. When writing Ansi text, cast your UnicodeString to an AnsiString as late as possible, while still making sure you’re doing the cast only once when writing the same string more than once.

4. Don’t be tempted by UTF8String or any other typed AnsiString. In Delphi 2009, UTF8String really stores text in UTF-8. In previous versions of Delphi, UTF8String was a mere alias for AnsiString. You had to call UTF8Encode() and UTF8Decode() explicitly. Delphi 2009 does the conversion automatically on assignment, which comes with the same compiler warning and speed implications I mentioned in step 2. UTF8String is great for loading and saving text in UTF-8, if you follow the principles of step 3. Don’t use UTF8String throughout your application.

5. Turn off the string format checking compiler option. For Delphi 2009 developers, this new compiler option is a pure speed penalty with no benefit. C++Builder 2009 developers should leave it on when compiling Delphi code that will be called from C++.
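The five rules can be sketched in one small Delphi 2009 console program. Treat it as an illustration under stated assumptions, not a reference implementation: the file names legacy.txt and utf8.txt and the helpers LoadAnsiText, WriteBytes and ShowText are hypothetical names made up for this example, and error handling is left out.

```delphi
program StringRulesSketch;

{$APPTYPE CONSOLE}
{$STRINGCHECKS OFF} // Rule 5: assumes this code is never called from C++Builder

uses
  SysUtils, Classes, Windows;

// Rule 3 (reading): load the legacy bytes, then convert Ansi->Unicode
// once, immediately, and hand back a plain string (rule 1).
function LoadAnsiText(const FileName: string): string;
var
  Stream: TFileStream;
  Raw: AnsiString;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Raw, Stream.Size);
    if Length(Raw) > 0 then
      Stream.ReadBuffer(Raw[1], Length(Raw));
  finally
    Stream.Free;
  end;
  // Explicit cast: the one conversion we decided to pay for (rule 2).
  Result := string(Raw);
end;

// Rule 3 (writing): the caller converts once; the already encoded bytes
// are then written as often as needed without converting again.
// RawByteString accepts AnsiString and UTF8String without a conversion.
procedure WriteBytes(const FileName: string; const Raw: RawByteString;
  Copies: Integer);
var
  Stream: TFileStream;
  I: Integer;
begin
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    if Length(Raw) > 0 then
      for I := 1 to Copies do
        Stream.WriteBuffer(Raw[1], Length(Raw));
  finally
    Stream.Free;
  end;
end;

// Rule 1: a string goes to the Win32 API via a PChar typecast.
// In Delphi 2009 PChar is PWideChar, so no conversion takes place.
procedure ShowText(const Content, Title: string);
begin
  MessageBox(0, PChar(Content), PChar(Title), MB_OK);
end;

var
  Content: string; // rule 1: plain string everywhere inside the application
begin
  Content := 'Hello from Delphi 2009';

  // Unicode->Ansi as late as possible, cast once, written twice.
  WriteBytes('legacy.txt', AnsiString(Content), 2);

  // Rule 4: UTF8String appears only at the file boundary.
  WriteBytes('utf8.txt', UTF8String(Content), 1);

  // Ansi->Unicode as soon as possible.
  Content := LoadAnsiText('legacy.txt');
  ShowText(Content, 'String rules sketch');
end.
```

Every conversion in this sketch is an explicit cast at a file boundary, so with the default warning settings it should compile without any implicit string cast warnings. If you leave out one of the casts, for example by assigning the AnsiString Raw straight to Result, the compiler flags the assignment, and that is the cue from step 2 to decide whether you really want the conversion there.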

Remember that these are hard-and-fast generic rules for GUI applications that don’t have string-related performance issues in Delphi 2007. Follow these rules if you just want to migrate your applications to Delphi 2009 and reap the benefits of Unicode, without having to worry about performance. Don’t follow these rules if your strings don’t fit into 32-bit address space.

1 Comment

  1. Very nice and adequate compilation.

    I would however like to add a few notes regarding C++Builder 2009:
    – In RTM, UnicodeString::c_str() returns char* and thereby converts the payload of the underlying Unicode string to ANSI. In a few corner cases, this can corrupt data. Moreover, it is inconsistent with std::basic_string::c_str() which returns const CharT* (IOW, std::wstring::c_str() returns const wchar_t*). And finally, if you decide to use the Unicode variants of the Win32 APIs and therefore have UNICODE (and _UNICODE) defined, any code like
    MessageBox (Handle, Edit1->Text.c_str(), Application->Title.c_str(), MB_OK);
    will break. Therefore, one should define the preprocessor constant USTRING_AS_WCHART which forces UnicodeString::c_str() to return wchar_t* and to not corrupt any data.
    – If this precaution is taken and, additionally, all DFM files are checked for event handler type mismatches (event handlers that take an AnsiString might falsely be assigned an event that passes a UnicodeString now), it is perfectly safe to disable string checks in C++Builder.
    – For C++ code, this can be done by defining _STRINGCHECKS_OFF. (For all string checks that happen in dstring.cpp/ustring.cpp, you would have to recompile those units.)

    Comment by Moritz Beutel — Saturday, 25 October 2008 @ 21:28
