CodeGear has invited a bunch of people on the Tiburón field test to participate in a BetaBlogging program. Essentially, we’ve been invited to share all the positive experiences we’re having with the upcoming Delphi and C++Builder 2009. The bad news is still covered by NDA. Therefore, I won’t talk about what’s good or what’s bad until the final Delphi 2009 is available for purchase.
Delphi 2009 will, finally, fully support Unicode. The VCL and RTL will be fully Unicodified. While AnsiString still exists for handling data using legacy encodings, Delphi 2009 applications are always Unicode applications. They will need Windows 2000 or later to run.
The most dramatic change in Delphi 2009 is that the “string” type is now an alias for UnicodeString instead of AnsiString. This is similar to “string” changing from ShortString to AnsiString in Delphi 2.
I indeed said UnicodeString, not WideString. WideString still exists, and is unchanged. WideString is allocated by the Windows memory manager, and should be used for interacting with COM objects. WideString maps directly to the BSTR type in COM.
UnicodeString is the brand-new reference-counted UTF-16 string type. If you don’t use COM, you can search-and-replace WideString with UnicodeString throughout your applications, and immediately get the performance benefits of reference counting and Delphi’s fast memory manager. WideString and UnicodeString are assignment compatible. Passing a UnicodeString to a COM function that takes a WideString is no problem. The Delphi compiler will inject some magic to allocate a temporary WideString with the same contents as the UnicodeString. This is in fact no different than passing an AnsiString where a WideString is expected in Delphi 2007.
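To make the assignment compatibility concrete, here is a minimal sketch for Delphi 2009. TakesWideString is a hypothetical stand-in for a COM method with a BSTR parameter; it is not a real API.

```delphi
program WideVsUnicode;

{$APPTYPE CONSOLE}

// Hypothetical stand-in for a COM method that expects a BSTR (WideString).
procedure TakesWideString(const W: WideString);
begin
  WriteLn('Received ', Length(W), ' characters');
end;

var
  U: UnicodeString;
  W: WideString;
begin
  U := 'reference counted';  // managed by Delphi's own memory manager
  W := U;                    // contents are copied into a Windows-allocated WideString
  U := W;                    // and back again; both directions compile
  TakesWideString(U);        // the compiler builds a temporary WideString for the call
end.
```

Only the boundary with COM pays for the temporary copy; the rest of the application can work with UnicodeString throughout.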
One difference is that while Delphi 2007 silently converts between AnsiString and WideString, Delphi 2009 will by default issue a warning. This makes it much easier to spot places where you’ve declared a variable as “string”, while it really should be an AnsiString, or vice versa.
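A small sketch of where the warnings show up, assuming default compiler settings in Delphi 2009. The comments describe the behavior I expect rather than quoting exact compiler messages, and the variable names are purely illustrative.

```delphi
program ConversionWarnings;

{$APPTYPE CONSOLE}

var
  A: AnsiString;
  S: string;
begin
  A := 'legacy data';
  S := A;              // implicit AnsiString -> string: Delphi 2009 warns here
  A := S;              // implicit string -> AnsiString: warns too, and data may be lost
  S := string(A);      // an explicit cast states the intent
  A := AnsiString(S);  // same in the other direction
  WriteLn(S);
end.
```

The same source compiles in Delphi 2007, where string and AnsiString are the same type and the casts do nothing.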
The only trouble spots with migrating to Unicode are places where you’re assuming that SizeOf(Char) = 1. Just like string is now UnicodeString, Char is now an alias for WideChar, and PChar an alias for PWideChar. AnsiChar and PAnsiChar still exist when you need them. If you’re doing Stream.Read(S[1], BytesToRead), you’d better explicitly declare S as an AnsiString, even in Delphi 2007 or earlier. Stream.Read counts bytes, not characters, so the call only fills S correctly when each character is one byte wide. Declaring S as an AnsiString makes sure the code won’t break in Delphi 2009.
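As a sketch of the Stream.Read case, the program below reads raw bytes into an explicitly declared AnsiString. ReadAnsi is a hypothetical helper name, and the memory stream merely stands in for whatever stream you are actually reading.

```delphi
program ReadBytesDemo;

{$APPTYPE CONSOLE}

uses
  Classes;

// Hypothetical helper: reads up to BytesToRead bytes into an 8-bit string.
function ReadAnsi(Stream: TStream; BytesToRead: Integer): AnsiString;
var
  BytesRead: Integer;
begin
  SetLength(Result, BytesToRead);
  if BytesToRead > 0 then
  begin
    // Result is an AnsiString, so one character is one byte in every Delphi version.
    BytesRead := Stream.Read(Result[1], BytesToRead);
    SetLength(Result, BytesRead);  // trim if the stream held fewer bytes
  end;
end;

var
  Stream: TMemoryStream;
  Data: AnsiString;
begin
  Stream := TMemoryStream.Create;
  try
    Data := 'hello';
    Stream.Write(Data[1], Length(Data));  // Length equals the byte count for AnsiString
    Stream.Position := 0;
    WriteLn(ReadAnsi(Stream, 5));
    WriteLn('SizeOf(Char) = ', SizeOf(Char));  // 1 in Delphi 2007, 2 in Delphi 2009
  finally
    Stream.Free;
  end;
end.
```

Had Data and the function result been declared as plain string, the byte counts would be wrong in Delphi 2009, because Length would then count two-byte characters.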
If you’re calling Win32 API functions, continue to use PChar. Then everything will move to Unicode automatically. In Delphi 2009, Win32 API imports map to the W version instead of the A version. E.g. MessageBox() is the same as MessageBoxW(), which takes wide parameters. In Delphi 2007, it’s MessageBoxA(), which takes Ansi parameters. You can of course explicitly call MessageBoxW() or MessageBoxA() in either version of Delphi.
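A minimal sketch of the three ways to call MessageBox. The message text and captions are made up for illustration; the explicit casts on the A and W calls are there so the same source compiles in both Delphi 2007 and Delphi 2009.

```delphi
program MsgBoxDemo;

uses
  Windows;

var
  Text, Caption: string;
begin
  Text := 'Hello from Delphi';
  Caption := 'Demo';

  // PChar follows the string type: PAnsiChar in Delphi 2007, PWideChar in Delphi 2009.
  // MessageBox follows along: MessageBoxA in 2007, MessageBoxW in 2009.
  MessageBox(0, PChar(Text), PChar(Caption), MB_OK);

  // Explicitly calling the Ansi version, regardless of what string is.
  MessageBoxA(0, PAnsiChar(AnsiString(Text)), PAnsiChar(AnsiString(Caption)), MB_OK);

  // Explicitly calling the wide version, regardless of what string is.
  MessageBoxW(0, PWideChar(WideString(Text)), PWideChar(WideString(Caption)), MB_OK);
end.
```

The first call is the one to prefer in ordinary code, since it needs no changes when moving between compiler versions.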
Conclusion: to get ready for the move to Unicode, understand the difference between “string” and “AnsiString”. If your code works whether a character is one byte or two bytes, use string. If your string must be 8-bit, use AnsiString. Then everything will migrate properly, and even work with both Delphi 2007 and 2009. If it matters whether strings are Unicode or not, you can use {$IFDEF UNICODE}. The UNICODE conditional symbol is defined in Delphi 2009, but not in Delphi 2007.
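For the cases where it does matter, a minimal sketch of branching on the UNICODE symbol looks like this. The messages are only illustrative.

```delphi
program UnicodeCheck;

{$APPTYPE CONSOLE}

begin
{$IFDEF UNICODE}
  // Compiled by Delphi 2009: string is UnicodeString.
  WriteLn('string is UnicodeString, SizeOf(Char) = ', SizeOf(Char));
{$ELSE}
  // Compiled by Delphi 2007 or earlier: string is AnsiString.
  WriteLn('string is AnsiString, SizeOf(Char) = ', SizeOf(Char));
{$ENDIF}
end.
```

Keeping such blocks small and rare is the point of the advice above: code that declares string or AnsiString according to what it actually needs rarely has to check the symbol at all.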