Micro-ISV.asia

Thursday, 7 August 2008

Get Ready for Delphi 2009 and Unicode

Filed under: Programming — Jan Goyvaerts @ 18:52

CodeGear has invited a bunch of people on the Tiburón field test to participate in a BetaBlogging program. Essentially, we’ve been invited to share all the positive experiences we’re having with the upcoming Delphi and C++Builder 2009. The bad news is still covered by NDA. Therefore, I won’t talk about what’s good or what’s bad until the final Delphi 2009 is available for purchase.

Delphi 2009 will–finally–fully support Unicode. The VCL and RTL will be fully Unicodified. While AnsiString still exists for handling data using legacy encodings, Delphi 2009 applications are always Unicode applications. They will need Windows 2000 or later to run.

The most dramatic change in Delphi 2009 is that the “string” type is now an alias for UnicodeString instead of AnsiString. This is similar to “string” changing from ShortString to AnsiString in Delphi 2.

I indeed said UnicodeString, not WideString. WideString still exists, and is unchanged. WideString is allocated by the Windows memory manager, and should be used for interacting with COM objects. WideString maps directly to the BSTR type in COM.

UnicodeString is the brand-new reference-counted UTF-16 string type. If you don’t use COM, you can search-and-replace WideString with UnicodeString throughout your applications, and immediately get the performance benefits of reference counting and Delphi’s fast memory manager. WideString and UnicodeString are assignment compatible. Passing a UnicodeString to a COM function that takes a WideString is no problem. The Delphi compiler will inject some magic to allocate a temporary WideString with the same contents as the UnicodeString. This is in fact no different than passing an AnsiString where a WideString is expected in Delphi 2007.
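As a rough sketch of that assignment compatibility, assuming Delphi 2009 or later (the program and variable names are made up for illustration):

```delphi
program StringTypesDemo;

{$APPTYPE CONSOLE}

var
  U: UnicodeString;  // reference-counted, allocated by Delphi's memory manager
  W: WideString;     // BSTR, allocated by Windows
  S: string;         // alias for UnicodeString in Delphi 2009
begin
  U := 'Delphi 2009';
  W := U;            // the characters are copied into a newly allocated WideString
  S := W;            // and copied back into a reference-counted string

  Assert(S = U);
  Assert(SizeOf(Char) = 2);   // Char is WideChar in Delphi 2009
  Assert(Length(S) = 11);     // Length counts UTF-16 code units, not bytes

  Writeln(Length(S), ' characters, ', Length(S) * SizeOf(Char), ' bytes');
end.
```

No casts are needed in either direction. The cost is the copy made each time a WideString is involved.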

One difference is that while Delphi 2007 silently converts between AnsiString and WideString, Delphi 2009 will by default issue a warning. This makes it much easier to spot places where you’ve declared a variable as “string”, while it really should be an AnsiString, or vice versa.
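One way to keep such conversions visible, sketched here with hypothetical variable names, is to write the typecast explicitly. An explicit cast tells the compiler the conversion is intended, so it should not warn about it, and the same code still compiles in Delphi 2007:

```delphi
program ConversionDemo;

{$APPTYPE CONSOLE}

var
  A: AnsiString;  // always 8-bit, in Delphi 2007 and 2009 alike
  U: string;      // AnsiString in Delphi 2007, UnicodeString in Delphi 2009
begin
  A := 'ATDT';

  // A plain "U := A" compiles, but Delphi 2009 flags the implicit conversion.
  // The explicit cast states that the conversion is deliberate.
  U := string(A);

  // Going from Unicode to Ansi can lose characters that the Ansi code page
  // lacks. The sample text is plain ASCII, so nothing is lost here.
  A := AnsiString(U);

  Assert(U = 'ATDT');
  Assert(Length(A) = Length(U));
end.
```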

The only trouble spots with migrating to Unicode are places where you’re assuming that SizeOf(Char) = 1. Just like string is now UnicodeString, Char is now an alias for WideChar, and PChar an alias for PWideChar. AnsiChar and PAnsiChar still exist when you need them. If you’re doing Stream.Read(S[1], BytesToRead), better explicitly declare S as an AnsiString, even in Delphi 2007 or earlier. The count passed to Read is in bytes, so the call only fills S correctly when each character of S is one byte. Declaring S as an AnsiString will make sure the code won’t break in Delphi 2009.
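A minimal sketch of the Stream.Read case, assuming the data is raw 8-bit bytes. ReadAnsi is a hypothetical helper, not an RTL function:

```delphi
program ReadBytesDemo;

{$APPTYPE CONSOLE}

uses
  Classes;

// Hypothetical helper: reads up to BytesToRead bytes into an AnsiString.
// Because the result is declared as AnsiString, one character is one byte
// in every Delphi version, so the byte count and the string length agree.
function ReadAnsi(Stream: TStream; BytesToRead: Integer): AnsiString;
var
  BytesRead: Integer;
begin
  Result := '';
  if BytesToRead <= 0 then
    Exit;
  SetLength(Result, BytesToRead);
  BytesRead := Stream.Read(Result[1], BytesToRead);
  SetLength(Result, BytesRead);  // trim if the stream held fewer bytes
end;

const
  Raw: array[0..3] of Byte = ($41, $54, $0D, $0A);  // 'AT' followed by CR LF

var
  Stream: TMemoryStream;
  Data: AnsiString;
begin
  Stream := TMemoryStream.Create;
  try
    Stream.WriteBuffer(Raw, SizeOf(Raw));
    Stream.Position := 0;
    Data := ReadAnsi(Stream, 4);
    Assert(Length(Data) = 4);
    Assert(Data = AnsiString('AT'#13#10));
  finally
    Stream.Free;
  end;
end.
```

Had Result been declared as plain string, SetLength would allocate BytesToRead characters, which is twice as many bytes in Delphi 2009, and the bytes read would be packed two to a character.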

If you’re making Win32 API calls, continue to use PChar. Then everything will move to Unicode automatically. In Delphi 2009, Win32 API imports map to the W version instead of the A version. E.g. MessageBox() is the same as MessageBoxW(), which takes wide parameters. In Delphi 2007, it’s MessageBoxA(), which takes Ansi parameters. You can of course explicitly call MessageBoxW() or MessageBoxA() in either version of Delphi.
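For illustration, a small sketch that compiles unchanged in both versions. The message text is made up, and lstrlen is used only as a second API call that returns a value that can be checked:

```delphi
program ApiDemo;

uses
  Windows;

var
  Text, Caption: string;
  AnsiText, AnsiCaption: AnsiString;
begin
  Text := 'Hello from Delphi';
  Caption := 'Unicode';

  // The generic import follows the string type: MessageBoxA in Delphi 2007,
  // MessageBoxW in Delphi 2009. PChar follows along in the same way.
  MessageBox(0, PChar(Text), PChar(Caption), MB_OK);

  // lstrlen is mapped the same way and counts characters, not bytes.
  Assert(lstrlen(PChar(Text)) = Length(Text));

  // Forcing the Ansi version works in either Delphi version, at the cost
  // of an explicit conversion to AnsiString.
  AnsiText := AnsiString(Text);
  AnsiCaption := AnsiString(Caption);
  MessageBoxA(0, PAnsiChar(AnsiText), PAnsiChar(AnsiCaption), MB_OK);
end.
```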

Conclusion: to get ready for the move to Unicode, understand the difference between “string” and “AnsiString”. If your code works whether a character is one byte or two bytes, use string. If your string must be 8-bit, use AnsiString. Then everything will migrate properly, and even work with both Delphi 2007 and 2009. If it matters whether strings are Unicode or not, you can use {$IFDEF UNICODE}. The UNICODE conditional symbol is defined in Delphi 2009, but not in Delphi 2007.
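A short sketch of both approaches. BytesPerChar and StringByteSize are hypothetical helpers written for this example:

```delphi
program UnicodeSymbolDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper: branches on the UNICODE symbol at compile time.
function BytesPerChar: Integer;
begin
{$IFDEF UNICODE}
  Result := 2;  // Delphi 2009: string is UnicodeString
{$ELSE}
  Result := 1;  // Delphi 2007 and earlier: string is AnsiString
{$ENDIF}
end;

// Hypothetical helper: often no IFDEF is needed at all. Multiplying by
// SizeOf(Char) gives the right byte count in every version.
function StringByteSize(const S: string): Integer;
begin
  Result := Length(S) * SizeOf(Char);
end;

begin
  Assert(BytesPerChar = SizeOf(Char));
  Assert(StringByteSize('AT') = 2 * SizeOf(Char));
  Writeln('Bytes per Char: ', BytesPerChar);
end.
```

Where it is enough, the SizeOf(Char) form is usually the better choice, since it leaves a single code path to test.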

9 Comments

  1. Argh! In some applications this can be tragic…

    There is some way to set strings as ansistring by default?

    Comment by P.O.W. — Thursday, 23 October 2008 @ 14:40

  2. In Delphi 2009, “string” is always UnicodeString. There’s no compiler option to change that. If you want to use AnsiString, you’ll have to declare your strings as that. AnsiString overloads of common functions from the SysUtils unit can be found in the new AnsiStrings unit.

    Comment by Jan Goyvaerts — Thursday, 23 October 2008 @ 20:55

  3. Aaaarrrrggg! Now I have tons of code that does not even compile! I think the very least we can expect from a new version, especially of a compiler, is backward compatibility. I think that a compiler directive, say USE_OLD_STRING, should map:
    {$IfDef USE_OLD_STRING }
    Type
    String = AnsiString;
    {$Else }
    Type
    String = UnicodeString;
    {$EndIf }

    Then we could use existing components, and start migrating to the new string type. What can we expect in Delphi 2010? Another dramatic change that will lead us to rewrite already written code? I’m pretty disappointed with Delphi and Embarcadero. I’ve been using Pascal since Turbo Pascal 1.0 and have never been in such a mess with existing and running code.

    Regards,
    Alvaro Castiello

    Comment by Alvaro Castiello — Monday, 15 December 2008 @ 21:00

  4. After having ported quite a bit of code to Delphi 2009 already, I’m actually amazed at how smooth the transition is. Well-written code that doesn’t use pointer hacks and doesn’t assume SizeOf(Char) = 1 compiles without changes, giving your application instant Unicode support.

    There’s no compiler directive, as I mentioned previously, but you can search-and-replace string with AnsiString in your own code.

    I expect a 64-bit compiler in Delphi 2010. That shouldn’t be much of an issue. Integer will remain 32-bit. NativeInt (new since Delphi 2007) will be 32-bit or 64-bit depending on the compiler.

    Comment by Jan Goyvaerts — Tuesday, 16 December 2008 @ 10:28

  5. For us folk that write applications that need to work outside their own little world, eg to serial/USB ports, to network comms (tcp/ip) etc – … any interaction to other! software (unfortunately most developers work only in their own little pool) – this is a complete disaster. I am now expected to go through EVERY component in our suite, including all the Async-professional stuff, all the components that may deal with comms. This is just ASKING for broken code. In fact just on this our company has denied us upgrading – we are stuck with D2006 forever now. The microsoft supporters here are laughing at us. Luckily Indy still supports Delphi, that would have been a disaster to pick through.

    And please don’t come back with “that code should have been written correctly using dynamic byte arrays …”, because that is just pious nonsense. I am done with Delphi, and I started with Delphi 1 and loved it ever since (except the help in 2006).
    I am very pleased to see help is restored to some glory – D2006 help was useless.
    VERY Disappointed.

    Comment by Steve — Sunday, 1 February 2009 @ 5:35

  6. I guess the Microsoft supporters have forgotten what happened to VB6. Technology changes rapidly. Sometimes, bridges have to be burned in order to move on. CodeGear can’t keep their product stuck in the stone age because there are people using component sets that have been discontinued. Frankly, I don’t think CodeGear could have made the migration to Delphi 2009 any easier than it is. Those who don’t want Unicode, can continue to use Delphi 2007, which works perfectly with Windows Vista and Windows 7. This is no different from Delphi 2 being 32-bit only. Those who weren’t ready for 32-bit at that time could stay with Delphi 1.

    If you like to store binary data in strings, check out the all new RawByteString type in Delphi 2009. Good old AnsiString is still available too.

    Comment by Jan Goyvaerts — Monday, 2 February 2009 @ 8:46

  7. True – I have been working quite a bit in Microsoft VS2008 lately and it feels like the dark ages compared to Delphi. MSVS(C++) is a terrible product, C# is ok though. But you misunderstand me. I did not say I do not want unicode, and to say that you just “stick with D2007” is stupid. In 10 years you are still “stuck”. Rather I wish to point out that unicode cannot be used by a vast amount of now legacy code (eg comms) and in my field everything is comms so this hits us hard. The solution I am using is to (in units where it matters) search and replace ‘string’ with ‘ansistring’. The compiler performs implicit string conversions as required, e.g. if calling or called by VCL code. Obviously strings passed by reference will not compile and require a code change – but this is a much smaller set of changes to make. I think when I finally get through this ‘upgrade’ my annoyance will subside, but at the moment it is a big and risky job.
    What would have been convenient is a compiler directive that could be put at the top of a unit to set string = ansistring automatically. But its absence is not the end of the world. Perhaps there are good reasons for it, but they are not yet clear to me.
    As an example (because I do prefer Delphi to all other languages [that I use]), on my own home machine I am attempting to get our largest project to compile. Even with selected “string->ansistring” replacement I have spent a fair few evenings just trying to get good old RxLibrary to compile ({$IFDEF UNICODE}…) – and some of the changes are sheer guesswork at what the code was trying to do. I have no idea how other component suites will be – I am dreading DelphiFreeStuff and ultimately AsyncPro. But my main concern is that when I do finally get it to compile, I have no real assurance it will run correctly. A lot of testing…sigh.

    You suggest RawByteString. I would avoid that little fella, even the help suggests avoiding it. I have always enjoyed using TByteDynArray (and all the other T..DynArrays) which luckily is what I have typically used for my dynamic arrays rather than string. However the advantage of using string to maintain a dynamic array of bytes is the string functions, eg ‘+’, insert() and delete(). You cannot add two dynamic arrays!! but you can add two strings – great for comms buffers etc. Hence most comms code uses string, and hence the problem we now have.
    I am rambling now. I guess I concede, and will just have to wade through the change and cross fingers. Long live Delphi – it really should be given far more credit than it gets.

    Comment by Steve — Wednesday, 4 February 2009 @ 19:23

  8. Actually, I just read the blog on RawByteString, and hadn’t appreciated that aspect of this string type. For my purpose (comms) I will still use TByteDynArray and AnsiString, but I can see the use of RawByteString. Very cool. Thanks.

    Comment by Steve — Wednesday, 4 February 2009 @ 19:29

  9. Steve,

    You are my soul in pain. Not all char types hold a simple char. I have now the same Turbo Async problems, and in fact, I also suggested such a directive in:
    https://forums.codegear.com/message.jspa?messageID=86179

    I’m trying to imagine a modem that instead of AT understands “Atención Marcar” in Unicode or a byte CPU port that suddenly understands Unicode also :D. A suggestion: don’t replace String with AnsiString, instead create:
    Type
    _Char = AnsiChar;
    _String = AnsiString;
    and then do your stuff. Then we will be ready for any other unpleasant surprise in D2010.

    Regards,
    Alvaro Castiello

    Comment by Alvaro Castiello — Thursday, 16 April 2009 @ 11:41
