Micro-ISV.asia

Friday, 8 August 2008

Will The Real UTF8String Stand Up?

Filed under: Programming — Jan Goyvaerts @ 18:52

For some time, Delphi has had a little-known type called UTF8String. It was little-known because it didn’t really work as advertised. Try this in Delphi 2007:

var S: UTF8String;
begin
  S := 'Tiburón';
  WriteLn(Length(S));
end.

Though S is declared as UTF8String, it stores the string using the default Windows code page, instead of UTF-8, with a length of 7 bytes. That’s because in Delphi 2007, you’ll find this declaration in System.pas:

type UTF8String = type string;

This means that in Delphi 2007, there’s really no difference between UTF8String and AnsiString. In Delphi 2009, however, you’ll find this declaration:

type UTF8String = type AnsiString(65001);

65001 is the code page number for UTF-8 on the Windows platform. You can declare your own string types this way using any code page understood by the WideCharToMultiByte() and MultiByteToWideChar() API calls. When you assign a UnicodeString to a UTF8String, for example, WideCharToMultiByte() is called with code page 65001 to convert the string from UTF-16 to UTF-8. This is no different from Delphi 2007 (or 2009) calling WideCharToMultiByte() with code page 0, the default Windows code page, when you assign a WideString to an AnsiString.

In Delphi 2009, the code snippet at the top of this post will convert “Tiburón” to UTF-8 at compile time. At runtime, 8 bytes are loaded directly into S. There will be no call to WideCharToMultiByte() at runtime for this literal assignment. The accented ó takes up two bytes when encoded as UTF-8. Length(S) will return 8.
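The same conversion can be forced to happen at runtime by going through a UnicodeString variable. This is a minimal sketch for Delphi 2009. The program and variable names are my own, and the ó is written as #$00F3 so the result does not depend on how the source file is saved.

```delphi
program Utf8LengthDemo;
{$APPTYPE CONSOLE}

var
  U: UnicodeString;
  S: UTF8String;
begin
  U := 'Tibur'#$00F3'n';  // "Tiburón" as UTF-16: 7 code units
  S := UTF8String(U);     // explicit cast converts UTF-16 to UTF-8 at runtime
  WriteLn(Length(U));     // 7 characters (UTF-16 code units)
  WriteLn(Length(S));     // 8 bytes: the ó takes two bytes in UTF-8
end.
```

The explicit cast is there to keep the compiler from warning about an implicit string conversion. A plain assignment should convert the same way.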

You can easily declare your own typed AnsiStrings in Delphi 2009. If UTF8String is too modern for you, try this:

type EBCDICString = type AnsiString(37);
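As a rough sketch of how such a custom type behaves in Delphi 2009, the program below declares a string type for code page 1252 and assigns a Unicode string to it. The type name Win1252String is hypothetical, chosen only for illustration. StringCodePage() is the System unit function that reports the code page a string carries.

```delphi
program CodePageDemo;
{$APPTYPE CONSOLE}

type
  // Hypothetical type name; 1252 is the Windows Western European code page
  Win1252String = type AnsiString(1252);

var
  U: UnicodeString;
  W: Win1252String;
begin
  U := 'Tibur'#$00F3'n';        // "Tiburón" as UTF-16
  W := Win1252String(U);        // converts UTF-16 to code page 1252
  WriteLn(Length(W));           // 7: the ó is a single byte in Windows-1252
  WriteLn(StringCodePage(W));   // the code page stored with the string data
end.
```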

Thursday, 7 August 2008

Get Ready for Delphi 2009 and Unicode

Filed under: Programming — Jan Goyvaerts @ 18:52

CodeGear has invited a bunch of people on the Tiburón field test to participate in a BetaBlogging program. Essentially, we’ve been invited to share all the positive experiences we’re having with the upcoming Delphi and C++Builder 2009. The bad news is still covered by NDA. Therefore, I won’t talk about what’s good or what’s bad until the final Delphi 2009 is available for purchase.

Delphi 2009 will, finally, fully support Unicode. The VCL and RTL will be fully Unicodified. While AnsiString still exists for handling data using legacy encodings, Delphi 2009 applications are always Unicode applications. They will need Windows 2000 or later to run.

The most dramatic change in Delphi 2009 is that the “string” type is now an alias for UnicodeString instead of AnsiString. This is similar to “string” changing from ShortString to AnsiString in Delphi 2.

I indeed said UnicodeString, not WideString. WideString still exists, and is unchanged. WideString is allocated by the Windows memory manager, and should be used for interacting with COM objects. WideString maps directly to the BSTR type in COM.

UnicodeString is the brand-new reference-counted UTF-16 string type. If you don’t use COM, you can search-and-replace WideString with UnicodeString throughout your applications, and immediately get the performance benefits of reference counting and Delphi’s fast memory manager. WideString and UnicodeString are assignment compatible. Passing a UnicodeString to a COM function that takes a WideString is no problem. The Delphi compiler will inject some magic to allocate a temporary WideString with the same contents as the UnicodeString. This is in fact no different than passing an AnsiString where a WideString is expected in Delphi 2007.

One difference is that while Delphi 2007 silently converts between AnsiString and WideString, Delphi 2009 will by default issue a warning. This makes it much easier to spot places where you’ve declared a variable as “string”, while it really should be an AnsiString, or vice versa.
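Where a conversion between the two is intended, an explicit cast states that intent and should keep the compiler quiet. This is a minimal sketch with variable names of my own. It is meant to compile in both Delphi 2007, where the casts do nothing, and Delphi 2009.

```delphi
program ExplicitCastDemo;
{$APPTYPE CONSOLE}

var
  A: AnsiString;
  S: string;
begin
  S := 'abc';
  A := AnsiString(S);  // explicit: no implicit-conversion warning
  S := string(A);      // explicit in the other direction
  WriteLn(S);
end.
```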

The only trouble spots with migrating to Unicode are places where you’re assuming that SizeOf(Char) = 1. Just like string is now UnicodeString, Char is now an alias to WideChar, and PChar an alias to PWideChar. AnsiChar and PAnsiChar still exist when you need them. If you’re doing Stream.Read(S[1], BytesToRead), better explicitly declare S as an AnsiString, even in Delphi 2007 or earlier. That will make sure the code won’t break in Delphi 2009.
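Here is one way the Stream.Read() case could be written so that it behaves the same in Delphi 2007 and 2009. The ReadAnsi helper and the test data are hypothetical, made up for this sketch. The point is only that the buffer is declared as AnsiString, so one character is always one byte.

```delphi
program ReadAnsiDemo;
{$APPTYPE CONSOLE}

uses
  Classes;

// Hypothetical helper: reads up to BytesToRead bytes into an 8-bit string.
function ReadAnsi(Stream: TStream; BytesToRead: Integer): AnsiString;
var
  BytesRead: Integer;
begin
  SetLength(Result, BytesToRead);
  if BytesToRead > 0 then
  begin
    // Result[1] is the first byte, because AnsiChar is always 1 byte
    BytesRead := Stream.Read(Result[1], BytesToRead);
    SetLength(Result, BytesRead);  // trim if the stream held fewer bytes
  end;
end;

var
  M: TMemoryStream;
  Data: AnsiString;
begin
  M := TMemoryStream.Create;
  try
    Data := 'hello';
    M.WriteBuffer(Data[1], Length(Data));
    M.Position := 0;
    WriteLn(ReadAnsi(M, 5));  // prints "hello"
  finally
    M.Free;
  end;
end.
```

Had Result been declared as string, the same call in Delphi 2009 would fill only half of the buffer, since each Char would then be two bytes.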

If you’re making Win32 API calls, continue to use PChar. Then everything will move to Unicode automatically. In Delphi 2009, Win32 API imports will translate to the W version instead of the A version. E.g. MessageBox() is the same as MessageBoxW(), which takes wide parameters. In Delphi 2007, it’s MessageBoxA(), which takes Ansi parameters. You can of course explicitly call MessageBoxW() or MessageBoxA() in either version of Delphi.
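A small sketch of what that looks like in practice. The message text and program name are placeholders of mine. The same source should compile unchanged in both versions.

```delphi
program MessageBoxDemo;

uses
  Windows;

var
  Msg: string;
begin
  Msg := 'Hello from Delphi';
  // PChar follows string: PAnsiChar in Delphi 2007, PWideChar in Delphi 2009,
  // so this call resolves to MessageBoxA() or MessageBoxW() respectively.
  MessageBox(0, PChar(Msg), 'Demo', MB_OK);
end.
```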

Conclusion: to get ready for the move to Unicode, understand the difference between “string” and “AnsiString”. If your code works whether a character is one byte or two bytes, use string. If your string must be 8-bit, use AnsiString. Then everything will migrate properly, and even work with both Delphi 2007 and 2009. If it matters whether strings are Unicode or not, you can use {$IFDEF UNICODE}. The UNICODE conditional symbol is defined in Delphi 2009, but not in Delphi 2007.
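For the cases where it does matter, a minimal sketch of the {$IFDEF UNICODE} check might look like this. The program name and the messages are my own.

```delphi
program UnicodeCheck;
{$APPTYPE CONSOLE}

begin
{$IFDEF UNICODE}
  WriteLn('string is UnicodeString');   // Delphi 2009
{$ELSE}
  WriteLn('string is AnsiString');      // Delphi 2007 and earlier
{$ENDIF}
  WriteLn('SizeOf(Char) = ', SizeOf(Char));
end.
```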

Wednesday, 28 May 2008

Great Programming Font Updated

Filed under: Programming — Jan Goyvaerts @ 11:31

I just received an email from Damien Guard to let me know he has updated his Envy Code R font. That’s one of my three favorite programming fonts.

He reduced the font’s line height. It will now fit about the same number of lines on the screen at a given font size as the other two fonts. The repertoire of accented characters has been expanded. The font is still restricted to the Latin alphabet.

Thursday, 24 April 2008

.NET Reflector Bares It All

Filed under: Programming — Jan Goyvaerts @ 10:19

Last week I received a support inquiry from an EditPad Pro user asking how to make EditPad Pro run an XML Validation tool. This particular tool was a very simple .NET application that was designed to be invoked as a tool by another text editor. There was no such example tool on EditPad Pro’s web site.

Of course, I could have just created a tool configuration for the existing XML validator applet and linked to it. But I didn’t want to subject my customers to the integration stuff the other validator provided for the competing editor. It would flash an error message as it failed to create .ini files or registry entries.

Besides, the application was only a few kilobytes. How hard could it be to write something similar? It turned out to be all too easy.

What follows is certainly old news to experienced .NET developers. But it was new to me, and quite an eye-opener. My .NET experience was limited to creating test applications to test the System.Text.RegularExpressions package for RegexBuddy, and a sample application for using RegexBuddy’s COM integration.

I had previously heard about something called .NET Reflector. But I had never tried it. It’s a free download. It asks for an email address, but you can enter a bogus one, as you don’t need to receive any emails to get the .NET Reflector to run. Just unzip and run.

I pointed the reflector to the other XML validation utility that I had downloaded. I was presented with a class browser that not only showed the application’s complete class structure, but also the complete source code. Variable names etc. were still there. It was trivial to figure out what the application was doing, and write my own application to duplicate it. Though I didn’t copy/paste anything, I easily could have.

Obviously, digging into a real-world application is going to be far more involved than dissecting a little utility that, in my version, is only about 100 lines of code. But it’s not going to be any more complicated than reading the actual source code. Obfuscating the application will make the code harder to interpret. But it won’t really stop anyone from seeing your code. This effectively makes all .NET code open source, pretty much like all JavaScript code that runs in your browser is open source. It can be obfuscated, but not hidden.

Compare the screen shot below with the actual source code.

[Screen shot: Examining my own XMLValidatorTool.exe with the .NET Reflector]
