
Printing Unicode on the Windows Console and the importance of I/O layers

I wanted to look at printing Unicode on the Windows console through the Win32 API, and to check how it is done in other languages, rather than working only from Perl, which hides a lot of the details.

Problems when printing to the console:

1. The Windows console uses an internal buffer that can mangle output

2. Invoking the console with the Unicode switch (cmd.exe /u) has no effect on this

3. Windows supports UTF-16 natively, not UTF-8

4. Documentation on Unicode and the console is hard to find. The MSDN library, as usual, is a labyrinth with no beginning and no end, where you can easily lose track

The need arose when I had to draw an old-style DOS box on the console with the cp437 box drawing characters, using their Unicode code points rather than their extended-ASCII (cp437) byte values. The output was mangled and overlapped.

Take a look at this pictorial output to get a clear view of the problem

The code that generated the incorrect result is:

#unicode_box_incorrect.pl
use strict;
use warnings;
use Win32::API;

binmode(STDOUT, ':utf8');

#Must set the console code page to UTF-8 (65001)
my $SetConsoleOutputCP = Win32::API->new('kernel32.dll', 'SetConsoleOutputCP', 'N', 'N');
$SetConsoleOutputCP->Call(65001);

#Build the box from the Unicode box drawing code points
my $line1 = "\x{2554}" . ("\x{2550}" x 15) . "\x{2557}\n";
my $line2 = "\x{2551}" . (" " x 15) . "\x{2551}\n";
my $line3 = "\x{255A}" . ("\x{2550}" x 15) . "\x{255D}";
my $unicode_string = $line1 . $line2 . $line3;

print "THIS IS THE INCORRECT EXAMPLE OUTPUT: \n";
print $unicode_string;

Since C++ has a closer relationship with Windows than Perl does, I researched how the console is manipulated in C++ and applied the underlying concepts in Perl.

Fortunately I bumped into the illegalargumentexception blog, which has a fantastic tutorial on the subject with multi-language examples. The blog also explains various Unicode issues. Great stuff, totally recommended.

So the equivalent Perl code would be:

#unicode_box_correct.pl
use strict;
use warnings;
use Win32::API;
use Encode qw(encode_utf8);

#no need to use PerlIO layers (in this case) as we are bypassing them through the raw Win32 API
#binmode(STDOUT,':utf8');

#Flush Perl's own buffer after every print so the heading appears before the box
$| = 1;

#Set the console code page to UTF-8 (65001)
my $SetConsoleOutputCP = Win32::API->new('kernel32.dll', 'SetConsoleOutputCP', 'N', 'N');
$SetConsoleOutputCP->Call(65001);

#Get a handle to the console STDOUT (STD_OUTPUT_HANDLE is -11)
my $GetStdHandle = Win32::API->new('kernel32.dll', 'GetStdHandle', 'N', 'N');
my $handle = $GetStdHandle->Call(-11);

#Build dos window
my $line1 = "\x{2554}" . ("\x{2550}" x 15) . "\x{2557}\n";
my $line2 = "\x{2551}" . (" " x 15) . "\x{2551}\n";
my $line3 = "\x{255A}" . ("\x{2550}" x 15) . "\x{255D}";
my $unicode_string = $line1 . $line2 . $line3;

print "THIS IS THE CORRECT EXAMPLE OUTPUT: \n";
#Encode to UTF-8 bytes explicitly, because the WriteFile API function needs a byte buffer and its length in bytes, not characters
my $bytes = encode_utf8($unicode_string);
my $lengthx = length($bytes);

#use WriteFile API to treat the console as a file. WriteConsole won't do it
#The fourth argument receives the number of bytes written; it should not be NULL while the fifth (lpOverlapped) is NULL
my $written = pack('L', 0);
my $WriteFile = Win32::API->new('kernel32.dll', 'WriteFile', 'NPNPN', 'N');
$WriteFile->Call($handle, $bytes, $lengthx, $written, 0);
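
The comment about byte semantics in the script above is easy to check without touching the console. The following is a minimal sketch, runnable on any platform, that builds the same box and compares its length in characters with its length in UTF-8 bytes. The variable names $char_count and $byte_count are mine and do not appear in the original script.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Same box as in the scripts above
my $line1 = "\x{2554}" . ("\x{2550}" x 15) . "\x{2557}\n";
my $line2 = "\x{2551}" . (" " x 15) . "\x{2551}\n";
my $line3 = "\x{255A}" . ("\x{2550}" x 15) . "\x{255D}";
my $unicode_string = $line1 . $line2 . $line3;

# length() on a character string counts characters
my $char_count = length($unicode_string);

# length() on the encoded string counts bytes;
# each box drawing character takes 3 bytes in UTF-8
my $byte_count = length(encode_utf8($unicode_string));

print "characters: $char_count\n";   # characters: 53
print "bytes:      $byte_count\n";   # bytes:      125
```

Passing 53 instead of 125 to WriteFile would cut the box off less than halfway through, which is why the length has to be taken after encoding.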


The trick is to use high-level console I/O (WriteFile) rather than low-level console I/O (WriteConsoleOutput). There is no need to call the WideCharToMultiByte function, since Perl uses UTF-8 natively, while C++ uses 16-bit wide chars that have to be converted to multibyte. Note that Windows treats wchar as 'real' Unicode, while it treats UTF-8 as a multibyte encoding, in the same way it treats the ASCII code pages.
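
To make the difference between UTF-8 and 16-bit wide chars concrete, here is a small sketch that encodes the top-left corner of the box (U+2554) both ways and prints the resulting bytes in hex. The helper hex_bytes is hypothetical and exists only for this illustration. UTF-16LE is used as the assumed in-memory layout of a Windows wide char.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show the bytes of an encoded string as hex pairs
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

my $corner = "\x{2554}";    # BOX DRAWINGS DOUBLE DOWN AND RIGHT

my $utf8_hex  = hex_bytes(encode('UTF-8',    $corner));
my $utf16_hex = hex_bytes(encode('UTF-16LE', $corner));

print "UTF-8:    $utf8_hex\n";     # UTF-8:    E2 95 94
print "UTF-16LE: $utf16_hex\n";    # UTF-16LE: 54 25
```

The same code point is three bytes in UTF-8 and one 16-bit unit in UTF-16, which is why a C++ program holding wide chars needs a conversion step before writing UTF-8 bytes, while the Perl script only needs to encode its string.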

Also note that for the example to work, the console's current code page plays no role, but the font must be set to Lucida Console.
However, Lucida Console does not cover the whole Unicode range, so some glyphs will still not display.
One issue remains: how to set the font on the user's console programmatically. Unfortunately, this can only be done on Windows Vista and later, with the SetCurrentConsoleFontEx API function.

Ultimately, in pure Perl code without calling the Win32 API for output (although we still need it for SetConsoleOutputCP), we turn PerlIO buffering off by using the :unix layer, so that it doesn't interfere with the console buffer:

#unicode_box_correct_pure_perl.pl
use strict;
use warnings;
use Win32::API;

#The :unix layer writes without PerlIO buffering; :utf8 encodes the output
binmode(STDOUT, ":unix:utf8");

#Must set the console code page to UTF-8 (65001)
my $SetConsoleOutputCP = Win32::API->new('kernel32.dll', 'SetConsoleOutputCP', 'N', 'N');
$SetConsoleOutputCP->Call(65001);

my $line1 = "\x{2554}" . ("\x{2550}" x 15) . "\x{2557}\n";
my $line2 = "\x{2551}" . (" " x 15) . "\x{2551}\n";
my $line3 = "\x{255A}" . ("\x{2550}" x 15) . "\x{255D}";
my $unicode_string = $line1 . $line2 . $line3;

print "THIS IS THE CORRECT EXAMPLE OUTPUT IN PURE PERL: \n";
print $unicode_string;
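
The effect of the :unix layer can be observed away from the console as well. This is a rough sketch under my own assumptions: it writes to two temporary files instead of STDOUT, opens them with open rather than changing layers with binmode, and uses file names invented for the example. It checks how many bytes have reached the disk before either handle is closed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $text = "\x{2554}\x{2550}\x{2557}\n";    # 3 box characters + newline = 10 UTF-8 bytes

# Buffered handle: the bytes wait in PerlIO's buffer until a flush or close
my $buffered_path = "$dir/buffered.txt";
open(my $buffered_fh, '>:utf8', $buffered_path) or die "open failed: $!";
print {$buffered_fh} $text;
my $buffered_size = (stat $buffered_path)[7];

# :unix handle: every print goes straight to the operating system
my $unbuffered_path = "$dir/unbuffered.txt";
open(my $unbuffered_fh, '>:unix:utf8', $unbuffered_path) or die "open failed: $!";
print {$unbuffered_fh} $text;
my $unbuffered_size = (stat $unbuffered_path)[7];

print "buffered, before close:   $buffered_size bytes on disk\n";
print "unbuffered, before close: $unbuffered_size bytes on disk\n";

close($buffered_fh);
close($unbuffered_fh);
```

The buffered handle should report 0 bytes until it is closed, while the :unix handle reports all 10 bytes immediately. That immediacy is what keeps Perl's own buffering out of the way in the console script.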


Compare this little Perl example with the complexity other languages have to go through to get the same result, and appreciate Perl's power. Magic.

Update:
Better yet, after a chat I had on the PerlMonks forum, the code can be improved by dropping Win32::API completely and replacing it with Win32::Console:

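#unicode_box_correct_pure_perl.pl
use strict;
use warnings;
use Win32::Console;

#The :unix layer writes without PerlIO buffering; :utf8 encodes the output
binmode(STDOUT, ":unix:utf8");

#Set the console output code page to UTF-8 (65001)
Win32::Console::OutputCP( 65001 );

my $line1 = "\x{2554}" . ("\x{2550}" x 15) . "\x{2557}\n";
my $line2 = "\x{2551}" . (" " x 15) . "\x{2551}\n";
my $line3 = "\x{255A}" . ("\x{2550}" x 15) . "\x{255D}";
my $unicode_string = $line1 . $line2 . $line3;

print "THIS IS THE CORRECT EXAMPLE OUTPUT IN PURE PERL: \n";
print $unicode_string;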
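
If the same script also has to run outside Windows, one possible variation (my own assumption, not something from the PerlMonks discussion) is to load Win32::Console only when Perl reports a native Windows build. The helper make_box is hypothetical and simply wraps the three lines used throughout this post so that the box width becomes a parameter.

```perl
use strict;
use warnings;

# make_box is a hypothetical helper, not part of the original scripts
sub make_box {
    my ($inner_width) = @_;
    my $top    = "\x{2554}" . ("\x{2550}" x $inner_width) . "\x{2557}\n";
    my $middle = "\x{2551}" . (" " x $inner_width) . "\x{2551}\n";
    my $bottom = "\x{255A}" . ("\x{2550}" x $inner_width) . "\x{255D}";
    return $top . $middle . $bottom;
}

binmode(STDOUT, ':unix:utf8');

# Only switch the console code page on native Windows;
# elsewhere the module is never loaded
if ($^O eq 'MSWin32') {
    require Win32::Console;
    Win32::Console::OutputCP(65001);
}

print make_box(15), "\n";
```

Using require inside the condition, instead of use at the top, means the script still compiles on systems where Win32::Console is not installed.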
