
Printing Unicode on the Windows Console and the Importance of I/O Layers

I wanted to look into printing Unicode on the Windows console through the Win32 API, and also to check how it is done in other languages, rather than going directly through Perl, which hides a lot of details.

Problems when printing Unicode to the console:

1. The Windows console uses an internal buffer that can mangle output.

2. Invoking the console with the Unicode switch (cmd.exe /u) does not help.

3. Windows natively uses UTF-16, not UTF-8.

4. Documentation on Unicode and the console is hard to find. The MSDN library, as usual, is a labyrinth with no beginning and no end, where you can easily lose track.

The need arose when I had to print an old-style DOS box on the console using the cp437 box-drawing characters, addressed by their Unicode code points rather than by their cp437 (extended ASCII) byte values. The output was mangled, with the lines overlapping.

Take a look at this pictorial output to get a clear view of the problem
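To make the relationship between the two representations concrete, here is a small sketch that decodes the old cp437 byte values of the box's corners and edges and reports the Unicode code points they map to. It assumes the cp437 table that ships with Perl's Encode module (Encode::Byte); the hash names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# cp437 (extended ASCII) byte values of the double-line box-drawing characters
my %cp437_bytes = (
    top_left     => "\xC9",
    horizontal   => "\xCD",
    top_right    => "\xBB",
    vertical     => "\xBA",
    bottom_left  => "\xC8",
    bottom_right => "\xBC",
);

# Decode each byte from cp437 and record the Unicode code point it maps to
my %code_point;
for my $name (sort keys %cp437_bytes) {
    my $char = decode('cp437', $cp437_bytes{$name});
    $code_point{$name} = sprintf 'U+%04X', ord $char;
    printf "%-12s 0x%02X -> %s\n", $name, ord $cp437_bytes{$name}, $code_point{$name};
}
```

The resulting code points (U+2554, U+2550, U+2557, U+2551, U+255A, U+255D) are exactly the ones written as \x{...} escapes in the scripts in this post.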

The code that generated the incorrect result is:

#unicode_box_incorrect.pl
use strict;
use warnings;
use Win32::API;

binmode(STDOUT, ':utf8');

#Must set the console code page to UTF8
my $SetConsoleOutputCP = Win32::API->new('kernel32.dll', 'SetConsoleOutputCP', 'N', 'N');
$SetConsoleOutputCP->Call(65001);

my $line1 = "\x{2554}" . ("\x{2550}" x 15) . "\x{2557}\n";
my $line2 = "\x{2551}" . (" " x 15) . "\x{2551}\n";
my $line3 = "\x{255A}" . ("\x{2550}" x 15) . "\x{255D}";
my $unicode_string = $line1 . $line2 . $line3;

print "THIS IS THE INCORRECT EXAMPLE OUTPUT: \n";
print $unicode_string;

Since C++ has a better relationship with Windows than Perl, I did some research on how you can manipulate the console in C++ and used the underlying concepts in Perl.

Fortunately, I bumped into illegalargumentexception,
who has a fantastic tutorial on the subject with examples in multiple languages. The blog also explains various other Unicode issues. Great stuff, totally recommended.

So the equivalent Perl code would be:

#unicode_box_correct.pl
use strict;
use warnings;
use Win32::API;
use Encode qw(encode_utf8);

#no need to use perlio (in this case) as we are bypassing it through the raw Win32 API
#binmode(STDOUT,':utf8');

#Flush STDOUT after every print so the header is written before the WriteFile output
$| = 1;

#Set the console code page to UTF8
my $SetConsoleOutputCP = Win32::API->new('kernel32.dll', 'SetConsoleOutputCP', 'N', 'N');
$SetConsoleOutputCP->Call(65001);

#Get a handle to the console STDOUT (STD_OUTPUT_HANDLE = -11)
my $GetStdHandle = Win32::API->new('kernel32.dll', 'GetStdHandle', 'N', 'N');
my $handle = $GetStdHandle->Call(-11);

#Build dos window
my $line1 = "\x{2554}" . ("\x{2550}" x 15) . "\x{2557}\n";
my $line2 = "\x{2551}" . (" " x 15) . "\x{2551}\n";
my $line3 = "\x{255A}" . ("\x{2550}" x 15) . "\x{255D}";
my $unicode_string = $line1 . $line2 . $line3;

print "THIS IS THE CORRECT EXAMPLE OUTPUT: \n";
#Encode to UTF-8 explicitly: WriteFile needs a byte buffer and its length in bytes, not characters
my $bytes = encode_utf8($unicode_string);

#use WriteFile API to treat the Console as a file. WriteConsole won't do it
#lpNumberOfBytesWritten must point to a DWORD buffer when lpOverlapped is NULL
my $written = pack('L', 0);
my $WriteFile = Win32::API->new('kernel32.dll', 'WriteFile', 'NPNPN', 'N');
$WriteFile->Call($handle, $bytes, length($bytes), $written, 0);


The trick is to use high-level console I/O (WriteFile) rather than low-level console I/O (WriteConsoleOutput). There is also no need for the WideCharToMultiByte function: Perl already stores strings containing such characters internally as UTF-8, whereas C++ on Windows uses 16-bit wide characters (UTF-16), which have to be converted into a multibyte encoding first. Note that Windows treats wchar_t strings as 'real' Unicode, while it treats UTF-8 as just another multibyte encoding, the same way it treats the legacy ANSI and OEM code pages.
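The difference between characters and bytes is why the WriteFile call needs the UTF-8 byte length rather than Perl's character length. A quick sketch measuring the top edge of the box three ways: as Perl characters, as the UTF-8 bytes handed to WriteFile, and as the UTF-16LE bytes a Windows wchar_t string would occupy (the form WideCharToMultiByte converts from in C++).

```perl
use strict;
use warnings;
use Encode qw(encode encode_utf8);

# Top edge of the box: a corner, fifteen double horizontal lines, a corner
my $line = "\x{2554}" . ("\x{2550}" x 15) . "\x{2557}";

my $chars       = length($line);                       # Perl characters
my $utf8_bytes  = length(encode_utf8($line));          # bytes WriteFile receives
my $utf16_bytes = length(encode('UTF-16LE', $line));   # bytes in a wchar_t string

print "characters:   $chars\n";        # 17
print "UTF-8 bytes:  $utf8_bytes\n";   # 51 (3 bytes per box-drawing character)
print "UTF-16 bytes: $utf16_bytes\n";  # 34 (2 bytes per character)
```

Passing the character count (17) as the WriteFile length would cut the line off after its first few glyphs, which is why the byte length is computed from the encoded string.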

Also note that for the example to work, the console's initial code page does not matter (the script switches it to 65001 itself), but the font must be set to Lucida Console.
Keep in mind, however, that Lucida Console does not cover the whole Unicode range, so not every character will have a glyph.
That leaves one issue: how to set the font on the user's console programmatically. Unfortunately, this can only be done on Windows Vista and later, with the SetCurrentConsoleFontEx API function.
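A minimal sketch of that call from Perl, assuming Win32::API's old-style prototype strings as used in the scripts above. It packs a CONSOLE_FONT_INFOEX structure by hand (84 bytes: cbSize, nFont, a COORD font size, FontFamily, FontWeight and a 32-WCHAR face name) and passes it to SetCurrentConsoleFontEx. The helper build_console_font_info and the 16-pixel height are hypothetical, illustrative choices; the API call only runs on Windows, and Win32::API->new returns undef where the function does not exist (before Vista).

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: packs a CONSOLE_FONT_INFOEX structure (84 bytes)
#   ULONG cbSize; DWORD nFont; COORD dwFontSize (two SHORTs);
#   UINT FontFamily; UINT FontWeight; WCHAR FaceName[32]
sub build_console_font_info {
    my ($face_name, $height) = @_;
    my $face = encode('UTF-16LE', $face_name);    # FaceName is a WCHAR string
    return pack('L L s s L L a64',
        84,          # cbSize: size of the structure in bytes
        0,           # nFont
        0,           # dwFontSize.X: commonly left at 0 for TrueType fonts
        $height,     # dwFontSize.Y: character height
        0,           # FontFamily: FF_DONTCARE
        400,         # FontWeight: FW_NORMAL
        $face);      # 'a64' null-pads the name to 32 WCHARs
}

my $font_info = build_console_font_info('Lucida Console', 16);

if ($^O eq 'MSWin32') {
    require Win32::API;
    my $GetStdHandle = Win32::API->new('kernel32.dll', 'GetStdHandle', 'N', 'N');
    # undef on systems without SetCurrentConsoleFontEx
    my $SetCurrentConsoleFontEx =
        Win32::API->new('kernel32.dll', 'SetCurrentConsoleFontEx', 'NNP', 'N');

    if ($SetCurrentConsoleFontEx) {
        my $handle = $GetStdHandle->Call(-11);    # STD_OUTPUT_HANDLE
        # Second argument 0 (FALSE): set the font for the current window size
        $SetCurrentConsoleFontEx->Call($handle, 0, $font_info)
            or warn "SetCurrentConsoleFontEx failed\n";
    }
    else {
        warn "SetCurrentConsoleFontEx is not available on this system\n";
    }
}
```

Building the structure in a separate function keeps the byte layout easy to inspect on any platform, while the Windows-only part stays a thin wrapper around the API call.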

Ultimately, in pure Perl code, without calling any Win32 API for the output itself (although we still need Win32::API for SetConsoleOutputCP), we turn PerlIO buffering off by using the :unix layer, so that it doesn't interfere with the console's own buffer:
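The effect of the :unix layer can be observed without a console at all. This illustrative sketch prints the same 17-character box edge to two temporary files, one through the default buffered layers and one through :unix:utf8, and checks each file's size before closing it. It assumes the default PerlIO buffer is larger than the 51 bytes written, which holds in practice; the file names are arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $line = "\x{2554}" . ("\x{2550}" x 15) . "\x{2557}";   # 17 chars, 51 UTF-8 bytes
my $dir  = tempdir(CLEANUP => 1);

# Default layers include a buffering layer (:perlio, or :crlf on Windows),
# so a small print stays in memory until the buffer is flushed
open(my $buffered, '>:utf8', "$dir/buffered.txt") or die "open: $!";
print {$buffered} $line;
my $buffered_size = (-s "$dir/buffered.txt") || 0;

# :unix has no buffer of its own, so each print becomes an immediate write
open(my $unbuffered, '>:unix:utf8', "$dir/unbuffered.txt") or die "open: $!";
print {$unbuffered} $line;
my $unbuffered_size = (-s "$dir/unbuffered.txt") || 0;

print "bytes on disk before close, buffered:   $buffered_size\n";
print "bytes on disk before close, unbuffered: $unbuffered_size\n";

close($buffered)   or die "close: $!";
close($unbuffered) or die "close: $!";
```

With the buffered handle nothing reaches the file until close; with :unix:utf8 all 51 bytes are there immediately. On the console, this is what keeps PerlIO's buffer from interfering with the console's own buffering.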

#unicode_box_correct_pure_perl.pl
use strict;
use warnings;
use Win32::API;

#:unix bypasses PerlIO's buffer, :utf8 encodes the output as UTF-8
binmode(STDOUT, ":unix:utf8");

#Must set the console code page to UTF8
my $SetConsoleOutputCP = Win32::API->new('kernel32.dll', 'SetConsoleOutputCP', 'N', 'N');
$SetConsoleOutputCP->Call(65001);

my $line1 = "\x{2554}" . ("\x{2550}" x 15) . "\x{2557}\n";
my $line2 = "\x{2551}" . (" " x 15) . "\x{2551}\n";
my $line3 = "\x{255A}" . ("\x{2550}" x 15) . "\x{255D}";
my $unicode_string = $line1 . $line2 . $line3;

print "THIS IS THE CORRECT EXAMPLE OUTPUT IN PURE PERL: \n";
print $unicode_string;


Compare this little Perl example with the complexity other languages have to go through to get the same result, and appreciate Perl's power and magic.

Update:
Better yet, after a chat I had on the PerlMonks forum, the code can be improved by removing Win32::API completely and replacing it with Win32::Console:

#unicode_box_correct_pure_perl.pl
use strict;
use warnings;
use Win32::Console;

binmode(STDOUT, ":unix:utf8");

#Set the console output code page to UTF-8 (65001)
Win32::Console::OutputCP(65001);

my $line1 = "\x{2554}" . ("\x{2550}" x 15) . "\x{2557}\n";
my $line2 = "\x{2551}" . (" " x 15) . "\x{2551}\n";
my $line3 = "\x{255A}" . ("\x{2550}" x 15) . "\x{255D}";
my $unicode_string = $line1 . $line2 . $line3;

print "THIS IS THE CORRECT EXAMPLE OUTPUT IN PURE PERL: \n";
print $unicode_string;
