
Opening files with Unicode (Japanese/Chinese) characters in the filename using Perl

(Read a complete article on the matter and much more:
"Unicode issues regarding the Windows OS file system and their handling from Perl")

Currently there is no way on Windows to manipulate a file whose name contains Unicode characters by using Perl's built-in functions.
The Perl 5.10 todo wish list states that functions like chdir, opendir, readdir, readlink, rename, rmdir, etc.
"could potentially accept Unicode filenames either as input or output".
The default encoding on Windows is UTF-16LE, but the console 'dir' command will only return ANSI names, so Unicode characters are replaced with "?".
This happens even if you invoke the console with the Unicode switch (cmd.exe /u), change the code page to 65001 (which is UTF-8 on Windows) and use the Lucida Console TrueType font, which supports Unicode.
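To make the "?" substitution concrete, here is a minimal sketch using the core Encode module. It only imitates the lossy conversion; it is not the code path Windows itself uses, and cp1252 is an assumed ANSI code page (the real one depends on the system locale). The three-character name is just the start of the file name used in this post.

```perl
use strict;
use warnings;
use Encode qw(encode);

# First three characters of the Japanese file name, plus the extension
my $name = "\x{306F}\x{7FA4}\x{99AC}.txt";

# UTF-16LE: two bytes per character for these (BMP) characters
my $wide = encode('UTF-16LE', $name);

# Assumed ANSI code page: characters it cannot represent are substituted
my $ansi = encode('cp1252', $name);

printf "characters: %d, UTF-16LE bytes: %d\n", length $name, length $wide;
print  "ANSI form:  $ansi\n";
```

The ANSI form keeps the ASCII ".txt" but loses every Japanese character, which is why a name that went through such a conversion can no longer be used to open the file.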
A workaround is to use the COM facilities provided by Windows (in this case Scripting.FileSystemObject), which provide a much higher level of abstraction, or to use the Win32 API calls directly.
I tried to read a file with Japanese characters in its filename, which resides in the current folder, and then move the file to another folder.
The filename is "は群馬県高崎市を拠点に、様々なメディ.txt"
Since opendir, readdir, rename, etc. do not support Unicode, you have to resort to the Scripting.FileSystemObject methods and properties, which do accept Unicode.
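The same approach should also cover renaming. The sketch below is an illustration only: fso_rename is a hypothetical helper name, and it relies on the File object's Name property being writable, as the FileSystemObject documentation describes. It returns undef when Win32::OLE cannot be loaded, so it does nothing on non-Windows systems.

```perl
use strict;
use warnings;

# Hypothetical helper: rename a file through Scripting.FileSystemObject.
# Returns 1 on success, 0 on failure, undef when Win32::OLE is unavailable.
sub fso_rename {
    my ($old_path, $new_name) = @_;

    # Load Win32::OLE lazily so the script still compiles elsewhere
    return undef unless eval { require Win32::OLE; 1 };

    # Translate between Perl strings and the Unicode strings used by OLE
    Win32::OLE->Option(CP => Win32::OLE::CP_UTF8());

    my $fso  = Win32::OLE->new('Scripting.FileSystemObject') or return 0;
    my $file = $fso->GetFile($old_path)                      or return 0;

    # Assigning to the Name property renames the file in place
    $file->{Name} = $new_name;

    return Win32::OLE->LastError() ? 0 : 1;
}
```

A call would look like fso_rename($path, $new_name), where both arguments are Perl character strings and $new_name is a bare file name rather than a full path.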

This is the actual code which moves all files with .txt extension:

use strict;
use warnings;
use Win32::OLE qw(in);
use Devel::Peek;

# CP_UTF8 is very important as it translates between Perl strings and
# the Unicode strings used by the OLE interface
Win32::OLE->Option(CP => Win32::OLE::CP_UTF8());

my $obj = Win32::OLE->new('Scripting.FileSystemObject')
    or die "Cannot create Scripting.FileSystemObject: " . Win32::OLE->LastError();

my $folder     = $obj->GetFolder(".");
my $collection = $folder->{Files};

# Create the destination folder unless it already exists
-d "c:\\newfolder" or mkdir("c:\\newfolder") or die "Cannot create c:\\newfolder: $!";

foreach my $value (in $collection) {
    my $filename = $value->{Name};
    next if $filename !~ /\.txt$/i;    # only files with a .txt extension
    Dump($filename);                   # check if the utf8 flag is on
    my $file = $obj->GetFile($filename);
    $file->Move("c:\\newfolder\\$filename");
    print Win32::OLE->LastError() || "success\n\n";
}
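The Dump call is there to inspect the utf8 flag by eye. As a rough alternative, the flag can also be tested programmatically with the built-in utf8::is_utf8. The following sketch is independent of Win32::OLE; has_utf8_flag is a hypothetical helper name and the sample string is only the first character of the file name.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: 1 when the string carries Perl's internal UTF-8 flag
sub has_utf8_flag {
    my ($string) = @_;
    return utf8::is_utf8($string) ? 1 : 0;
}

my $bytes = "\xE3\x81\xAF.txt";         # raw UTF-8 bytes for U+306F plus ".txt"
my $chars = decode('UTF-8', $bytes);    # the same name as a character string

printf "bytes: flag=%d length=%d\n", has_utf8_flag($bytes), length $bytes;
printf "chars: flag=%d length=%d\n", has_utf8_flag($chars), length $chars;
```

A name returned by the OLE interface with the CP_UTF8 option set is expected to behave like $chars here: flag on, and a length counted in characters rather than bytes.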

If all goes well, a new folder, surprisingly called 'newfolder', will have been created on the C: drive and the Japanese-named file should have been moved there.

This will only work if you have Asian language support enabled (in the regional settings); you should then be able to see the Japanese name in Explorer as shown above.

Comments

Anonymous said…
Thanks a lot,saved me loads of time.Exactly what I needed!
Daku said…
Spent ages trying to rename a file using Win32::Unicode / Win32API::File with no luck.
This worked without any hassle, thanks :)
