
Opening files with Unicode (Japanese/Chinese) characters in the filename using Perl

(Read a complete article on the matter and much more:
"Unicode issues regarding the Windows OS file system and their handling from Perl")

Currently there is no way to manipulate a file whose name contains Unicode characters on Windows using Perl's built-in functions.
The Perl 5.10 todo wish list states that functions like chdir, opendir, readdir, readlink, rename, rmdir etc.
"could potentially accept Unicode filenames either as input or output".
Windows' default encoding is UTF-16LE, but the console 'dir' command will only return ANSI names, so Unicode characters are replaced with "?".
This happens even if you invoke the console with the Unicode switch (cmd.exe /u), change the code page to 65001 (which is UTF-8 on Windows) and use the Lucida Console TrueType font, which supports Unicode.
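The "?" substitution can be reproduced in plain Perl, without touching the file system. The following is only a sketch: it assumes cp1252 as the ANSI code page (yours may differ), uses a shortened three-character sample name written with escapes, and `to_ansi` is a hypothetical helper, not part of any library. It relies on the core Encode module, whose default behaviour is to replace unmappable characters with a substitution character.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: convert a Perl string to a single-byte "ANSI" code page.
# cp1252 is an assumed default. With Encode's default CHECK, characters that
# the code page cannot represent are replaced by its substitution character.
sub to_ansi {
    my ($name, $codepage) = @_;
    return encode($codepage // 'cp1252', $name);
}

# Three katakana characters followed by ".txt" (sample name, written as escapes)
my $name = "\x{30E1}\x{30C7}\x{30A3}.txt";

my $utf16 = encode('UTF-16LE', $name);   # two bytes per character here, nothing is lost
my $ansi  = to_ansi($name);              # the katakana cannot be represented

printf "UTF-16LE bytes: %d\n", length $utf16;
print  "ANSI name:      $ansi\n";
```

With these assumptions the UTF-16LE form is 14 bytes long and the ANSI form comes out as `???.txt`, which is the same loss of information you see in the 'dir' listing.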
A workaround is to use the COM facilities provided by Windows (in this case Scripting.FileSystemObject), which provide a much higher level of abstraction, or to use the Win32 API calls.
I tried to read a file with Japanese characters in the filename, which resides in the current folder, and then move the file to another folder.
The filename is "は群馬県高崎市を拠点に、様々なメディ.txt"
Since opendir, readdir, rename etc. do not support Unicode, you have to resort to the Scripting.FileSystemObject methods and properties, which accept Unicode.

This is the actual code, which moves all files with a .txt extension:

use strict;
use warnings;
use Win32::OLE qw(in);
use Devel::Peek;

# CP_UTF8 is very important as it translates between Perl strings and
# the Unicode strings used by the OLE interface
Win32::OLE->Option(CP => Win32::OLE::CP_UTF8());

my $obj = Win32::OLE->new('Scripting.FileSystemObject')
    or die "Cannot create FileSystemObject: " . Win32::OLE->LastError();

my $folder     = $obj->GetFolder(".");
my $collection = $folder->{Files};

mkdir("c:\\newfolder") or die "Cannot create c:\\newfolder: $!";

foreach my $value (in $collection) {
    my $filename = $value->{Name};
    next if $filename !~ /\.txt$/i;   # only files with a .txt extension
    Dump($filename);                  # check if the utf8 flag is on
    my $file = $obj->GetFile($filename);
    $file->Move("c:\\newfolder\\$filename");
    print Win32::OLE->LastError() || "success\n\n";
}

If all goes well, a new folder (surprisingly) called 'newfolder' will have been created on the C: drive, and the Japanese-named file should have been moved there.

This will only work if you have Asian language support (regional settings) enabled, and you should be able to see the Japanese name in Explorer.
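The Dump call in the script is there to confirm that the filename comes back as a character string with the UTF-8 flag on. If you only need a yes/no answer, a lighter check is possible with the built-in utf8::is_utf8. The sketch below runs anywhere: the byte string is an assumed stand-in for a name, its decoded form stands in for what the script expects to receive through OLE with CP_UTF8, and `describe` is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed sample: the UTF-8 bytes of three katakana characters plus ".txt"
my $bytes = "\xE3\x83\xA1\xE3\x83\x87\xE3\x82\xA3.txt";

# Decoding turns the bytes into a character string (UTF-8 flag on)
my $chars = decode('UTF-8', $bytes);

# Hypothetical helper: report the flag state and the length Perl sees
sub describe {
    my ($s) = @_;
    return sprintf "flag %s, length %d",
        (utf8::is_utf8($s) ? 'on' : 'off'), length $s;
}

print "raw bytes: ", describe($bytes), "\n";
print "decoded:   ", describe($chars), "\n";
```

The raw byte string reports the flag off and a length of 13, while the decoded string reports the flag on and a length of 7. A name that reaches Move with the flag off is a hint that the CP option was not applied.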

Comments

Anonymous said…
Thanks a lot, saved me loads of time. Exactly what I needed!
Daku said…
Spent ages trying to rename a file using Win32::Unicode / Win32API::File with no luck.
This worked without any hassle, thanks :)
