
Perl's text handling to the rescue....

There was a very interesting problem posted on the Ingres forum, quoted here:

"I have to use an Ingres Database to store my data which is in several languages (like french, german, and so on). In my web site, the user can use the function "search".
The problem are the special characters like éàèâ in french, or öäü in german. The user doesn't enter these characters, but it should be found anyway.

For example:
 
In the database is the word "château". The user types "chateau" (whithout â)
The program should find the "château", even if "chateau" was typed.
So all accents in the database should be replaced by something more
useful (like "_") 
Has someone an idea how to do that?
Ingres database version : 9.2.1 
 
Thanks a lot, Kakmael"

And this is my attempt to tackle it, using Perl, of course.


OK, so you get the Latin-1 encoded string "Chateau" from your web form and pass it to a Perl CGI script that does the following:

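use strict;
use warnings;
use charnames ':full';

my $input_string = "Chateau";

# Map each plain letter to the accented variants that should also be searched for.
my %mappings = (
    "\N{LATIN SMALL LETTER A}" => [
        "\N{LATIN SMALL LETTER A WITH GRAVE}",
        "\N{LATIN SMALL LETTER A WITH DIAERESIS}",
    ],
);

# Start with the string exactly as the user typed it.
my @results = ("'" . $input_string . "'");

# Add one variant per accented counterpart (only the first match is replaced).
foreach my $hash_key (keys %mappings) {
    foreach my $array_key (@{ $mappings{$hash_key} }) {
        (my $temp = $input_string) =~ s/$hash_key/$array_key/;
        push @results, "'" . $temp . "'";
    }
}

my $search_string;
{
    # $" is the separator used when an array is interpolated into a string.
    local $" = ",";
    $search_string = 'SELECT * FROM test WHERE col2 in (' . "@results" . ')';
}

print $search_string;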


Basically, you have a hash that maps each character to be replaced to its accented counterparts, which are stored in an anonymous array:

my %mappings=( "\N{LATIN SMALL LETTER A}" => ["\N{LATIN SMALL LETTER A WITH GRAVE}","\N{LATIN SMALL LETTER A WITH DIAERESIS}"]);
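To make the layout of that structure concrete, here is a minimal sketch that walks the hash and prints each plain letter with its counterparts. The variable names inside the loop and the UTF-8 output layer are my own assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;
use charnames ':full';

# Same shape as the hash in the script: plain letter => array reference of variants.
my %mappings = (
    "\N{LATIN SMALL LETTER A}" => [
        "\N{LATIN SMALL LETTER A WITH GRAVE}",
        "\N{LATIN SMALL LETTER A WITH DIAERESIS}",
    ],
);

# Assumption: the terminal expects UTF-8; use another layer if yours differs.
binmode STDOUT, ':encoding(UTF-8)';

foreach my $plain (sort keys %mappings) {
    my $counterparts = $mappings{$plain};        # the array reference
    my $count        = scalar @{$counterparts};  # how many variants it holds
    print "$plain has $count counterparts: @{$counterparts}\n";
}
```

Adding another letter is then just a matter of adding one more key with its own array of variants.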

Then you iterate through the nested data structure (the two foreach loops), substituting the sought-after character with each of its counterparts, and build the final string with a neat trick: setting $" to a comma, so that interpolating the array puts a comma between the elements and nowhere else.
So the final $search_string will end with the list ('Chateau','Chàteau','Chäteau').
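If the list-separator trick feels too magical, join gives the same result without touching a global variable. A minimal sketch comparing the two, with the variant list hard-coded for illustration (in the script it is built by the loops):

```perl
use strict;
use warnings;

# Hard-coded for illustration; \x{E0} is a-grave, \x{E4} is a-diaeresis.
my @results = ("'Chateau'", "'Ch\x{E0}teau'", "'Ch\x{E4}teau'");

# Variant 1: temporarily change the list separator $" inside a block.
my $with_separator;
{
    local $" = ",";
    $with_separator = "(@results)";
}

# Variant 2: join produces the same string without changing $".
my $with_join = '(' . join(',', @results) . ')';

print $with_separator eq $with_join ? "same\n" : "different\n";   # prints "same"
```

Either way the separator only appears between elements, so there is no trailing comma to trim.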

Of course, this does not cover all possible cases (for example, do you want every 'a' replaced or just the first one?), since I don't know the exact requirements. Note also that to actually match "château" the list for 'a' would need LATIN SMALL LETTER A WITH CIRCUMFLEX as well. It will need some tweaking, but you get the drift.
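One possible tweak, if every position should be considered rather than just the first match, is to expand the word into all combinations of plain and accented letters. This is only a sketch: expand_variants is a hypothetical helper, and the larger mapping (including the circumflex and the 'e' variants) is my own assumption about what the forum poster would need.

```perl
use strict;
use warnings;

# Hypothetical helper: return every combination of plain and accented letters.
sub expand_variants {
    my ($word, $mappings) = @_;
    my @variants = ('');
    foreach my $char (split //, $word) {
        # Each character may stay as it is or become one of its counterparts.
        my @choices = ($char, @{ $mappings->{$char} || [] });
        @variants = map { my $prefix = $_; map { $prefix . $_ } @choices } @variants;
    }
    return @variants;
}

# Assumed mapping, written with code points to stay encoding-neutral.
my %mappings = (
    'a' => ["\x{E0}", "\x{E2}", "\x{E4}"],   # a-grave, a-circumflex, a-diaeresis
    'e' => ["\x{E8}", "\x{E9}"],             # e-grave, e-acute
);

my @variants = expand_variants('chateau', \%mappings);
printf "%d variants\n", scalar @variants;   # 4 * 3 * 4 = 48 variants
```

The list grows multiplicatively with every mappable letter, so for long words this quickly becomes impractical as an IN list.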
