MiniVend Akopia Services


Re: [mv] Search filter for iso chars



This is borrowed from the Fluid Dynamics Search Engine (http://www.xav.com):

# This function translates high-bit Latin characters, and their HTML
# expansions, into their English approximations.  For example:
#       Â => A
#       Ã => A
#       à => A
# This forms the basis for support of Latin languages in this search
# engine.  Because many end users do not have keyboards allowing them
# to type in "Ã" or whatever, they will type in "A" instead.  By
# translating both the raw text and the user search terms with this
# function, I will be able to match their needs.
# Drawback: words like "für" in German will get false matches to words
# like "fur" in English.  This is fairly rare though.

# Translation table based on Ian Graham's list at:
# http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html

sub RawTranslate {
         local $_ = shift;

         # Strip tabs and newline characters; replace with whitespace:
         tr!\n\r\t!   !;
         s'\cM' 'og;

         s!&(.)(acute|grave|circ|uml|tilde);!$1!og;

         s'(&#247;|&(nbsp|divide);)' 'og;
         s'(&#(192|193|194|195|196|197|224|225|226|227|228|229|230);|À|Á|Â|Ã|Ä|Å|à|á|â|ã|ä|æ|å|&(.ring|aelig);)'a'og;
         s'(&#223;|ß|&szlig;)'b'og;
         s'(&#(199|231);|Ç|ç|&.cedil;)'c'og;
         s'(&#(198|200|201|202|203|232|233|234|235);|Æ|È|É|Ê|Ë|è|é|ê|ë|&AElig;)'e'og;
         s'(&#(204|205|206|207|236|237|238|239);|Ì|Í|Î|Ï|ì|í|î|ï)'i'og;
         s'(&#(209|241);|ñ|Ñ)'n'og;
         s'(&#(216|210|211|212|213|214|240|242|243|244|245|246|248);|Ø|Ò|Ó|Ô|Õ|Ö|ð|ò|ó|ô|õ|ö|ø|&(.slash|eth);)'o'og;
         s'(&#(217|218|219|220|249|250|251|252);|Ù|Ú|Û|Ü|ù|ú|û|ü)'u'og;
         s'(&#(222|254);|Þ|þ|&(THORN|thorn);)'p'og;
         s'(&#215;|×|&times;)'x'og;
         s'(&#(221|253);|Ý|ý)'y'og;
         return $_;
         }
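
To use a filter like this for searching, run it over both the search term and the text being searched, then compare the results. Here is a minimal sketch of that idea. It is not MiniVend's own search code: fold_accents and search_folded are hypothetical names, fold_accents is a cut-down stand-in for RawTranslate so the example runs on its own, and the script assumes it is saved as UTF-8 (RawTranslate above works on Latin-1 input instead). The lowercasing step is also my assumption, added so that "jose" finds "José".

```perl
use strict;
use warnings;
use utf8;    # this script's own literals are UTF-8 (an assumption)

# Hypothetical, reduced stand-in for RawTranslate: fold a handful of
# accented letters to plain lowercase ASCII. When the replacement list
# of tr/// is shorter than the search list, its last character is
# reused, so every letter in each list maps to the single letter given.
sub fold_accents {
    my ($text) = @_;
    $text =~ tr/ÀÁÂÃÄÅàáâãäå/a/;
    $text =~ tr/ÈÉÊËèéêë/e/;
    $text =~ tr/ÌÍÎÏìíîï/i/;
    $text =~ tr/ÒÓÔÕÖòóôõö/o/;
    $text =~ tr/ÙÚÛÜùúûü/u/;
    $text =~ tr/Ññ/n/;
    $text =~ tr/Çç/c/;
    return lc $text;    # lowercasing is an addition, not in RawTranslate
}

# Return the records that contain the term once both sides have been
# folded in exactly the same way.
sub search_folded {
    my ($term, @records) = @_;
    my $needle = fold_accents($term);
    return grep { index(fold_accents($_), $needle) >= 0 } @records;
}

my @records = ('José Silva', 'Jose Santos', 'Maria Souza');

binmode(STDOUT, ':encoding(UTF-8)');
for my $term ('jose', 'josé') {
    my @hits = search_folded($term, @records);
    print "$term: ", join(', ', @hits), "\n";
}
```

Both search terms return the same two records, because the comparison only ever sees the folded forms. The drawback mentioned in the comments shows up here too: fold_accents('für') and fold_accents('fur') are identical, so those two words can no longer be told apart.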


At 06:13 AM 12/15/1999 , you wrote:
>******    message to minivend-users from Adriano Nagelschmidt Rodrigues 
><anr@ime.usp.br>     ******
>
>Hi,
>
>I would like my text db searches to match, irrespective of special iso
>characters, eg
>
>jose matches josé
>josé matches jose
>
>Setting the locale didn't help. Is there a way I can specify a
>
>tr/Éé.../Ee.../
>
>filter for both strings being compared?
>
>Thanks,
>
>--
>Adriano


Ryan Hertz                                              tel  800-645-BAIT
Webmaster                                               fax  520-645-2588
Advertising Director                            http://www.insideline.net
Gary Yamamoto Custom Baits, Inc.            http://www.yamamoto.baits.com


