Re: [mv] Search filter for iso chars
This is borrowed from the Fluid Dynamics Search Engine (http://www.xav.com):
# This function translates high-bit Latin characters, and their HTML
# expansions, into their English approximations. For example:
# Â => A
# &Atilde; => A
# Ã => A
# This forms the basis for support of Latin languages in this search
# engine. Because many end users do not have keyboards allowing them
# to type in "Ã" or whatever, they will type in "A" instead. By
# translating both the raw text and the user search terms with this
# function, I will be able to match their needs.
# Drawback: words like "für" in German will get false matches to words
# like "fur" in English. This is fairly rare though.
# Translation table based on Ian Graham's list at:
# http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
sub RawTranslate {
local $_ = shift;
# Strip tabs and newline characters; replace with whitespace:
tr!\n\r\t! !;
s'\cM' 'og;
s!&(.)(acute|grave|circ|uml|tilde);!$1!og;
s'(÷|&(nbsp|divide);)' 'og;
s'(&#(192|193|194|195|196|197|224|225|226|227|228|229|230);|À|Á|Â|Ã|Ä|Å|à|á|â|ã|ä|æ|å|&(.ring|aelig);)'a'og;
s'(ß|&#223;|&szlig;)'b'og;
s'(&#(199|231);|Ç|ç|&.cedil;)'c'og;
s'(&#(198|200|201|202|203|232|233|234|235);|Æ|È|É|Ê|Ë|è|é|ê|ë|&AElig;)'e'og;
s'(&#(204|205|206|207|236|237|238|239);|Ì|Í|Î|Ï|ì|í|î|ï)'i'og;
s'(&#(209|241);|ñ|Ñ)'n'og;
s'(&#(216|210|211|212|213|214|240|242|243|244|245|246|248);|Ø|Ò|Ó|Ô|Õ|Ö|ð|ò|ó|ô|õ|ö|ø|&(.slash|eth);)'o'og;
s'(&#(217|218|219|220|249|250|251|252);|Ù|Ú|Û|Ü|ù|ú|û|ü)'u'og;
s'(&#(222|254);|Þ|þ|&thorn;)'p'og;
s'(×|&#215;|&times;)'x'og;
s'(&#(221|253);|Ý|ý)'y'og;
return $_;
}
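
If you only need the simpler tr-style filter from the original question, something along these lines might be enough. This is a minimal sketch, not Minivend or Fluid Dynamics code: fold_accents and matches are hypothetical names, it assumes the data is Latin-1 (ISO-8859-1), and it handles raw characters only, not HTML entities such as &eacute;.

```perl
use strict;
use warnings;

# Hypothetical helper: fold accented Latin-1 letters to plain ASCII and
# lowercase the result. Code points are written as \x{..} escapes so the
# encoding of this source file does not matter.
sub fold_accents {
    my ($text) = @_;
    # When the replacement list is shorter than the search list, tr///
    # repeats its last character, so each range maps to a single letter.
    $text =~ tr/\x{C0}-\x{C5}\x{E0}-\x{E5}/a/;            # A/a variants
    $text =~ tr/\x{C7}\x{E7}/c/;                          # C/c cedilla
    $text =~ tr/\x{C8}-\x{CB}\x{E8}-\x{EB}/e/;            # E/e variants
    $text =~ tr/\x{CC}-\x{CF}\x{EC}-\x{EF}/i/;            # I/i variants
    $text =~ tr/\x{D1}\x{F1}/n/;                          # N/n tilde
    $text =~ tr/\x{D2}-\x{D6}\x{D8}\x{F2}-\x{F6}\x{F8}/o/; # O/o variants
    $text =~ tr/\x{D9}-\x{DC}\x{F9}-\x{FC}/u/;            # U/u variants
    $text =~ tr/\x{DD}\x{FD}\x{FF}/y/;                    # Y/y variants
    return lc $text;
}

# Hypothetical helper: true if $term occurs in $text once both sides
# have been passed through the same filter.
sub matches {
    my ($term, $text) = @_;
    return index(fold_accents($text), fold_accents($term)) >= 0;
}

# "jose" against "José Silva", with the accented e written as \x{E9}
print matches("jose", "Jos\x{E9} Silva") ? "match\n" : "no match\n";
```

The point is the same as in RawTranslate above: apply the identical filter to both the stored text and the search term, so that either spelling finds the other.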
At 06:13 AM 12/15/1999 , you wrote:
>****** message to minivend-users from Adriano Nagelschmidt Rodrigues
><anr@ime.usp.br> ******
>
>Hi,
>
>I would like my text db searches to match, irrespective of special iso
>characters, eg
>
>jose matches josé
>josé matches jose
>
>Setting the locale didn't help. Is there a way I can specify a
>
>tr/Éé.../Ee.../
>
>filter for both strings being compared?
>
>Thanks,
>
>--
>Adriano
Ryan Hertz tel 800-645-BAIT
Webmaster fax 520-645-2588
Advertising Director http://www.insideline.net
Gary Yamamoto Custom Baits, Inc. http://www.yamamoto.baits.com