How to check if a string contains accented Latin characters like é in Ruby?
Given:
str1 = "é" # Latin accent
str2 = "囧" # Chinese character
str3 = "ジ" # Japanese character
str4 = "e" # English character
How can I differentiate str1 (accented Latin characters) from the rest of the strings?
Update:
Given:
str1 = "\xE9" # Latin accent é, actually stored as \xE9 when read from a file
How would the answer be different?
I would first strip out all plain ASCII characters with gsub, and then check with a regex to see if any Latin characters remain. This should detect accented Latin characters.
def latin_accented?(str)
  str.gsub(/\p{Ascii}/, "") =~ /\p{Latin}/
end
latin_accented?("é") #=> 0 (truthy)
latin_accented?("囧") #=> nil (falsy)
latin_accented?("ジ") #=> nil (falsy)
latin_accented?("e") #=> nil (falsy)
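For the case in the question's update, where é arrives as the single byte \xE9, the string is not valid UTF-8, so the regex check cannot be applied to it directly. Here is a minimal sketch, assuming the file is encoded as ISO-8859-1 (the question does not say, so this is an assumption). The helper name `latin_accented_bytes?` and its `source_encoding` parameter are hypothetical:

```ruby
# Hypothetical helper: assumes the raw bytes are ISO-8859-1 (Latin-1)
# unless another source encoding is passed in.
def latin_accented_bytes?(raw, source_encoding = "ISO-8859-1")
  # Work on a copy: relabel the bytes as Latin-1, then transcode to UTF-8.
  utf8 = raw.dup.force_encoding(source_encoding).encode("UTF-8")
  # Same idea as latin_accented?: drop ASCII, then look for a Latin letter.
  !!(utf8.gsub(/\p{ASCII}/, "") =~ /\p{Latin}/)
end

latin_accented_bytes?("\xE9") # é stored as one Latin-1 byte
latin_accented_bytes?("e")    # plain ASCII
```

If the file is opened with an explicit external encoding (for example `File.read(path, encoding: "ISO-8859-1:UTF-8")`), the string should already arrive as UTF-8 and the original `latin_accented?` can be used unchanged.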
Try /\p{Latin}/.match(strX), or /[\p{Latin}&&[^a-zA-Z]]/ if you want to detect only special Latin characters.
By the way, "e" (str4) is also a Latin character.
Hope it helps.
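In Ruby regexes the `&&` intersection operator only has that meaning inside a bracketed character class, so the second pattern needs the enclosing brackets. A small sketch of how it could be applied to the four strings from the question (the constant name `SPECIAL_LATIN` is just an illustrative choice):

```ruby
# Latin-script characters, minus the unaccented ASCII letters a-z and A-Z.
SPECIAL_LATIN = /[\p{Latin}&&[^a-zA-Z]]/

%w[é 囧 ジ e].each do |s|
  # String#match? returns a plain boolean (Ruby 2.4+).
  puts "#{s}: #{s.match?(SPECIAL_LATIN)}"
end
```

Note that this matches any Latin-script character outside a-z and A-Z, so letters without a diacritic such as ß or æ also count as "special" here.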
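I'd use a two-stage approach: first try to encode the string as ISO-8859-1, which fails for strings containing non-Latin characters, then match it against a list of accented characters.
Example: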
def is_accented_latin?(test_string)
  test_string.encode("ISO-8859-1") # just to see if it raises an exception
  test_string.match(/[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöùúûüýþÿ]/)
rescue Encoding::UndefinedConversionError
  false
end
I strongly suggest you select for yourself the accented characters you're attempting to screen for, rather than just copying what I've written; I certainly may have missed some. Also note that this will always return false for strings containing non-Latin characters, even if the string also contains a Latin character with an accent.
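If mixed strings should count too, one possible alternative is Unicode normalization instead of a hand-written character list. This is a different technique from the one above, not a drop-in replacement. The sketch below assumes a UTF-8 string and Ruby 2.4 or later; the method name `contains_accented_latin?` is hypothetical:

```ruby
# Hypothetical helper: true if any Latin letter carries a combining mark
# after canonical decomposition (NFD).
def contains_accented_latin?(str)
  # NFD splits "é" into "e" followed by U+0301 (combining acute accent).
  decomposed = str.unicode_normalize(:nfd)
  # Look for a Latin-script character immediately followed by a nonspacing mark.
  decomposed.match?(/\p{Latin}\p{Mn}/)
end

contains_accented_latin?("囧é") # mixed string with an accented Latin letter
contains_accented_latin?("ジ")  # decomposes too, but the base is not Latin
```

The trade-off is that letters with no canonical decomposition, such as ß, æ, ð, þ and ø, are not detected this way, so a character list is still needed if those should count.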