Python not identifying a white space character
I am near my wit's end with this problem: basically, I need to remove a double-space gap between words. My program happens to be in Hebrew, but this is the basic idea:
TITLE: הלכות  השכמת הבוקר
Notice there is an extra space between the first two words (Hebrew reads right to left).
I tried many, many different methods. Here are a few:
# tried all these with and without unicode
title = re.sub(u'\s+',u' ',title.decode('utf-8'))
title = title.replace("  "," ")
title = title.replace(u"  הלכות",u" הלכות")
Until finally I resorted to writing a very unnecessary method (some of the formatting got messed up when pasting):
def remove_blanks(s):
    word_list = s.split(" ")
    final_word_list = []
    for word in word_list:
        print "word: " +word
        #tried every qualifier I could think of...
        if not_blank(word) and word!=" " and True != re.match("s*",word):
            print "^NOT BLANK^"
            final_word_list.append(word)
    return ' '.join(final_word_list)

def not_blank(s):
    while " " in s:
        s = s.replace(" ","")
    return (len(s.replace("\n","").replace("\r","").replace("\t",""))!=0);
And, to my utter amazement, this is what I got back:
word: הלכות
^NOT BLANK^
word: #this should be tagged as Blank!!
^NOT BLANK^
word: השכמת
^NOT BLANK^
word: הבוקר
^NOT BLANK^
So apparently my qualifier didn't work. What is going on here?
There was a hidden \xe2\x80\x8e in the string: the UTF-8 encoding of U+200E, LEFT-TO-RIGHT MARK. It is invisible when printed and is not a whitespace character, so none of the space-based checks caught it. Found it using repr(word). Thanks @mgilson!
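For anyone hitting the same thing, here is a minimal sketch of how the hidden character can be spotted and removed. It is written for Python 3 (the question's code is Python 2), and the helper names `describe_chars`, `strip_format_chars`, and `normalize_spaces` are made up for illustration. It assumes that dropping every character in Unicode category Cf (format) is acceptable for your text; that category also contains the zero-width joiner and non-joiner, which some scripts rely on, so narrow the filter if that matters to you.

```python
import re
import unicodedata


def describe_chars(text):
    """Return (repr, Unicode name) for each character so invisible ones show up."""
    return [(repr(ch), unicodedata.name(ch, "UNKNOWN")) for ch in text]


def strip_format_chars(text):
    """Drop characters in Unicode category Cf (format), e.g. U+200E and U+200F."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def normalize_spaces(text):
    """Remove format characters, then collapse whitespace runs to a single space."""
    return re.sub(r"\s+", " ", strip_format_chars(text)).strip()


# Illustrative input: a LEFT-TO-RIGHT MARK sitting between two ordinary spaces.
title = "הלכות \u200e השכמת הבוקר"

# Splitting on " " yields a "word" that is just the invisible mark.
print([repr(word) for word in title.split(" ")])

# Naming each character makes the culprit obvious.
for shown, name in describe_chars(title):
    print(shown, name)

print(repr(normalize_spaces(title)))
```

The format characters are removed before the whitespace is collapsed, so the two spaces on either side of the mark end up adjacent and `\s+` can merge them.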