Comparing Unicode characters from user input with Unicode characters in a file

Question

I have this code so that I can read a Unicode string from the user:

import locale
import sys

print "Enter a nepali string"
split_string = raw_input().decode(sys.stdin.encoding or locale.getpreferredencoding(True))

I also have some Unicode strings in a file. If one of them occurs as a substring of the user's input, I have to split the input string. Suppose the file contains "सुर": if it matches "सुरक्षा", which the user entered, then I want only "क्षा" in the output.

import codecs
import unicodedata as ud

with codecs.open("prefixnepali.txt", "rw", "utf-8") as prefix:
    for line in prefix:
        line = ud.normalize('NFC', line)
        if line in split_string:
            prefixy = split_string[len(line):len(split_string)]
            print prefixy
        else:
            print line

But when I run the program, I get:

दि

सुर

रु

These are just the Unicode strings from the file, printed unchanged, when I enter "सुरक्षा" in the terminal. Can anyone tell me what is wrong here?

Answer 1

The problem might be simple: each line read from the file has a newline character at its end, so "सुर\n" is never found inside "सुरक्षा" (raw_input() does not keep the trailing newline). Use splitlines, as advised in "Reading a file without newlines" and "Getting rid of \n when using .readlines()":

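import codecs
import unicodedata as ud

# The file is only read, so open it with mode "r"
with codecs.open("prefixnepali.txt", "r", "utf-8") as prefix:
    # splitlines() drops the trailing "\n" from every line
    for line in prefix.read().splitlines():
        line = ud.normalize('NFC', line)
        if line in split_string:
            prefixy = split_string[len(line):]
            print prefixy
        else:
            print line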
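A quick way to see the newline problem is to run both ways of reading the lines against the same input. This is a minimal Python 3 sketch rather than the answer's Python 2 code: io.StringIO stands in for prefixnepali.txt, its contents are assumed from the output shown in the question, and find_matches is a hypothetical helper.

```python
import io


def find_matches(lines, text):
    """Return the entries of lines that occur somewhere inside text."""
    return [line for line in lines if line in text]


user_input = "सुरक्षा"  # like raw_input()/input(): no trailing newline

# Hypothetical file contents, inferred from the output shown in the question.
contents = "दि\nसुर\nरु\n"

# Iterating over a file object yields lines that still end in "\n" ...
with_newlines = list(io.StringIO(contents))
# ... while read().splitlines() drops the line endings.
without_newlines = io.StringIO(contents).read().splitlines()

print(find_matches(with_newlines, user_input))     # []
print(find_matches(without_newlines, user_input))  # ['सुर']
```

Only the stripped lines can ever match, because the string typed by the user never contains a newline.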

And by the way, line in split_string looks for an occurrence of line anywhere within split_string. If you're looking for an exact prefix match, you should use split_string.find(line) == 0 or split_string[0:len(line)] == line (or, more simply, split_string.startswith(line)).
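Putting the answer's two points together, a prefix-only version might look like the following minimal Python 3 sketch. It is an illustration, not the answer's code: strip_known_prefix is a hypothetical helper, the prefix list is assumed from the output shown in the question, and it NFC-normalizes the user input as well as the file lines (the question's code normalizes only the file lines).

```python
import unicodedata


def strip_known_prefix(word, prefixes):
    """Remove the first entry of prefixes that word starts with.

    Returns the remainder, or None if no prefix matches. Both sides are
    NFC-normalized so canonically equivalent spellings compare equal.
    """
    word = unicodedata.normalize("NFC", word)
    for prefix in prefixes:
        prefix = unicodedata.normalize("NFC", prefix)
        # Skip blank lines: every string starts with "".
        if prefix and word.startswith(prefix):
            return word[len(prefix):]
    return None


prefixes = ["दि", "सुर", "रु"]  # assumed file contents, taken from the question's output
print(strip_known_prefix("सुरक्षा", prefixes))  # क्षा
print("सुर" in "असुर")                          # True: substring found mid-word
print(strip_known_prefix("असुर", prefixes))     # None: not a prefix
```

The first matching prefix wins. If one prefix can be the start of another, sorting the list with sorted(prefixes, key=len, reverse=True) beforehand would make the longest one win.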

Answer 1: 0 votes, accepted, 2015-08-19 09:03:37
