How to find character offsets in text using Python

Question

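My goal is to identify the matching strings in two aligned text documents, and then find the position of the starting character of each matching string in each document.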

doc1=['the boy is sleeping', 'in the class', 'not at home']
doc2=['the girl is reading', 'in the class', 'a serious student']

My attempt:

# Find the string(s) that exist in both document lists:
matchstring = [x for x in doc1 if x in doc2]
# Output: matchstring == ['in the class']


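The problem now is to find the character offset of the matching string in doc1 and in doc2 (not counting punctuation, but counting spaces).

Desired result: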

Position of starting character for matching string in doc1=20
Position of starting character for matching string in doc2=20

Any ideas on text alignment? Thanks.

Answer 1

Hey, try this:

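doc1 = ['the boy is sleeping', 'in the class', 'not at home']
doc2 = ['the girl is reading', 'in the class', 'a serious student']

# Lines present in both documents. With several shared lines, join() would
# concatenate them, so this approach assumes exactly one shared line.
temp = ''.join(set(doc1) & set(doc2))
# find() returns a 0-based index into the concatenated text (no separators).
resultDoc1 = ''.join(doc1).find(temp)
resultDoc2 = ''.join(doc2).find(temp)

# Add 1 to report 1-based positions, as in the question.
print("Position of starting character for matching string in doc1=%d" % (resultDoc1 + 1))
print("Position of starting character for matching string in doc2=%d" % (resultDoc2 + 1))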

It gives exactly the result you want: position 20 in both documents.
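If the documents can share more than one line, or contain punctuation that should not be counted, a per-line running total may be easier to reason about than a single find() call. The following is only a minimal sketch under a few assumptions: match_offsets is a hypothetical helper name, punctuation means the ASCII characters in string.punctuation, lines are compared after punctuation is removed, and only the first occurrence of a repeated line is reported.

```python
import string


def match_offsets(doc1, doc2, one_based=True):
    """Return {matching_line: (offset_in_doc1, offset_in_doc2)}.

    Hypothetical helper: offsets count spaces but skip ASCII punctuation,
    and lines are treated as concatenated without separators.
    """
    # Translation table that deletes ASCII punctuation; spaces are kept.
    drop_punct = str.maketrans("", "", string.punctuation)

    def line_offsets(doc):
        offsets = {}
        position = 1 if one_based else 0
        for line in doc:
            cleaned = line.translate(drop_punct)
            # Record only the first occurrence of each cleaned line.
            offsets.setdefault(cleaned, position)
            position += len(cleaned)
        return offsets

    first, second = line_offsets(doc1), line_offsets(doc2)
    # Iterate over doc1's lines so the result keeps doc1 order.
    return {line: (first[line], second[line]) for line in first if line in second}


doc1 = ['the boy is sleeping', 'in the class', 'not at home']
doc2 = ['the girl is reading', 'in the class', 'a serious student']
print(match_offsets(doc1, doc2))
# {'in the class': (20, 20)}
```

Passing one_based=False returns the 0-based indices that str.find() would give, (19, 19) for this example.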

Solution 1
1 Accepted 2014-03-02 19:40:05