Selecting Unicode characters from a text file or web page

Question

I am able to syllabify Devanagari words, as shown on the following page.

https://gist.github.com/950405

What I want to do now is find the words that start with "ह" on the following web page.

http://www.sacred-texts.com/hin/mbs/mbs12030.htm

How can this be done using Python?

Answer 1

If your words are Unicode strings collected in a list named words, the following snippet prints every word beginning with "x":

# words is assumed to be a list of Unicode strings
for word in words:
    if word.startswith(u"x"):
        print(word)

Or, if you want a list of all words starting with u"x", you can use a list comprehension:

selected_words = [w for w in words if w.startswith(u"x")]
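Applied to the question's case, the same startswith test might look like the sketch below. It assumes the page text has already been fetched and decoded into a Unicode string; the function name words_starting_with, the whitespace-based word splitting, and the sample text are illustrative assumptions, not part of the original answer. Because Devanagari is stored in logical order, a word such as "हि" still begins with the code point for "ह", so a plain prefix test also catches words whose first consonant carries a vowel sign.

```python
# -*- coding: utf-8 -*-
import string

# Devanagari danda and double danda, which often cling to the last word of a verse
DANDAS = u"\u0964\u0965"


def words_starting_with(text, prefix):
    """Return the whitespace-separated words in text that begin with prefix.

    Hypothetical helper: splitting on whitespace is a simplification and
    may not suit every page layout.
    """
    selected = []
    for token in text.split():
        # Trim dandas and ASCII punctuation from both ends of the token
        word = token.strip(DANDAS + string.punctuation)
        if word.startswith(prefix):
            selected.append(word)
    return selected


# Made-up sample text, used only to exercise the function
sample = u"हरि हिमालय राम हे। नमः"
print(words_starting_with(sample, u"ह"))
```

For a real page, the text passed in would be the decoded body of the document with any HTML markup removed first; how to do that depends on the page's encoding and structure.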

Accepted answer, 2011-10-05 10:04:16
