Selecting Unicode characters from a text file or web page

Question

I have been able to syllabify Sanskrit words, as shown on the following page.

https://gist.github.com/950405

What I want to do now is find the words that start with "ह" on the following web page.

http://www.sacred-texts.com/hin/mbs/mbs12030.htm

How can this be done in Python?

Answer 1

If your words are Unicode strings collected in a list named words, the following snippet prints every word that starts with "x":

for word in words:
    # Print each word whose first character is "x"
    if word.startswith(u"x"):
        print(word)

Alternatively, if you want a list of all the words that start with u"x", you can use a list comprehension:

selected_words = [w for w in words if w.startswith(u"x")]
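
To apply this to the "ह" case from the question, the words list has to be built from the page text first. The following is only a minimal sketch: it assumes the page has already been downloaded and decoded into a Unicode string, and that words are separated by whitespace. The function name words_starting_with and the sample string are made up for illustration and do not come from the linked page.

```python
# -*- coding: utf-8 -*-

# Characters that may cling to a word: danda, double danda, ASCII punctuation.
# This set is an assumption; adjust it to the text being processed.
PUNCTUATION = u"\u0964\u0965.,;:!?\"'()"


def words_starting_with(text, prefix):
    """Return the whitespace-separated words in `text` that begin with `prefix`."""
    # Split on whitespace, then trim punctuation from both ends of each token.
    words = [token.strip(PUNCTUATION) for token in text.split()]
    # Tokens that were pure punctuation become empty strings and never match.
    return [w for w in words if w.startswith(prefix)]


# Hypothetical sample text, standing in for the decoded page content.
sample = u"हरि ॐ नमः हस्तिनापुरं गच्छति । हंसः"
selected = words_starting_with(sample, u"ह")

for word in selected:
    print(word)
```

Note that startswith compares code points, so a word also matches when its first "ह" is followed by a vowel sign or a virama, as in "हि" or "ह्र". If only a bare "ह" syllable should match, the syllabified form of each word would need to be checked instead.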