How to determine if a string is an English word?

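I have an input string, parts of which do not contain actual words (for example, it contains mathematical formulas such as x^2 = y_2 + 4). I would like a way to split the input string according to whether a substring consists of actual English words. For example:

If my string is: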

"Taking the derivative of: f(x) = \int_{0}^{1} z^3, we can see that we always get x^2 = y_2 + 4 which is the same as taking the double integral of g(x)"

then I want to split it into a list like:

["Taking the derivative of: ", "f(x) = \int_{0}^{1} z^3, ", "we can see that we always get ", "x^2 = y_2 + 4 ", "which is the same as taking the double integral of ", "g(x)"]

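How can I do this? I don't think regex works here, or at least I'm not aware of any regex approach that can detect the longest substrings of English words (including commas, periods, semicolons, etc.).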

You can simply use the pyenchant library, as mentioned in this post:

import enchant
d = enchant.Dict("en_US")
print(d.check("Hello"))

Output:

True

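You can install pyenchant by running pip install pyenchant on the command line. In your case, you have to iterate over all the whitespace-separated tokens in the string and check whether each one is an English word. Here is the complete code: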

import enchant
d = enchant.Dict("en_US")

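# Raw string, so the "\i" in "\int" is not read as an escape sequence
text = r"Taking the derivative of: f(x) = \int_{0}^{1} z^3, we can see that we always get x^2 = y_2 + 4 which is the same as taking the double integral of g(x)"

# split() with no argument collapses repeated whitespace, so no empty tokens are produced
tokens = text.split()
wordlst = []

# Keep only the tokens that the dictionary accepts as spelled
for token in tokens:
    if d.check(token):
        wordlst.append(token)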

print(wordlst)

Output:

['Taking', 'the', 'derivative', 'we', 'can', 'see', 'that', 'we', 'always', 'get', '4', 'which', 'is', 'the', 'same', 'as', 'taking', 'the', 'double', 'integral', 'of']
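
Two things stand out in the output above: "of:" is missing, presumably because the attached colon makes the token fail the lookup, and "4" is accepted even though it is not a word. Below is a minimal sketch of one way to handle both, assuming it is acceptable to strip surrounding punctuation and to require purely alphabetic tokens. The names looks_like_word, demo_words and demo_check are hypothetical. demo_check is only a stand-in so the snippet runs without pyenchant; d.check can be passed in its place.

```python
import string


def looks_like_word(token, check):
    """Return True if token, minus surrounding punctuation, is an alphabetic word.

    `check` is any callable taking a str and returning a bool,
    for example the d.check method of an enchant.Dict.
    """
    core = token.strip(string.punctuation)
    # Letters only: rejects "4", "x^2" and "f(x)" before the lookup, and
    # guarantees the lookup is never called with an empty string.
    return core.isalpha() and check(core)


# Stand-in dictionary (an assumption for illustration, not real dictionary data).
demo_words = {"taking", "the", "derivative", "of", "we", "get"}


def demo_check(word):
    return word.lower() in demo_words


tokens = "Taking the derivative of: f(x) = z^3, we get 4".split()
print([t for t in tokens if looks_like_word(t, demo_check)])
# ['Taking', 'the', 'derivative', 'of:', 'we', 'get']
```

The isalpha() requirement is deliberately strict: it also rejects contractions and hyphenated words such as "it's" or "well-known", so relax it if those matter for your input.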

Hope this helps!
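
To get the grouped list the question actually asks for, consecutive tokens with the same word/non-word status can be merged into one segment. The following is a rough sketch using itertools.groupby, not a definitive implementation. The names split_by_words, demo_words and demo_check are hypothetical, and the small word set is a stand-in so the example runs without pyenchant. With pyenchant installed, pass d.check as the check argument instead.

```python
import string
from itertools import groupby


def split_by_words(text, check):
    """Split text into alternating runs of word tokens and non-word tokens."""

    def is_word(token):
        # Ignore surrounding punctuation and require letters only.
        core = token.strip(string.punctuation)
        return core.isalpha() and check(core)

    # groupby merges consecutive tokens that share the same is_word() result.
    return [" ".join(run) for _, run in groupby(text.split(), key=is_word)]


# Stand-in for a real dictionary (an assumption for illustration only).
demo_words = {
    "taking", "the", "derivative", "of", "we", "can", "see", "that",
    "always", "get", "which", "is", "same", "as", "double", "integral",
}


def demo_check(word):
    return word.lower() in demo_words


text = (r"Taking the derivative of: f(x) = \int_{0}^{1} z^3, we can see that "
        r"we always get x^2 = y_2 + 4 which is the same as taking the double "
        r"integral of g(x)")

segments = split_by_words(text, demo_check)
for segment in segments:
    print(segment)
```

With the stand-in word set this prints six segments:

Taking the derivative of:
f(x) = \int_{0}^{1} z^3,
we can see that we always get
x^2 = y_2 + 4
which is the same as taking the double integral of
g(x)

Unlike the list in the question, the segments carry no trailing spaces, because the tokens are re-joined with single spaces. A real dictionary may also classify some tokens differently from the stand-in set, so the boundaries can shift.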
