
Check whether substrings of a string value include any dictionary words in Python

I am wondering whether Python has a function that checks whether any substring of a string value is a dictionary word.

For example: check_str = "!$#apple!ed"

check_str contains "apple", which is a dictionary word, and I would like to know whether there is a way to detect that.

This is borderline natural language processing, but a naive solution is to load a list of valid words from a text file, split the string into chunks of alphabetical characters, and look each chunk up in the dictionary.

>>> import re
>>> import requests
>>> s = "!$#apple!ed"
>>> url = "https://raw.githubusercontent.com/dwyl/english-words/master/words_dictionary.json"
>>> words = requests.get(url).json()
>>> alpha_chunks = re.findall(r"[a-z]+", s, re.I)
>>> [x for x in alpha_chunks if x.lower() in words]
['apple', 'ed']
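The lookup above only matches whole runs of letters, so a word buried inside a longer run (for instance "apple" inside "xappley") would be missed. A minimal sketch of both variants follows, assuming a small stand-in word set instead of the downloaded dictionary. The function names, `sample_words`, and the `min_length` parameter are hypothetical and not part of the original answer, and the brute-force substring scan is one possible approach rather than the method the answer describes.

```python
import re


def find_chunk_words(text, words):
    """Return the alphabetic chunks of `text` that are themselves in `words`."""
    chunks = re.findall(r"[a-z]+", text, re.I)
    return [chunk for chunk in chunks if chunk.lower() in words]


def find_embedded_words(text, words, min_length=3):
    """Return every substring of an alphabetic chunk that is in `words`.

    Brute force: quadratic in the length of each chunk. `min_length`
    (an assumed knob) skips very short matches such as "ed".
    """
    found = []
    for chunk in re.findall(r"[a-z]+", text.lower()):
        for start in range(len(chunk)):
            for end in range(start + min_length, len(chunk) + 1):
                piece = chunk[start:end]
                if piece in words and piece not in found:
                    found.append(piece)
    return found


# Stand-in word set for illustration only (assumption, not the real dictionary).
sample_words = {"app", "apple", "ed"}

print(find_chunk_words("!$#apple!ed", sample_words))       # ['apple', 'ed']
print(find_embedded_words("!$#xappley!ed", sample_words))  # ['app', 'apple']
```

With a large real word list the substring scan tends to report many short words, so raising `min_length` or using a curated list is probably necessary in practice.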

It turns out that "ed" is a perfectly valid dictionary word in addition to "apple", so if you intend to reject it, use a word list that suits your needs. Also, the dictionary requested above is about 6 MB, so you will likely want to cache it, depending on your use case.
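One way to cache the word list is to save it to a local file on the first run and read it from disk afterwards. The sketch below assumes the JSON at that URL is an object keyed by word, as the `x.lower() in words` lookup above implies. It swaps `requests` for the standard library's `urllib.request` so that it has no third-party dependency. The names `download_words` and `load_words`, the `fetch` parameter, and the default cache path are hypothetical.

```python
import json
import os
import urllib.request

WORDS_URL = "https://raw.githubusercontent.com/dwyl/english-words/master/words_dictionary.json"


def download_words(url=WORDS_URL):
    """Download the word list and parse it as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def load_words(cache_path="words_dictionary.json", fetch=download_words):
    """Return the words as a set, downloading only when no cached copy exists.

    `fetch` is injectable so the download step can be replaced, e.g. in tests.
    """
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return set(json.load(f))
    words = fetch()
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(words, f)
    # Iterating a dict yields its keys, so this works for a dict or a list.
    return set(words)
```

Calling `words = load_words()` once at startup gives a set that supports the same `x.lower() in words` membership test as the session above. Deleting the cache file forces a fresh download.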

