How to put all words and phrases in a list into a search expression (Python)
I have this list of lists:
groups = [['|FOOD|','shrimps','chicken wok','bowl of rice'],['|DRINK|','water','cranberry juice','tea']]
I'm trying to get the output to be:
[['|FOOD|',
'[lemma="shrimps"]',
'[lemma="chicken"][lemma="wok"]',
'[lemma="bowl"][lemma="of"][lemma="rice"]'],
['|DRINK|',
'[lemma="water"]',
'[lemma="cranberry"][lemma="juice"]',
'[lemma="tea"]']]
So basically I need every word turned into a lemma query for a corpus search. Some entries, though, are not single words but phrases. So far I've only figured out the code for single words; here it is:
import re
groups = [[f'[lemma="{word}"]' if ' ' not in word and not re.search(r'\|.*\|', word) else word
           for word in group]
          for group in groups]
This turns groups into:
[['|FOOD|',
'[lemma="shrimps"]',
'chicken wok',
'bowl of rice'],
['|DRINK|',
'[lemma="water"]',
'cranberry juice',
'[lemma="tea"]']]
So I made it skip entries that contain whitespace (phrases) as well as the topic labels. What code would handle these phrases and format them as shown above?
I'm a beginner, so if you know a better way to organise all this data, let me know.
You do not really need a regex here; you can use if not word.startswith("|") and not word.endswith("|") to check that the entry has a pipe at neither end. For every such entry, split it on whitespace, wrap each word in [lemma="..."] and join the pieces, so single words and phrases are handled by the same code:
groups = [[''.join([r"""[lemma="{}"]""".format(w) for w in word.split()])
           if not word.startswith("|") and not word.endswith("|") else word
           for word in group]
          for group in groups]
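The comprehension above packs two decisions into one expression: whether an entry is a topic label, and how to turn a term into a lemma query. Unrolled into small named functions, the same logic might look like the sketch below. The names lemma_query, is_topic_label and build_queries are hypothetical, chosen only for this illustration, and the sketch assumes, as the answer's check does, that topic labels are exactly the entries beginning or ending with a pipe.

```python
def lemma_query(entry):
    """Turn 'chicken wok' into '[lemma="chicken"][lemma="wok"]'.

    A single word splits into one token and gets one [lemma="..."] part;
    a phrase gets one part per whitespace-separated word.
    """
    return ''.join(f'[lemma="{w}"]' for w in entry.split())


def is_topic_label(entry):
    # Hypothetical helper: same test as the answer, negated.
    # A pipe at either end marks a topic label such as '|FOOD|'.
    return entry.startswith("|") or entry.endswith("|")


def build_queries(groups):
    # Keep labels unchanged, convert every other entry to a lemma query
    return [[entry if is_topic_label(entry) else lemma_query(entry) for entry in group]
            for group in groups]


groups = [['|FOOD|', 'shrimps', 'chicken wok', 'bowl of rice'],
          ['|DRINK|', 'water', 'cranberry juice', 'tea']]

for group in build_queries(groups):
    print(group)
```

Because str.split() with no arguments splits on any run of whitespace and ignores leading and trailing spaces, an entry with stray double spaces still yields no empty [lemma=""] parts.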
See the Python demo online. Output:
[['|FOOD|',
'[lemma="shrimps"]',
'[lemma="chicken"][lemma="wok"]',
'[lemma="bowl"][lemma="of"][lemma="rice"]'],
['|DRINK|',
'[lemma="water"]',
'[lemma="cranberry"][lemma="juice"]',
'[lemma="tea"]']]
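As for the request for a better way to organise the data: one option, offered only as a suggestion, is to keep the topic names as dictionary keys instead of storing '|FOOD|' as the first element of each list. Then no entry ever has to be recognised as a label. The names topics, lemma_query and queries below are hypothetical, and the last step rebuilds the list-of-lists shape shown above in case the rest of the code expects it.

```python
# Hypothetical layout: topic name -> list of search terms (labels kept out of the term lists)
topics = {
    'FOOD': ['shrimps', 'chicken wok', 'bowl of rice'],
    'DRINK': ['water', 'cranberry juice', 'tea'],
}


def lemma_query(entry):
    # One [lemma="..."] part per whitespace-separated word
    return ''.join(f'[lemma="{w}"]' for w in entry.split())


# Every value is a search term, so no pipe check is needed
queries = {topic: [lemma_query(term) for term in terms] for topic, terms in topics.items()}

# Rebuild the original shape, e.g. ['|FOOD|', '[lemma="shrimps"]', ...]
groups = [[f'|{topic}|', *qs] for topic, qs in queries.items()]
print(groups)
```

Since Python 3.7, dictionaries keep insertion order, so the rebuilt groups list comes out in the same topic order as the dictionary literal.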