How to put all words and phrases in list into a search expression (Python)

I have this list of lists:

groups = [['|FOOD|','shrimps','chicken wok','bowl of rice'],['|DRINK|','water','cranberry juice','tea']]

I'm trying to get the output to be:

[['|FOOD|',
  '[lemma="shrimps"]',
  '[lemma="chicken"][lemma="wok"]',
  '[lemma="bowl"][lemma="of"][lemma="rice"]'],
 ['|DRINK|',
  '[lemma="water"]',
  '[lemma="cranberry"][lemma="juice"]',
  '[lemma="tea"]']]

So, basically, I need every word lemmatized for a corpus search. Some entries, though, are not single words but phrases. So far I have only figured out the code for single words; here it is:

import re
groups = [[f'[lemma="{word}"]' if ' ' not in word and not re.search(r'\|.*\|', word) else word for word in group] for group in groups]

This returns groups as:

[['|FOOD|', 
  '[lemma="shrimps"]', 
  'chicken wok', 
  'bowl of rice'],
 ['|DRINK|', 
  '[lemma="water"]', 
  'cranberry juice', 
  '[lemma="tea"]']]

So I made it skip the entries that contain whitespace (the phrases), as well as the topic words. What code would handle these phrases and make them look like the output I typed above?

I'm a beginner, so if you know a better way to organise all this data, let me know.

You do not really need a regex here. You can use if not word.startswith("|") and not word.endswith("|") to check that the entry has no pipe at either end, then split each such entry on whitespace and wrap every token in [lemma="..."]:

groups = [[''.join([r"""[lemma="{}"]""".format(w) for w in word.split()]) if not word.startswith("|") and not word.endswith("|") else word for word in group] for group in groups]

Output:

[['|FOOD|',
  '[lemma="shrimps"]',
  '[lemma="chicken"][lemma="wok"]',
  '[lemma="bowl"][lemma="of"][lemma="rice"]'],
 ['|DRINK|',
  '[lemma="water"]',
  '[lemma="cranberry"][lemma="juice"]',
  '[lemma="tea"]']]
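If the one-liner is hard to read, the same idea can be split into small named helpers. This is only a sketch: to_lemma_query, is_topic_label, converted and by_topic are names made up here, not part of any library. Note that is_topic_label is slightly stricter than the condition above, since it treats an entry as a topic label only when it has a pipe at both ends; for the sample data the result is the same. The dict at the end is one possible way to organise the data, assuming the first item of each group is always its topic label.

```python
def to_lemma_query(entry):
    """Turn 'chicken wok' into '[lemma="chicken"][lemma="wok"]'."""
    # str.split() with no arguments splits on any run of whitespace,
    # so single words and multi-word phrases go through the same path.
    return "".join(f'[lemma="{token}"]' for token in entry.split())


def is_topic_label(entry):
    # Assumption: topic labels are wrapped in pipes, e.g. '|FOOD|'.
    return entry.startswith("|") and entry.endswith("|")


groups = [
    ["|FOOD|", "shrimps", "chicken wok", "bowl of rice"],
    ["|DRINK|", "water", "cranberry juice", "tea"],
]

# Same shape as the input: labels are kept, everything else is converted.
converted = [
    [entry if is_topic_label(entry) else to_lemma_query(entry) for entry in group]
    for group in groups
]
print(converted)

# Optional alternative layout (assumes group[0] is always the topic label):
# a dict keyed by topic keeps labels separate from the search expressions.
by_topic = {group[0]: [to_lemma_query(e) for e in group[1:]] for group in groups}
print(by_topic["|DRINK|"])
```

With the dict layout there is no need to test for pipes at all, because the label is the key and never mixed in with the search terms.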
