Extract certain words from a Python string

I know there are many related questions about regular expressions, but I would like to know the best way to extract certain words from a string and add them to a list.

Suppose my input has the form [A1A B2B, C3C, D4D, E5E] and I want to extract the 3rd and 4th words from it. My output should be a list with the items ['C3C', 'D4D']. How do I achieve this using findall?

Note: Not every word above is separated by a comma. There is no comma between A1A and B2B.

Using re.findall:

import re

s = "[A1A B2B, C3C, D4D, E5E]"
print(re.findall(r"\w\d\w", s)[2:4])
['C3C', 'D4D']
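
The pattern above matches any word character, digit, word character run, even in the middle of a longer token. As a minimal sketch, assuming every word of interest is exactly three characters of that shape, the variant below adds `\b` word boundaries so that only whole words match. The helper name `extract_words` is hypothetical and not part of the original answer.

```python
import re

def extract_words(s, start, stop):
    """Return whole words shaped like <word char><digit><word char>, sliced [start:stop]."""
    # \b anchors each match to a word boundary, so a longer token such as
    # 'XA1AX' does not contribute a partial match.
    return re.findall(r"\b\w\d\w\b", s)[start:stop]

print(extract_words("[A1A B2B, C3C, D4D, E5E]", 2, 4))  # ['C3C', 'D4D']
```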

Turn the string into a list (stripping out the , [ and ] characters), then slice it:

>>> s = "[A1A B2B, C3C, D4D, E5E]"
>>> l = [val.strip('[,]') for val in s.split()]
>>> l[2:4]
['C3C', 'D4D']

If your input is a list of strings, l = ["A1A B2B", "C3C", "D4D", "E5E"], then split all the strings in the list into words and create a new list l_new in which each element is one word:

l = ["A1A B2B", "C3C", "D4D", "E5E"]
l_new = sum([x.split() for x in l],[])
l_new[2:4]
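
Flattening with sum(..., []) works, but it builds a new list at every step, which gets slow for long inputs. A possible alternative, shown here only as a sketch with the same sample data, is itertools.chain.from_iterable from the standard library:

```python
from itertools import chain

l = ["A1A B2B", "C3C", "D4D", "E5E"]

# Split each string into words, then concatenate the resulting
# word lists into one flat list in a single pass.
l_new = list(chain.from_iterable(x.split() for x in l))

print(l_new[2:4])  # ['C3C', 'D4D']
```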

Or, if your actual input is a string, l = '[A1A B2B, C3C, D4D, E5E]', then use regular expressions. Remove the square brackets and commas, and then split:

import re
l = '[A1A B2B, C3C, D4D, E5E]'
# strip the brackets and commas, then split on single spaces
l_new = re.split(' ', re.sub(r'[\[\],]', '', l))
l_new[2:4]

Remove the brackets on both sides, split, remove the commas, and take the slice you want.

mystr = "[A1A B2B, C3C, D4D, E5E]"
mystr = mystr[1:-1]

thelist = [x.replace(",","") for x in mystr.split()][2:4]

print(thelist)

Searching for words in your input doesn't sound like something that requires a regular expression (searching for values of a given structure does, though, so you may want to clarify your input). A regular expression can still help here, since you're handling several possible delimiters rather than just a space or a comma.

>>> import re
>>> input = "A1A B2B, C3C, D4D, E5E"
>>> input_list = re.findall(r"[\w']+", input)
>>> input_list
['A1A', 'B2B', 'C3C', 'D4D', 'E5E']

Then, given a list of words you're searching for, you can use a set intersection to quickly pull out what you need:

>>> search_terms = ['C3C', 'D4D']
>>> sorted(set(input_list) & set(search_terms))
['C3C', 'D4D']
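
Note that sets are unordered, so a set intersection does not keep the words in the order they appear in the input. If that order matters, one option (a sketch reusing the names from the session above, with the sample values filled in as an assumption) is to filter the list against a set of search terms:

```python
input_list = ['A1A', 'B2B', 'C3C', 'D4D', 'E5E']
search_terms = {'D4D', 'C3C'}  # a set gives fast membership tests

# Walk the input in its original order and keep only the wanted words.
found = [word for word in input_list if word in search_terms]

print(found)  # ['C3C', 'D4D']
```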

If you're only looking for words in specific positions, use slices (I'm confused about which you need, though, from your question):

>>> input_list[2:4]
['C3C', 'D4D']

If you're searching for specific patterns or values that would fit a regex, though, then you need to give us your input and the patterns you want to find so that we can help with that.
