
Extract words with a specific character sequence

I have a list of strings. I only want to extract the words within each string that have a specific character sequence.

For example

l1=["grad madd have", "ddim middle left"]

I want all the words that contain the sequence "dd"

so I would like to get

[["madd"], ["ddim", "middle"]]

I've been trying patterns of the form

[re.findall(r'(\b.*?dd.*\s+)',word) for word in l1] 

but have had little success

You can just use list comprehension for this. You don't need regex to accomplish what you're trying to do.


l1=["grad madd have", "ddim middle left"]
print([s for a in l1 for s in a.split() if 'dd' in s])

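This loops over each string in l1 and splits it on whitespace. It then tests each resulting word to see if it contains dd and keeps it if it does. Note that the result is a single flat list, ['madd', 'ddim', 'middle'], rather than one sub-list per input string.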
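
If the nested shape from the question is needed, the same idea can be written as a comprehension inside a comprehension. The following is only a sketch: the helper name words_containing and its parameters are made up for illustration and are not part of the original answer.

```python
def words_containing(strings, seq):
    """Return, for each input string, the list of its words that contain seq."""
    # Outer comprehension: one sub-list per input string.
    # Inner comprehension: keep only the words that contain seq.
    return [[w for w in s.split() if seq in w] for s in strings]


l1 = ["grad madd have", "ddim middle left"]
print(words_containing(l1, "dd"))  # [['madd'], ['ddim', 'middle']]
```

A string with no matching word contributes an empty list, so the output always has the same length as the input.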

You're close. You want to use \w* to match word characters zero or more times:

[re.findall(r'\w*dd\w*', word) for word in l1]
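
For completeness, here is that line as a self-contained script. It assumes nothing beyond the standard re module; the variable names pattern and result are only illustrative. The pattern in the question uses ".", which also matches spaces, so a match there can run across several words, whereas \w cannot cross a space.

```python
import re

l1 = ["grad madd have", "ddim middle left"]

# \w* matches zero or more word characters, so a match stays inside one word.
pattern = re.compile(r"\w*dd\w*")

# findall returns every non-overlapping match in each string.
result = [pattern.findall(s) for s in l1]
print(result)  # [['madd'], ['ddim', 'middle']]
```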

You can try this regex: \b\w*dd\w*\b
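
A short sketch of that pattern applied to the list from the question, next to the same pattern without the \b anchors. The variable names are illustrative. Because \w* is greedy on both sides, the two versions appear to give the same result for this input; the anchors mainly make the intent explicit.

```python
import re

l1 = ["grad madd have", "ddim middle left"]

# \b asserts a word boundary at each end of the match.
with_boundaries = [re.findall(r"\b\w*dd\w*\b", s) for s in l1]
without_boundaries = [re.findall(r"\w*dd\w*", s) for s in l1]

print(with_boundaries)  # [['madd'], ['ddim', 'middle']]
print(with_boundaries == without_boundaries)  # True
```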


Try this in one line:

l1=["grad madd have", "ddim middle left"]

print(list(map(lambda x:list(filter(lambda y:'dd' in y,x.split())),l1)))

output:

[['madd'], ['ddim', 'middle']]
