
Distinguish words between delimiters [[ ]] and [[ ]]s in Python

I want to find singular and plural words between the delimiters [[ ]] in a text. For example:

"I have a red [[pen]], two blue [[pen]]s, two black [[pencil]]s and a green [[pencil]]"

should be

['pen', 'pen', 'pencil', 'pencil']

and a second list indicating which words are singular and which are plural, for example 0 for singular and 1 for plural:

[0, 1, 1, 0]

I know that I can extract the first list with the following code (here text is the input string; naming it str would shadow the built-in type):

re.findall(r'\[\[(.*?)\]\]', text)

But I can't find a way to produce the second list, or any other approach to identify which words are singular and which are plural. Any ideas?

One option is to change your regex to include a second capturing group, (s?), which captures an optional 's' right after the closing ]].

import re

s = "I have a red [[pen]], two blue [[pen]]s, two black [[pencil]]s and a green [[pencil]]"
# Group 1 captures the word; group 2 captures a trailing 's' if present, else ''.
pat = r"\[\[(.*?)\]\](s?)"
matches = re.findall(pat, s)
print(matches)
# [('pen', ''), ('pen', 's'), ('pencil', 's'), ('pencil', '')]

As you can see, the elements in matches are tuples. Now just use a list comprehension and check whether the second element of each tuple is non-empty, that is, whether it is 's'.

myList = [1 if m[1] else 0 for m in matches]
print(myList)
#[0, 1, 1, 0]

Obviously this only works for plurals formed by adding 's' directly after the closing ]].
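For convenience, the two steps above can be wrapped into one helper that returns both lists. This is only a minimal sketch: the function name words_and_plural_flags is made up for illustration, and it keeps the same assumption that a plural is marked by a literal 's' immediately after ]].

```python
import re

def words_and_plural_flags(text):
    """Return (words, flags); flags[i] is 1 if an 's' directly follows the closing ]]."""
    # Group 1: the word inside [[ ]]; group 2: an optional trailing 's'.
    matches = re.findall(r"\[\[(.*?)\]\](s?)", text)
    words = [word for word, _ in matches]
    flags = [1 if suffix else 0 for _, suffix in matches]
    return words, flags

s = "I have a red [[pen]], two blue [[pen]]s, two black [[pencil]]s and a green [[pencil]]"
words, flags = words_and_plural_flags(s)
print(words)  # ['pen', 'pen', 'pencil', 'pencil']
print(flags)  # [0, 1, 1, 0]
```

Because both lists come from the same list of matches, they always have the same length and stay aligned by index.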

You can check the text just outside the closing brackets to find plural values:

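import re
s = "I have a red [[pen]], two blue [[pen]]s, two black [[pencil]]s and a green [[pencil]]"
# Match each word with its closing ]] and an optional trailing 's' (raw string avoids invalid-escape warnings).
tagged = re.findall(r'(?<=\[\[)[a-zA-Z]+\]\]s?', s)
# Strip ']]s' (3 chars) for plurals or ']]' (2 chars) for singulars.
final_results = [[i[:-3], 1] if i.endswith('s') else [i[:-2], 0] for i in tagged]
words = [a for a, _ in final_results]
indices = [b for _, b in final_results]
print(words)
print(indices)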

Output:

['pen', 'pen', 'pencil', 'pencil']
[0, 1, 1, 0]

One option:

word_string.split(']]') splits the string at every ]], giving a list of portions: the text before the first ]], then the text between each ]] and the next, then the text after the last one.

i.startswith('s') gives a boolean indicating whether the portion i starts with 's'.

Casting this to int gives 1 if it starts with 's' and 0 if it doesn't.

[int(i.startswith('s')) for i in word_string.split(']]')] gives a list of 0s and 1s indicating whether each portion starts with 's'. For a given word, what matters is whether the portion after its ]] starts with 's', so the flags are shifted by one position: the first element describes the text before the first ]] and belongs to no word. Dropping it with [1:] lines the flags up with the words.

So, as a one-liner:

[int(i.startswith('s')) for i in word_string.split(']]') ][1:]

This assumes that words are plural if and only if they end with 's'.
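As a rough illustration of how the pieces line up, here is the one-liner run step by step on the example sentence from the question (word_string is simply that sentence assigned to a variable):

```python
word_string = "I have a red [[pen]], two blue [[pen]]s, two black [[pencil]]s and a green [[pencil]]"

# Split at every ]]; with 4 closing delimiters this yields 5 portions.
portions = word_string.split(']]')

# One flag per portion: 1 if the portion starts with 's', else 0.
all_flags = [int(p.startswith('s')) for p in portions]

# Drop the first flag: it describes the text before the first ]].
flags = all_flags[1:]
print(flags)  # [0, 1, 1, 0]
```

Note that this counts ]] occurrences rather than [[ ]] pairs, so a stray ]] without a matching [[ would add an extra flag and throw the alignment with the word list off.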
