I want to find singular and plural words between the delimiters [[ ]] inside a text. For example:
"I have a red [[pen]], two blue [[pen]]s, two black [[pencil]]s and a green [[pencil]]"
should give
['pen', 'pen', 'pencil', 'pencil']
and a second list showing which words are singular and which are plural, for example 0 for singular and 1 for plural:
[0, 1, 1, 0]
I know that using the following code I can extract the former list:
re.findall(r'\[\[(.*?)\]\]', str)
But I can't find a way to produce the second list, or any other approach to identify which words are singular and which are plural. Any ideas?
One option is to change your regex to include a second capturing group, (s?), that matches an optional s right after the closing brackets.
import re
s = "I have a red [[pen]], two blue [[pen]]s, two black [[pencil]]s and a green [[pencil]]"
pat = r"\[\[(.*?)\]\](s?)"
matches = re.findall(pat, s)
print(matches)
#[('pen', ''), ('pen', 's'), ('pencil', 's'), ('pencil', '')]
As you can see, the elements in matches are tuples. Now just use a list comprehension and check whether the second element of each tuple is 's'.
myList = [1 if m[1] else 0 for m in matches]
print(myList)
#[0, 1, 1, 0]
Obviously this only works for plurals that end in 's'.
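The two steps above can be folded into one helper that returns both lists at once. This is only a minimal sketch: the function name words_and_plural_flags is made up for illustration, and it keeps the same assumption that a plural is marked by a single s directly after the closing brackets.

```python
import re

def words_and_plural_flags(text):
    """Return (words, flags); flags[i] is 1 when words[i] is followed by 's'."""
    # Group 1 is the word inside [[ ]], group 2 is the optional trailing s.
    matches = re.findall(r"\[\[(.*?)\]\](s?)", text)
    words = [word for word, _ in matches]
    flags = [1 if suffix else 0 for _, suffix in matches]
    return words, flags

s = "I have a red [[pen]], two blue [[pen]]s, two black [[pencil]]s and a green [[pencil]]"
words, flags = words_and_plural_flags(s)
print(words)  # ['pen', 'pen', 'pencil', 'pencil']
print(flags)  # [0, 1, 1, 0]
```

Building both lists from the same matches keeps them aligned by index, so flags[i] always describes words[i].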
You can check outside the brackets to find plural values:
import re
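s = "I have a red [[pen]], two blue [[pen]]s, two black [[pencil]]s and a green [[pencil]]"
# Match each word, its closing ]] and an optional trailing s; the lookbehind keeps [[ out of the match.
tagged = re.findall(r'(?<=\[\[)[a-zA-Z]+\]\]s?', s)
# Strip ']]s' (3 characters) or ']]' (2 characters) and record 1 for plural, 0 for singular.
final_results = [[i[:-3], 1] if i.endswith('s') else [i[:-2], 0] for i in tagged]
words = [a for a, _ in final_results]
indices = [b for _, b in final_results]
print(words)
print(indices)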
Output:
['pen', 'pen', 'pencil', 'pencil']
[0, 1, 1, 0]
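One edge case the example sentence does not cover is a tagged word that is directly followed by other letters beginning with s. The sketch below is a variation, not part of the approach above: it assumes such input can occur (the sample text is hypothetical) and uses a capturing group with a word boundary so that the s only counts when it ends the word.

```python
import re

# Hypothetical input: the third tag is followed by "sive", which is not a plural marker.
text = "a [[pen]], two [[pen]]s, a [[pen]]sive mood"

# (s\b)? captures the s only when a word boundary follows it.
matches = re.findall(r"\[\[([a-zA-Z]+)\]\](s\b)?", text)
words = [word for word, _ in matches]
flags = [1 if suffix else 0 for _, suffix in matches]
print(words)  # ['pen', 'pen', 'pen']
print(flags)  # [0, 1, 0]
```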
One option:
word_string.split(']]') gives a list of the string portions that run from each ]] to the next.
i.startswith('s') gives a boolean indicating whether the portion i starts with 's'.
Casting this to int gives 1 if it starts with 's' and 0 if it doesn't.
[int(i.startswith('s')) for i in word_string.split(']]')] gives a list of 0s and 1s indicating whether each portion starts with 's'. The first portion is the text before the first ]], so it does not follow any word. Since, for a given word, you want to know whether the portion after it starts with 's', you need to shift the list over by one. This can be done with [1:].
So, as a one-liner:
[int(i.startswith('s')) for i in word_string.split(']]')][1:]
This assumes that words are plural if and only if they end with 's'.
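To make the shift easier to see, here is a small sketch that prints the intermediate portions from the split. It also recovers the words from the same portions, which is an addition for illustration and assumes every ]] is preceded by a matching [[.

```python
word_string = "I have a red [[pen]], two blue [[pen]]s, two black [[pencil]]s and a green [[pencil]]"

# One portion per ]] plus the text after the last one (empty here).
portions = word_string.split(']]')
print(portions)
# ['I have a red [[pen', ', two blue [[pen', 's, two black [[pencil', 's and a green [[pencil', '']

# Drop the first flag: it describes the text before the first ]], not a word.
flags = [int(p.startswith('s')) for p in portions][1:]
print(flags)  # [0, 1, 1, 0]

# Every portion except the last ends with '[[word', so take what follows the final [[.
words = [p.split('[[')[-1] for p in portions[:-1]]
print(words)  # ['pen', 'pen', 'pencil', 'pencil']
```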