I have the following list of phrases:
[
'This is erleada comp. recub. con película 60 mg.',
'This is auxina e-200 uicaps. blanda 200 mg.',
'This is ephynalsol. iny. 100 mg.',
'This is paracethamol 100 mg.'
]
I need to get the following result:
[
'This is erleada.',
'This is auxina.',
'This is ephynalsol.',
'This is paracethamol.'
]
I wrote the following function to clean phrases:
def clean(string):
    sub_strings = [".", "iny", "comp", "uicaps"]
    try:
        string = [string[:string.index(sub_str)].rstrip() for sub_str in sub_strings]
        return string
    except ValueError:
        return string
and use it as follows:
for phrase in phrases:
    drug = clean(phrase)
This should do it:
import re
phrases = [
'This is erleada comp. recub. con película 60 mg.',
'This is auxina e-200 uicaps. blanda 200 mg.',
'This is ephynalsol. iny. 100 mg.',
'This is paracethamol 100 mg.'
]
# Raw string so that \w reaches the regex engine unchanged.
pattern = re.compile(r"^This is \w*")
for phrase in phrases:
    match = pattern.search(phrase)
    print(match.group(0) + ".")
Outputs:
This is erleada.
This is auxina.
This is ephynalsol.
This is paracethamol.
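If a phrase does not start with "This is", pattern.search returns None and match.group(0) raises AttributeError. Here is a minimal variant, assuming you may also want the bare drug name: it uses a capturing group (\w+) and returns None for lines that do not match. The helper name drug_name is hypothetical.

```python
import re

# Capture the word right after "This is "; \w+ requires at least one character.
pattern = re.compile(r"^This is (\w+)")


def drug_name(phrase):
    # Hypothetical helper: the drug name, or None when the phrase does not match.
    match = pattern.match(phrase)
    return match.group(1) if match else None


phrases = [
    'This is erleada comp. recub. con película 60 mg.',
    'This is auxina e-200 uicaps. blanda 200 mg.',
    'This is ephynalsol. iny. 100 mg.',
    'This is paracethamol 100 mg.'
]
names = [drug_name(p) for p in phrases]
print(names)  # ['erleada', 'auxina', 'ephynalsol', 'paracethamol']

# Rebuild the requested strings, skipping any phrase that did not match.
print(["This is " + n + "." for n in names if n is not None])
```

Returning None instead of raising lets you decide separately what to do with phrases that follow a different layout.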
Explanation: the pattern used is ^This is \w* . Here is how it works.
^ means the start of the string (each phrase here is a single line, so it is also the start of the line). So ^This is means your phrase must start with "This is".
\w matches a single word character. For ASCII text that means the ranges a-z, A-Z, 0-9, and the underscore _; in Python 3, \w in a str pattern also matches Unicode letters and digits such as í, unless the re.ASCII flag is set.
\w* puts the quantifier * after \w. * stands for "zero or more", so \w* matches as many consecutive \w characters as possible.
Putting it together, ^This is requires the phrase to start with "This is ", and \w* then matches the run of word characters that follows. Since spaces, commas, and full stops are not matched by \w, matching stops at the first of them and returns something like "This is erleada"; the code then appends the final period.
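To see these rules in action, here is a small sketch, assuming Python 3 and a hypothetical helper leading_word() that returns the run of \w characters at the start of a string:

```python
import re


def leading_word(text, flags=0):
    # Hypothetical helper: \w* always matches (possibly empty), so group(0) is safe.
    return re.match(r"\w*", text, flags).group(0)


# \w* stops at the first space, full stop, or hyphen.
print(leading_word("erleada comp."))        # erleada
print(leading_word("ephynalsol. iny."))     # ephynalsol
print(leading_word("e-200"))                # e

# In a str pattern \w is Unicode-aware; re.ASCII restricts it to [a-zA-Z0-9_].
print(leading_word("película"))             # película
print(leading_word("película", re.ASCII))   # pel

# ^ anchors at the start, so "This is" later in the string does not match.
print(re.search(r"^This is \w*", "Note: This is erleada"))  # None
```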
You could obtain the same results with slicing:
phrases=[
'This is erleada comp. recub. con película 60 mg.',
'This is auxina e-200 uicaps. blanda 200 mg.',
'This is ephynalsol. iny. 100 mg.',
'This is paracethamol 100 mg.'
]
# Keep the first three words of each phrase, then add a period unless one is already there.
first_three = [" ".join(x.split()[:3]) for x in phrases]
drug = [s if s.endswith(".") else s + "." for s in first_three]
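The code takes the first three words of each phrase, joins them back into a string, and adds a period after the third word unless it already ends with one (as "ephynalsol." does). Like the regex, it assumes the drug name is always the third word. But of course, the regex solution provided earlier is much nicer.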
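As for the clean() function from the question, it returns every phrase unchanged: str.index raises ValueError as soon as one marker is missing (for example "iny" in the first phrase), so the except branch returns the input, and when every marker is present it returns a list rather than a string. Below is a minimal sketch of what it seems to aim for, assuming the same marker list and cutting at the earliest marker found; the name clean_at_first_marker is hypothetical.

```python
# Hypothetical rewrite of the question's clean(): cut each phrase at the
# earliest marker it contains instead of requiring every marker to be present.
MARKERS = [".", "iny", "comp", "uicaps"]


def clean_at_first_marker(phrase, markers=MARKERS):
    # str.find returns -1 for a missing marker instead of raising ValueError.
    found = [pos for pos in (phrase.find(m) for m in markers) if pos != -1]
    if not found:
        return phrase
    return phrase[:min(found)].rstrip()


phrases = [
    'This is erleada comp. recub. con película 60 mg.',
    'This is auxina e-200 uicaps. blanda 200 mg.',
    'This is ephynalsol. iny. 100 mg.',
    'This is paracethamol 100 mg.'
]
for phrase in phrases:
    print(clean_at_first_marker(phrase))
# This is erleada
# This is auxina e-200
# This is ephynalsol
# This is paracethamol 100 mg
```

Only the first and third phrases come out as wanted (before adding the final period): the markers alone cannot tell where the drug name ends in the other two, which is why matching on word position, as both answers above do, suits this data better.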