Let's say I have a text file with the content below (contents added after the original answer):
Quetiapine fumarate Drug substance This document
Povidone Binder USP
This line doesn't contain any medicine name.
This line contains Quetiapine fumarate which shouldn't be extracted as it not present at the
beginning of the line.
Dibasic calcium phosphate dihydrate Diluent USP is not present in the csv
Lactose monohydrate Diluent USNF
Magnesium stearate Lubricant USNF
Lactose monohydrate, CI 77491
0.6
Colourant
E 172
Some lines to break the group.
Silicon dioxide colloidal anhydrous
(0.004
Gliding agent
Ph Eur
Adding some random lines.
Povidone
(0.2
Lubricant
Ph Eur
I have a CSV file containing a list of medicine names that I want to match against the .txt file, extracting all the data between two distinct medicines (only when the medicine name is at the beginning of a line). Examples of medicines from the CSV file are 'Quetiapine fumarate', 'Povidone', 'Magnesium stearate', 'Lactose monohydrate', etc.
I want to iterate over each line of my text file and create groups running from one medicine to the next.
A group should start only when the medicine name is at the start of a line, not when it appears in the middle of a line.
Expected output:
['Quetiapine fumarate Drug substance This document'],
['Povidone Binder USP'],
['Lactose monohydrate Diluent USNF'],
['Magnesium stearate Lubricant USNF'],
[Lactose monohydrate, CI 77491
0.6
Colourant
E 172],
[Povidone
(0.2
Lubricant
Ph Eur]
Can someone please help me do this in Python?
My attempt so far:
with open('C:/Users/test1.txt', 'r', encoding='utf8') as file:
    data = file.read()

medicines = ('Quetiapine fumarate', 'Povidone', 'Magnesium stearate', 'Lactose monohydrate')
result = []
#with open('C:\Users\substancecopy.csv') as f:
# data is a single string, so split it into lines before iterating
for line in data.splitlines():
    if any(line.startswith(med) for med in medicines):
        result.append(line.strip())
I need to capture all the text from one medicine to the next, as shown in the expected output, which this code does not do.
You can do it without regular expressions using str.startswith():
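medicines = ('Quetiapine fumarate', 'Povidone', 'Magnesium stearate', 'Lactose monohydrate')
result = []
# Raw string: in a normal string literal, '\U' starts a Unicode escape and raises a SyntaxError
with open(r'C:\Users\substancecopy.csv') as f:
    for line in f:
        # Keep only the lines that begin with one of the medicine names
        if any(line.startswith(med) for med in medicines):
            result.append(line.strip())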
I'm not sure why your expected output contains a list of lists with a single string each, but if you really need that, use result.append([line.strip()]).
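If you also want the lines that follow each medicine, the same startswith() check can open a new group while later lines are appended to it. The following is only a sketch under assumptions: the function name group_by_medicine and the is_continuation hook are made up for illustration, and the question does not say what ends a group when the next line is not a medicine. By default a group runs until the next line that starts with a medicine name, which is longer than some groups in the expected output. The 15-character limit in the second call is a purely illustrative guess at a rule that closes a group early.

```python
def group_by_medicine(lines, medicines, is_continuation=None):
    """Group lines into blocks that each begin with a medicine name.

    is_continuation is a hypothetical hook: when given, a non-medicine
    line is added to the open group only if the hook returns True;
    otherwise the group is closed and the line is skipped.
    """
    medicines = tuple(medicines)  # str.startswith accepts a tuple of prefixes
    groups = []
    current = None  # the group being built, or None when no group is open
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if raw.startswith(medicines):
            # A medicine at the very start of the line opens a new group
            current = [line]
            groups.append(current)
        elif current is not None:
            if is_continuation is None or is_continuation(line):
                current.append(line)
            else:
                current = None  # close the group; skip lines until the next medicine
    return groups


medicines = ('Quetiapine fumarate', 'Povidone', 'Magnesium stearate', 'Lactose monohydrate')

# A shortened copy of the text file from the question
sample = """Quetiapine fumarate Drug substance This document
Povidone Binder USP
This line doesn't contain any medicine name.
Lactose monohydrate, CI 77491
0.6
Colourant
E 172
Some lines to break the group.
Povidone
(0.2
Lubricant
Ph Eur
""".splitlines()

# Default rule: every group runs until the next medicine line
for group in group_by_medicine(sample, medicines):
    print(group)

# Illustrative rule (an assumption): only short lines continue a group
for group in group_by_medicine(sample, medicines, is_continuation=lambda s: len(s) <= 15):
    print(group)
```

To run it on the real file, pass the open file object instead of sample, for example group_by_medicine(f, medicines) inside the with block, since iterating a file yields its lines.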