I have a function called 'tableextract' which returns a list of lists.
Example of 'tableextract' function output for 1 file called ABC.txt:
[[' ', 'CITRIC ACID, ANHYDROUS', '5 mg', ''], [' ', 'NITROGEN', '5 mg', '21%'], [' ', 'PURIFIED WATER', '5 mg', '']]
Now I have a directory of files. I want to run this function on every file and export all the values to a single CSV. For each file, every inner list returned by 'tableextract' should become its own row in the CSV, with the filename in the first column and each item of the inner list in a separate column.
Expected output (the column names 0, 1, 2, 3 are generated by pandas when splitting on the comma, sep=','):
Filename  0  1                       2     3
ABC.txt      CITRIC ACID, ANHYDROUS  5 mg
ABC.txt      NITROGEN                5 mg  21%
ABC.txt      PURIFIED WATER          5 mg
Right now, the code I have written:
import os
import pandas as pd

data = []
dir = 'C:\\Users\\'
allfiles = os.listdir(dir)
# keep only the .txt files
files = [f for f in allfiles if f.endswith(".txt")]
for x in files:
    output = tableextract(dir + x)
    # each entry is (filename, list of lists), so the whole list lands in one cell
    newout = x, output
    data.append(newout)
df = pd.DataFrame(data)
df.to_csv('./Desktop/newgoofs7.csv', index=False, sep=',')
This gives output in the following format in the CSV:
0903882.txt [[' ', 'HYPROMELLOSE', '1.765 mg', '1.765 %'], [' ', 'PE', '10.000 mg', '10.000 %'], [' ', 'RAMIPRIL', '10 mg', ''], [' ', 'RAMIPRIL', '10.000 mg', '10.000 %'], [' ', 'SODIUM STEARYL FUMARATE', '0.250 mg', '0.250 %']]
0903777.txt [[' ', 'HYPROMELLOSE', '0.441 mg', '0.441 %'], [' ', 'PE', '2.500 mg', '2.500 %'], [' ', 'RAMIPRIL', '2.5 mg', ''], [' ', 'RAMIPRIL', '2.500 mg', '2.500 %'], [' ', 'YELLOW FERRIC OXIDE', '0.100 mg', '0.100 %']]
Here one column holds the filename and the entire output of 'tableextract' ends up in a single second column.
I want my code to produce the format shown under 'Expected output' instead.
Any help on this?
Make sure "df = pd.DataFrame(data)" and "df.to_csv(...)" come after the "for" loop, not inside it. Here is my solution:
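import pandas as pd

# test data: (filename, output of tableextract)
data = [
    ("ABC.txt", [[' ', 'CITRIC ACID, ANHYDROUS', '5 mg', ''], [' ', 'NITROGEN', '5 mg', '21%'], [' ', 'PURIFIED WATER', '5 mg', '']]),
    ("DEF.txt", [[' ', 'citric acid, anhydrous', '4 mg', ''], [' ', 'nitrogen', '8 mg', '1%'], [' ', 'purified water', '9 mg', '']])
]
#-----------------------
df = pd.DataFrame(data)
# explode() gives one row per inner list; join/split then spreads the items
# over separate columns (this assumes no item contains "|")
df2 = df.explode(1)[1].str.join("|").str.split("|", expand=True)
cols = df2.columns.tolist()
# assignment aligns on the index, so each row gets its source filename
df2["Filename"] = df[0]
# move "Filename" to the front
df2 = df2.reindex(["Filename"] + cols, axis=1)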
df2:
   Filename  0  1                       2     3
0  ABC.txt      CITRIC ACID, ANHYDROUS  5 mg
0  ABC.txt      NITROGEN                5 mg  21%
0  ABC.txt      PURIFIED WATER          5 mg
1  DEF.txt      citric acid, anhydrous  4 mg
1  DEF.txt      nitrogen                8 mg  1%
1  DEF.txt      purified water          9 mg
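The join/split step breaks if a cell itself contains the "|" character. Below is a minimal sketch of a variant that skips the string round trip and builds the columns directly from the inner lists. The helper name rows_to_frame and the "a|b" test value are made up for illustration, and the sketch assumes every file yields at least one row.

```python
import pandas as pd

def rows_to_frame(data):
    """Turn [(filename, [[cell, ...], ...]), ...] into one row per inner list."""
    df = pd.DataFrame(data, columns=["Filename", "rows"])
    # explode() repeats the filename once per inner list
    exploded = df.explode("rows").reset_index(drop=True)
    # build the cell columns straight from the lists, no join/split needed
    cells = pd.DataFrame(exploded["rows"].tolist())
    return pd.concat([exploded[["Filename"]], cells], axis=1)

data = [
    ("ABC.txt", [[' ', 'CITRIC ACID, ANHYDROUS', '5 mg', ''],
                 [' ', 'NITROGEN', '5 mg', '21%']]),
    ("DEF.txt", [[' ', 'a|b', '4 mg', '']]),  # a cell containing the delimiter
]
df2 = rows_to_frame(data)
print(df2)
```

Because the index is reset after explode(), the result also has a clean 0..n-1 index instead of the repeated 0, 0, 0, 1, 1, 1.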
EDIT: Another solution:
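data = []
for x in files:
    # store the filename together with its rows so both can be unpacked below
    data.append((x, tableextract(dir + x)))
# one filename entry per extracted row
dd = {"Filename": [fn for fn, dat in data for _ in dat]}
# transpose the rows into columns (assumes all rows have the same length)
z = zip(*[d for fn, dat in data for d in dat])
dd.update({i: a for i, a in enumerate(z)})
df2 = pd.DataFrame(dd)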
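For completeness, here is a rough end-to-end sketch that goes from a directory to the CSV file. The tableextract below is only a hypothetical stand-in that returns fixed rows, and directory_to_csv, the temporary folder and the file names are made up for the demonstration. It relies on to_csv quoting any cell that contains a comma, so 'CITRIC ACID, ANHYDROUS' stays in one column.

```python
import os
import tempfile
import pandas as pd

def tableextract(path):
    # Hypothetical stand-in for the real function: always returns the same rows.
    return [[' ', 'CITRIC ACID, ANHYDROUS', '5 mg', ''],
            [' ', 'NITROGEN', '5 mg', '21%']]

def directory_to_csv(folder, csv_path):
    rows = []
    for name in sorted(os.listdir(folder)):
        if name.endswith(".txt"):
            # prepend the filename to each inner list: one CSV row per list
            for row in tableextract(os.path.join(folder, name)):
                rows.append([name] + row)
    # number of data columns, not counting the filename
    width = max((len(r) for r in rows), default=1) - 1
    df = pd.DataFrame(rows, columns=["Filename"] + list(range(width)))
    df.to_csv(csv_path, index=False)
    return df

# Demo on a throwaway directory with two empty .txt files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("ABC.txt", "DEF.txt"):
        with open(os.path.join(tmp, name), "w"):
            pass
    out = os.path.join(tmp, "out.csv")
    df = directory_to_csv(tmp, out)
    # read the file back to check that the comma inside a cell survived
    back = pd.read_csv(out, keep_default_na=False)

print(df)
```

Using os.path.join instead of dir+x avoids depending on a trailing path separator in the directory name.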