
Iterate over a tuple of a string and list of lists and export values to a CSV in Python

I have a function called `tableextract` which returns a list of lists.

Example of `tableextract` output for one file called ABC.txt:

[[' ', 'CITRIC ACID, ANHYDROUS', '5 mg', ''], [' ', 'NITROGEN', '5 mg', '21%'], [' ', 'PURIFIED WATER', '5 mg', '']]

Now I have a directory of files on which I want to run this function and export all the values to a single CSV. For each file in the directory, I want one row in the CSV per inner list that `tableextract` returns for that file, with the filename in the first column and each item of that inner list in its own column.
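The requirement boils down to: for each file, emit one row per inner list, with the filename prepended. A minimal sketch using only the standard `csv` module, assuming the sample data above; `rows_for_file` is a hypothetical helper name, and an in-memory buffer stands in for the real output file:

```python
import csv
import io

def rows_for_file(filename, table):
    """Hypothetical helper: one CSV row per inner list, filename first."""
    return [[filename] + list(inner) for inner in table]

# Sample output of tableextract for ABC.txt, copied from above.
table = [[' ', 'CITRIC ACID, ANHYDROUS', '5 mg', ''],
         [' ', 'NITROGEN', '5 mg', '21%'],
         [' ', 'PURIFIED WATER', '5 mg', '']]

buf = io.StringIO()  # stands in for the real output file
writer = csv.writer(buf)
writer.writerow(["Filename", 0, 1, 2, 3])
writer.writerows(rows_for_file("ABC.txt", table))
print(buf.getvalue())
```

With a real file, pass `open(path, "w", newline="")` to `csv.writer`, as the `csv` module documentation recommends.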

Expected output (the column labels 0, 1, 2, 3 are generated by pandas when splitting on commas, `sep=','`):

```
Filename   0   1                        2      3
ABC.txt        CITRIC ACID, ANHYDROUS   5 mg
ABC.txt        NITROGEN                 5 mg   21%
ABC.txt        PURIFIED WATER           5 mg
```
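Because `CITRIC ACID, ANHYDROUS` itself contains a comma, this layout only survives a round trip through a comma-separated file if that field is quoted. pandas' `to_csv` quotes such fields by default, so reading the file back with `sep=','` still yields five columns. A quick check, building the expected table by hand from the sample data:

```python
import io
import pandas as pd

# Expected rows from the question, filename first.
rows = [
    ["ABC.txt", " ", "CITRIC ACID, ANHYDROUS", "5 mg", ""],
    ["ABC.txt", " ", "NITROGEN", "5 mg", "21%"],
    ["ABC.txt", " ", "PURIFIED WATER", "5 mg", ""],
]
df = pd.DataFrame(rows, columns=["Filename", 0, 1, 2, 3])

# to_csv quotes any field containing the separator, so the comma inside
# "CITRIC ACID, ANHYDROUS" does not create an extra column.
text = df.to_csv(index=False, sep=",")
print(text)

# Reading it back with the same separator recovers five columns.
back = pd.read_csv(io.StringIO(text), sep=",")
print(back.shape)
```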

Right now, the code I have written:

```python
import os
import pandas as pd

data = []
dir = 'C:\\Users\\'
allfiles = os.listdir(dir)
files = [fnmatch for fnmatch in allfiles if fnmatch.endswith(".txt")]
for x in files:
    output = tableextract(dir + x)  # my own function, defined elsewhere
    newout = x, output
    data.append(newout)
    df = pd.DataFrame(data)

    df.to_csv('./Desktop/newgoofs7.csv', index=False, sep=',')
```

produces output in this format in the CSV:

    0903882.txt     [[' ', 'HYPROMELLOSE', '1.765 mg', '1.765 %'], [' ', 'PE', '10.000 mg', '10.000 %'], [' ', 'RAMIPRIL', '10 mg', ''], [' ', 'RAMIPRIL', '10.000 mg', '10.000 %'], [' ', 'SODIUM STEARYL FUMARATE', '0.250 mg', '0.250 %']]

    0903777.txt     [[' ', 'HYPROMELLOSE', '0.441 mg', '0.441 %'], [' ', 'PE', '2.500 mg', '2.500 %'], [' ', 'RAMIPRIL', '2.5 mg', ''], [' ', 'RAMIPRIL', '2.500 mg', '2.500 %'], [' ', 'YELLOW FERRIC OXIDE', '0.100 mg', '0.100 %']]

where one column has the filename and the entire output of the `tableextract` function is in a single second column.
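This happens because each element of `data` is a 2-tuple `(filename, list_of_lists)`, and `pd.DataFrame` turns each tuple into one row with two cells; it does not unpack the nested list. A small check with part of the sample data:

```python
import pandas as pd

# Each element of data is a (filename, list_of_lists) tuple, as in the code above.
data = [
    ("ABC.txt", [[' ', 'CITRIC ACID, ANHYDROUS', '5 mg', ''],
                 [' ', 'NITROGEN', '5 mg', '21%']]),
]
df = pd.DataFrame(data)

# A 2-tuple becomes a row with two columns: column 1 holds the whole
# list of lists as a single Python object.
print(df.shape)
print(type(df.loc[0, 1]))
```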

I want my code to produce the layout shown under "Expected output" instead.

Any help with this?

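You must put `df = pd.DataFrame(data)` and `df.to_csv(...)` after the `for` loop, so the DataFrame is built and the file is written once instead of on every iteration. The nested lists then still need to be expanded into one row per inner list and one column per item. Here is my solution: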

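```python
import pandas as pd

# testdata
data = [
    ("ABC.txt", [[' ', 'CITRIC ACID, ANHYDROUS', '5 mg', ''], [' ', 'NITROGEN', '5 mg', '21%'], [' ', 'PURIFIED WATER', '5 mg', '']]),
    ("DEF.txt", [[' ', 'citric acid, anhydrous', '4 mg', ''], [' ', 'nitrogen', '8 mg', '1%'], [' ', 'purified water', '9 mg', '']]),
]

#-----------------------
df = pd.DataFrame(data)
# One row per inner list; join each list with "|" and split it back
# into separate columns (assumes no value contains "|").
df2 = df.explode(1)[1].str.join("|").str.split("|", expand=True)
cols = df2.columns.tolist()
# df[0] is aligned on the repeated index, so each row gets its filename.
df2["Filename"] = df[0]
df2 = df2.reindex(["Filename"] + cols, axis=1)
```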

df2:

```
  Filename  0                       1     2    3
0  ABC.txt     CITRIC ACID, ANHYDROUS  5 mg     
0  ABC.txt                   NITROGEN  5 mg  21%
0  ABC.txt             PURIFIED WATER  5 mg     
1  DEF.txt     citric acid, anhydrous  4 mg     
1  DEF.txt                   nitrogen  8 mg   1%
1  DEF.txt             purified water  9 mg  
```
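Putting the advice together: collect `(filename, list_of_lists)` pairs inside the loop, then build the DataFrame and write the CSV once. The sketch below also swaps the `"|"` join/split step for `pd.DataFrame(...tolist())`, which expands each inner list into columns directly and so still works if a value happens to contain `"|"`. `fake_tableextract` is a hypothetical stand-in for the real `tableextract`, the filenames are hard-coded instead of coming from `os.listdir`, and an in-memory buffer replaces `./Desktop/newgoofs7.csv`:

```python
import io
import pandas as pd

# Hypothetical stand-in for tableextract(): returns canned lists of lists.
FAKE_TABLES = {
    "ABC.txt": [[' ', 'CITRIC ACID, ANHYDROUS', '5 mg', ''],
                [' ', 'NITROGEN', '5 mg', '21%']],
    "DEF.txt": [[' ', 'A|B MIX', '9 mg', '']],  # contains "|" on purpose
}

def fake_tableextract(path):
    return FAKE_TABLES[path]

data = []
for x in ["ABC.txt", "DEF.txt"]:
    # Only collect inside the loop.
    data.append((x, fake_tableextract(x)))

# Build the DataFrame once, after the loop.
df = pd.DataFrame(data, columns=["Filename", "table"])
# One row per inner list, with a fresh 0..n-1 index.
exploded = df.explode("table", ignore_index=True)
# Expand each inner list into columns 0, 1, 2, ... without a string round trip.
parts = pd.DataFrame(exploded["table"].tolist())
df2 = pd.concat([exploded[["Filename"]], parts], axis=1)

# Write the CSV once.
buf = io.StringIO()
df2.to_csv(buf, index=False, sep=",")
print(buf.getvalue())
```

The `ignore_index` argument of `explode` requires pandas 1.1 or later.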

EDIT: Another solution:

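```python
import pandas as pd

data = []
for x in files:  # files and dir as in the question
    # keep (filename, list of lists) pairs, as the code below expects
    data.append((x, tableextract(dir + x)))

# repeat each filename once per inner list
dd = {"Filename": [fn for fn, dat in data for _ in range(len(dat))]}
# transpose all inner lists into columns 0, 1, 2, ...
z = zip(*[d for fn, dat in data for d in dat])
dd.update({i: a for i, a in enumerate(z)})
df2 = pd.DataFrame(dd)
```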
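One caveat with `zip(*...)`: it stops at the shortest inner list, so if any inner list has fewer fields than the others, the trailing columns are silently dropped for every row. A sketch of the same approach using `itertools.zip_longest`, which pads short rows with `None` instead; the data here is made up for illustration:

```python
from itertools import zip_longest
import pandas as pd

# Made-up data: the second inner list is one field short.
data = [
    ("ABC.txt", [[' ', 'NITROGEN', '5 mg', '21%'],
                 [' ', 'PURIFIED WATER', '5 mg']]),
]

dd = {"Filename": [fn for fn, dat in data for _ in range(len(dat))]}
all_rows = [d for fn, dat in data for d in dat]

# Plain zip() stops at the shortest inner list: only 3 columns survive.
print(len(list(zip(*all_rows))))
# zip_longest() keeps all 4 columns and pads the short row with None.
z = zip_longest(*all_rows)
dd.update({i: a for i, a in enumerate(z)})
df2 = pd.DataFrame(dd)
print(df2)
```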

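Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.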
