Iterating through files in a directory and writing results into new rows of a dataframe with pandas

Question

I want to iterate over the files in a directory, extract some information from each one, and write it to an Excel sheet using pandas. My code works when I process a single file (without the loop), but when I loop over all the files the output is an empty Excel sheet.

import re
import os
import pandas as pd
files=[i for i in os.listdir("path") if i.endswith("txt")]
for file in files:
    f=open((file), 'r')
    data=f.read()
    a=re.findall(r'Company Name(.*?)Type',data,re.DOTALL)
    a1="".join(a).replace('\n',' ')
    b=re.findall(r'Sector(.*?)Sub Sector',data,re.DOTALL)
    b1="".join(b).replace('\n',' ')
    w={'Company Name': [a1], 'Sector': [b1]}
    df=pd.DataFrame(data=w)
    print (os.path.join(file))
df.to_excel(r'/Users/nameuser/info.xlsx')

I see that it iterates through all the files but this way the output is empty.

How can I make the info that I get from each file accumulate, so that each file is stored in a new row of the Excel file?

Answer 1

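import re
import os
import pandas as pd

folder = "path"  # directory that contains the .txt files
files = [i for i in os.listdir(folder) if i.endswith(".txt")]

# One list per column; each file appends one value to each list.
w = {'Company Name': [], 'Sector': []}

for file in files:
    # os.listdir returns bare file names, so join them with the folder.
    with open(os.path.join(folder, file), 'r') as f:
        data = f.read()
    a = re.findall(r'Company Name(.*?)Type', data, re.DOTALL)
    a1 = "".join(a).replace('\n', ' ')
    b = re.findall(r'Sector(.*?)Sub Sector', data, re.DOTALL)
    b1 = "".join(b).replace('\n', ' ')
    w['Company Name'].append(a1)
    w['Sector'].append(b1)
    print(file)

# Build the DataFrame once, after the loop, then write it out.
df = pd.DataFrame(data=w)
df.to_excel(r'/Users/nameuser/info.xlsx')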

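This way you collect the values from every file in a dict of lists and convert it to a DataFrame once, after the loop. In the original code, df was recreated on every iteration, so each file overwrote the result of the previous one.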
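
A variation on the same idea, as a rough sketch rather than the only way to do it: collect one dict per file in a list and pass that list to pd.DataFrame. The helper names extract_row and build_frame are made up for this example, and the use of pathlib and the extra strip() call are assumptions that are not part of the code above.

```python
import re
from pathlib import Path

import pandas as pd

# Same patterns as in the answer, compiled once.
COMPANY_RE = re.compile(r'Company Name(.*?)Type', re.DOTALL)
SECTOR_RE = re.compile(r'Sector(.*?)Sub Sector', re.DOTALL)


def extract_row(text):
    """Return one row (a dict) for the text of a single file.

    Hypothetical helper; strip() is an added tidy-up of surrounding spaces.
    """
    company = "".join(COMPANY_RE.findall(text)).replace('\n', ' ').strip()
    sector = "".join(SECTOR_RE.findall(text)).replace('\n', ' ').strip()
    return {'Company Name': company, 'Sector': sector}


def build_frame(directory):
    """Read every .txt file in `directory` and return one row per file."""
    rows = []
    for path in sorted(Path(directory).glob('*.txt')):
        rows.append(extract_row(path.read_text()))
    # Passing columns= keeps the headers even when no files are found.
    return pd.DataFrame(rows, columns=['Company Name', 'Sector'])
```

It could then be used as build_frame("path").to_excel(r'/Users/nameuser/info.xlsx', index=False). Keeping the extraction in its own function makes it easy to check the regular expressions against a single file before running the whole directory.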
