Creating a pandas DataFrame from multiple files

I am trying to create a pandas DataFrame, and it works fine for a single file. Now I need to build it from multiple files that all have the same data structure. So instead of a single file name, I have a list of file names from which I would like to create the DataFrame.

The pandas concat command is your friend here. Let's say you have all your files in a directory, targetdir. You can:

  1. make a list of the files
  2. load them as pandas dataframes
  3. and concatenate them together

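import os
import pandas as pd

# assumption: the data files are in the current directory; point this at your own folder
targetdir = "."

# list the files, joining each name to the directory so the paths resolve
# (this assumes the directory contains only the data files)
filelist = [os.path.join(targetdir, name) for name in os.listdir(targetdir)]
# read them into pandas (read_table expects tab-separated data by default)
df_list = [pd.read_table(path) for path in filelist]
# concatenate them together
big_df = pd.concat(df_list)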

Potentially horribly inefficient but...

Why not use read_csv to build two (or more) dataframes, then use join to put them together?

That said, it would be easier to answer your question if you provide some data or some of the code you've used thus far.
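As a rough illustration of the read_csv plus join idea, here is a minimal sketch. The two in-memory "files", their column names (id, price, stock), and their values are made up for the example; real code would pass file paths to read_csv instead.

```python
import io

import pandas as pd

# Two in-memory "files" standing in for real CSVs (made-up data).
prices_csv = io.StringIO("id,price\n1,10.0\n2,12.5\n")
stock_csv = io.StringIO("id,stock\n1,7\n2,0\n")

# Use the shared "id" column as the index so join can align on it.
prices = pd.read_csv(prices_csv, index_col="id")
stock = pd.read_csv(stock_csv, index_col="id")

# join aligns rows on the index and places the columns side by side.
joined = prices.join(stock)
print(joined)
```

Note that join adds columns. If the files share the same columns and the rows should be stacked on top of each other, pd.concat is the usual choice.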

I might try to concatenate the files before feeding them to pandas. If you're on Linux or Mac you could use cat; otherwise a very simple Python function could do the job for you.
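A "very simple Python function" for this might look like the sketch below. It assumes every file is a CSV whose first line is the same header row, so the header is written once and skipped for the later files. The function name cat_csv_files and the demo file contents are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def _with_newline(line):
    """Return `line` terminated by a newline, adding one if it is missing."""
    return line if line.endswith("\n") else line + "\n"


def cat_csv_files(paths, out_path):
    """Append the files in `paths` into `out_path`, keeping one header row."""
    with open(out_path, "w", newline="") as out:
        for i, path in enumerate(paths):
            with open(path, newline="") as src:
                header = src.readline()
                # Only the first file contributes its header line.
                if i == 0 and header:
                    out.write(_with_newline(header))
                for line in src:
                    out.write(_with_newline(line))


# Demo on throwaway files with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp, "a.csv")
    b = Path(tmp, "b.csv")
    a.write_text("x,y\n1,2\n")
    b.write_text("x,y\n3,4")  # no trailing newline on purpose
    merged = Path(tmp, "merged.csv")
    cat_csv_files([a, b], merged)
    df = pd.read_csv(merged)

print(df)
```

Plain cat would repeat the header line of every file in the output, which is the main reason to handle the first line separately here.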

Are these files in CSV format? If so, you could use read_csv: http://pandas.sourceforge.net/io.html

Once you have read the files and saved them in two dataframes, you could merge the two dataframes or add additional columns to one of them (assuming a common index). Pandas should be able to fill in missing rows.
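To make the "fill in missing rows" point concrete, here is a small sketch with made-up frames (the row labels r1 to r3 and columns a and b are hypothetical). It shows both options: an outer merge on the index, and assigning a column from one frame to the other. In both cases pandas aligns on the index and puts NaN where a row has no match.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2, 3]}, index=["r1", "r2", "r3"])
right = pd.DataFrame({"b": [10, 30]}, index=["r1", "r3"])

# Option 1: outer merge on the index keeps every row label from both frames.
merged = left.merge(right, left_index=True, right_index=True, how="outer")

# Option 2: add a column to a copy of one frame; values are aligned by index.
left_with_b = left.copy()
left_with_b["b"] = right["b"]

print(merged)
print(left_with_b)
```

Row r2 exists only in the first frame, so its b value is NaN in both results.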

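import os
import pandas as pd

# collect the names of all .docx files under the current directory
data = []
thisdir = os.getcwd()

for r, d, f in os.walk(thisdir):
    for file in f:
        if file.endswith(".docx"):
            data.append(file)

# one row per file name (the file contents are not read)
df = pd.DataFrame(data)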

Here is a simple solution that avoids using a list to hold all the data frames, if you don't need them in a list. It creates a dataframe for each file, and you can then pd.concat them.

import fnmatch
import os

# get the CSV files only
files = fnmatch.filter(os.listdir('.'), '*.csv')
print(files)
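The snippet above only builds the list of file names. One way to finish the idea is to read each matched file lazily and hand the results to pd.concat, as in the sketch below. It assumes CSV files with identical columns; the helper name concat_csv_files and the demo file contents are made up for illustration.

```python
import fnmatch
import os
import tempfile

import pandas as pd


def concat_csv_files(directory="."):
    """Concatenate every *.csv file in `directory` into one DataFrame."""
    names = sorted(fnmatch.filter(os.listdir(directory), "*.csv"))
    # Generator expression: no named list of DataFrames is kept around.
    frames = (pd.read_csv(os.path.join(directory, name)) for name in names)
    return pd.concat(frames, ignore_index=True)


# Demo on throwaway files with made-up contents.
demo_files = [
    ("a.csv", "x,y\n1,2\n"),
    ("b.csv", "x,y\n3,4\n"),
    ("notes.txt", "skip me\n"),  # filtered out by the *.csv pattern
]
with tempfile.TemporaryDirectory() as tmp:
    for name, body in demo_files:
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(body)
    result = concat_csv_files(tmp)

print(result)
```

pd.concat still has to hold all the frames while it combines them, so this mostly tidies the code rather than saving memory. ignore_index=True gives the combined frame a fresh 0..n-1 index instead of repeating each file's own row numbers.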

