How to create a dataframe from multiple csv files?

Question

I am loading a csv file in pandas as

premier10 = pd.read_csv('./premier_league/pl_09_10.csv')

However, I have 20+ csv files, which I was hoping to load as separate dfs (one df per csv) using a loop and predefined names, something similar to:

import pandas as pd
file_names = ['pl_09_10.csv','pl_10_11.csv']
names = ['premier10','premier11']
for i in range (0,len(file_names)):
     names[i] = pd.read_csv('./premier_league/{}'.format(file_names[i]))

(Note: only two csv files are shown here as an example.) Unfortunately, this doesn't work: there are no error messages, but the pd dfs don't exist.

Any tips or links to previous questions would be greatly appreciated, as I haven't found anything similar on Stack Overflow.

Answer 1

- Use pathlib to set a Path, p, to the files.
- Use the .glob method to find the files matching the pattern.
- Create a dataframe from each file with pandas.read_csv.
  - Use a dict comprehension to create a dict of dataframes, where each file has its own key-value pair.
    - Use the dict like any other dict; the keys are the file names (without the extension) and the values are the dataframes.
  - Alternatively, use a list comprehension with pandas.concat to create a single dataframe from all the files.

In the for-loop in the OP, variables cannot be created that way (e.g. names[i]).
- names[i] = pd.read_csv(...) replaces the str 'premier10' stored in the list with the dataframe; it does not create a variable called premier10.
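To make the point about the loop concrete, here is a small sketch of what the assignment actually does. It reads in-memory CSV text through io.StringIO instead of the real files, and the column names and values are invented for illustration only.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for the real files.
csv_texts = ["team,points\nA,86\n", "team,points\nB,80\n"]
names = ['premier10', 'premier11']

for i in range(len(csv_texts)):
    # This rebinds element i of the list. It does not create a
    # variable called premier10 or premier11.
    names[i] = pd.read_csv(io.StringIO(csv_texts[i]))

# The strings are gone; the list now holds the dataframes.
print(type(names[0]))
print(names[1])
```

So the dataframes do exist after the loop, but only as names[0] and names[1], which is why a dict keyed by name is usually the more convenient container.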

from pathlib import Path
import pandas as pd

# set the path to the files
p = Path('some_path/premier_league')  

# create a list of the files matching the pattern
files = list(p.glob('pl_*.csv'))

# creates a dict of dataframes, where each file has a separate dataframe
df_dict = {f.stem: pd.read_csv(f) for f in files}  

# alternative, creates 1 dataframe from all files
df = pd.concat([pd.read_csv(f) for f in files])
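As a sketch of how the dict can be used afterwards, the example below builds a small df_dict by hand (the column names and values are invented, not taken from the real files). It looks up one dataframe by key, then passes the dict to pandas.concat so that each row keeps a record of which file it came from.

```python
import pandas as pd

# Stand-in for the dict built from the files; the data is hypothetical.
df_dict = {
    'pl_09_10': pd.DataFrame({'team': ['A', 'B'], 'points': [86, 85]}),
    'pl_10_11': pd.DataFrame({'team': ['A', 'C'], 'points': [80, 71]}),
}

# Access one season by its key (the file name without the extension).
season_09_10 = df_dict['pl_09_10']

# Passing the dict to concat uses its keys as an outer index level.
combined = pd.concat(df_dict, names=['season', 'row'])

# Move the season level into an ordinary column and renumber the rows.
combined = combined.reset_index(level='season').reset_index(drop=True)
print(combined)
```

Keeping the key as a column is one possible design; if the source file does not matter, the plain list-comprehension concat is simpler.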

Answer 2

names = ['premier10','premier11'] creates a list, not a dictionary. Replace it with names = dict() and store each dataframe under its own key inside the loop, for example names['premier10'] = pd.read_csv(...).
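A minimal sketch of the dictionary approach, keeping the loop from the question. The CSV text is supplied in memory through io.StringIO so the example runs without the real files; fake_files and keys are hypothetical names introduced here, and the data is made up.

```python
import io
import pandas as pd

# Hypothetical file contents, keyed by the file names from the question.
fake_files = {
    'pl_09_10.csv': "team,points\nA,86\n",
    'pl_10_11.csv': "team,points\nB,80\n",
}
file_names = ['pl_09_10.csv', 'pl_10_11.csv']
keys = ['premier10', 'premier11']

names = dict()
for key, file_name in zip(keys, file_names):
    # With real files on disk this would be:
    # pd.read_csv('./premier_league/{}'.format(file_name))
    names[key] = pd.read_csv(io.StringIO(fake_files[file_name]))

print(names['premier10'])
```

Each dataframe is then available as names['premier10'], names['premier11'], and so on.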

Answer 3

This is what you want:

import os
import pandas as pd

# list the csv files in the directory
files = [f for f in os.listdir("./your_directory") if f.endswith('.csv')]

# initialize an empty data frame
all_data = pd.DataFrame()

# read each file and concatenate its data onto the data frame initialized above
for file in files:
    df = pd.read_csv(os.path.join('./your_directory', file))
    all_data = pd.concat([all_data, df])

# show the first rows to verify that the contents were transferred
all_data.head()
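A variation on this loop, sketched under assumptions: it collects the dataframes in a list and calls pd.concat once at the end, which avoids copying the growing dataframe on every iteration. The temporary directory, file names, and data below are made up so the example can run on its own.

```python
import os
import tempfile
import pandas as pd

# Create a temporary directory with two small CSV files (hypothetical data).
directory = tempfile.mkdtemp()
for name, text in [('a.csv', "x,y\n1,2\n"), ('b.csv', "x,y\n3,4\n")]:
    with open(os.path.join(directory, name), 'w') as fh:
        fh.write(text)

# Sort the names so the row order does not depend on the file system.
files = sorted(f for f in os.listdir(directory) if f.endswith('.csv'))

# Read every file first, then concatenate once at the end.
frames = [pd.read_csv(os.path.join(directory, f)) for f in files]
all_data = pd.concat(frames, ignore_index=True)
print(all_data)
```

ignore_index=True renumbers the rows from 0; leave it out if the original row labels of each file should be kept.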

3 answers

solution1 2 ACCEPTED 2020-09-14 15:06:18

solution2 0 2020-09-14 14:59:56

solution3 0 2020-09-14 16:34:02
