
How to create a dataframe from multiple csv files?

I am loading a csv file in pandas as:

premier10 = pd.read_csv('./premier_league/pl_09_10.csv')

However, I have 20+ csv files, which I was hoping to load as separate dfs (one df per csv) using a loop and predefined names, something similar to:

import pandas as pd
file_names = ['pl_09_10.csv','pl_10_11.csv']
names = ['premier10','premier11']
for i in range (0,len(file_names)):
     names[i] = pd.read_csv('./premier_league/{}'.format(file_names[i]))

(Note: I provide only two csv files here as an example.) Unfortunately, this doesn't work (no error messages, but the pd dfs don't exist).

Any tips/links to previous questions would be greatly appreciated, as I haven't found anything similar on Stack Overflow.

  1. Use pathlib to set a Path, p, to the files.
  2. Use the .glob method to find the files matching the pattern.
  3. Create a dataframe with pandas.read_csv.
    • Use a dict comprehension to create a dict of dataframes, where each file has its own key-value pair.
      • Use the dict like any other dict; the keys are the file names without the extension (f.stem) and the values are the dataframes.
    • Alternatively, use a list comprehension with pandas.concat to create a single dataframe from all the files.
  • In the for-loop in the OP, objects (variables) cannot be created in that way (e.g. names[i]).
    • The OP expects this to behave like 'premier10' = pd.read_csv(...), but 'premier10' is a str, not a variable name; names[i] = pd.read_csv(...) just replaces that string in the list with the dataframe.
from pathlib import Path
import pandas as pd

# set the path to the files
p = Path('some_path/premier_league')  

# create a list of the files matching the pattern
files = list(p.glob('pl_*.csv'))

# creates a dict of dataframes, where each file has a separate dataframe
df_dict = {f.stem: pd.read_csv(f) for f in files}  

# alternative, creates 1 dataframe from all files
df = pd.concat([pd.read_csv(f) for f in files])  
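
As a rough, self-contained sketch of how the two results above might be used: the snippet below writes two tiny made-up CSV files to a temporary directory (the file contents and the `season`/`row` index names are assumptions for illustration only), builds the dict of dataframes, and then passes that dict to `pandas.concat` so that each row keeps a label for the file it came from.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data: two tiny files in a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "pl_09_10.csv").write_text("team,points\nA,3\nB,1\n")
(tmp_dir / "pl_10_11.csv").write_text("team,points\nA,0\nB,3\n")

# Sort the matches so the order does not depend on the file system.
files = sorted(tmp_dir.glob("pl_*.csv"))

# One dataframe per file, keyed by the file name without its extension.
df_dict = {f.stem: pd.read_csv(f) for f in files}

# A single file's data is looked up by key, like any other dict entry.
season_09_10 = df_dict["pl_09_10"]

# Passing the dict to concat uses its keys as an outer index level,
# so every row records which file it was read from.
combined = pd.concat(df_dict, names=["season", "row"])
print(combined)
```

With this layout, `combined.loc["pl_10_11"]` selects the rows of one file again, which can be handy if the files share the same columns but need to stay distinguishable.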

names = ['premier10','premier11'] does not create a dictionary but a list. Simply replace it with names = dict(), so that names[i] = pd.read_csv(...) stores each dataframe under the key i, or start from an empty list, names = [], and call names.append(pd.read_csv(...)) inside the loop.
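
A minimal sketch of the difference, assuming in-memory CSV text in place of the real files (the `csv_sources` dict and its contents are made up so the snippet runs without anything on disk): the first loop mirrors the one in the question and shows that it only overwrites list elements, while the second stores each dataframe in a dict under the intended name.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the CSV files on disk.
csv_sources = {
    "pl_09_10.csv": "team,points\nA,3\nB,1\n",
    "pl_10_11.csv": "team,points\nA,0\nB,3\n",
}
file_names = ["pl_09_10.csv", "pl_10_11.csv"]
names = ["premier10", "premier11"]

# What the loop in the question does: each string in the list is
# replaced by a dataframe; no variable called premier10 is created.
overwritten = list(names)
for i in range(len(file_names)):
    overwritten[i] = pd.read_csv(io.StringIO(csv_sources[file_names[i]]))

# Dict-based variant: the predefined names become keys.
dfs = {}
for name, file_name in zip(names, file_names):
    dfs[name] = pd.read_csv(io.StringIO(csv_sources[file_name]))

first_season = dfs["premier10"]
print(type(overwritten[0]).__name__, list(dfs))
```

Looking frames up as `dfs["premier10"]` keeps the predefined names usable without creating variables dynamically.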

This is what you want:

import os
import pandas as pd

# collect the names of the csv files in the directory
files = [f for f in os.listdir("./your_directory") if f.endswith('.csv')]

# initialize an empty data frame
all_data = pd.DataFrame()

# read each file and concatenate its data onto the data frame initialized above
for file in files:
    # os.path.join inserts the path separator between directory and file name
    df = pd.read_csv(os.path.join('./your_directory', file))
    all_data = pd.concat([all_data, df])

# show the first rows of the new data frame to verify that the contents were transferred
all_data.head()
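
A possible variation on the loop above, sketched under the assumption that all files share the same columns: since each `pd.concat` call builds a new dataframe, it is usually cheaper to collect the frames in a list and concatenate once at the end. The temporary directory and file contents below are made up so the snippet is runnable on its own.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample directory with two csv files and one unrelated file.
directory = tempfile.mkdtemp()
samples = {
    "pl_09_10.csv": "team,points\nA,3\nB,1\n",
    "pl_10_11.csv": "team,points\nA,0\nB,3\n",
    "notes.txt": "not a csv file\n",
}
for file_name, text in samples.items():
    with open(os.path.join(directory, file_name), "w") as fh:
        fh.write(text)

# Keep only the csv files; sorting makes the row order reproducible.
files = sorted(f for f in os.listdir(directory) if f.endswith(".csv"))

# Read every file first, then concatenate a single time.
frames = [pd.read_csv(os.path.join(directory, f)) for f in files]

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, ...
all_data = pd.concat(frames, ignore_index=True)
print(all_data)
```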


