
Reading multiple Excel files in Python

I'm new to Python. I'm trying to read multiple Excel files in a folder and load each one into a separate DataFrame.

Is the code below correct?

import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
os.chdir(r'/Users/try/Documents/data')

df = ([])
def readdataframe(the_list):
    for element in the_list:
        print(element)
        df[element] = pd.read_excel(element, 'shee1')

readdataframe(["24032020_D_KWH.xlsx","25032020_D_KWH.xlsx","26032020_D_KWH.xlsx","27032020_D_KWH.xlsx"])

I get the following error when I run it:

TypeError: list indices must be integers or slices, not str

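Replacing df = ([]) with df = pd.DataFrame() should do the trick. You didn't define df as a pandas DataFrame: ([]) is just an empty Python list (the parentheses only group, they don't create a tuple), and a list can only be indexed by integers or slices, so df[element] with a file name as the key raises this TypeError.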
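The difference is easy to see by trying both objects side by side. This is a minimal sketch: the file name is only used as a string key, no file is read, and the values [1, 2, 3] are made up.

```python
import pandas as pd

# ([]) is just an empty list: the parentheses only group, they don't make a tuple
df = ([])
print(type(df))  # <class 'list'>

try:
    # lists only accept integer or slice indices, so a string key fails
    df["24032020_D_KWH.xlsx"] = [1, 2, 3]
except TypeError as exc:
    print("TypeError:", exc)

# An empty DataFrame accepts string keys: each key becomes a new column
df = pd.DataFrame()
df["24032020_D_KWH.xlsx"] = [1, 2, 3]
print(df.columns.tolist())  # ['24032020_D_KWH.xlsx']
```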

After testing, this is what I came up with:

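import pandas as pd
import os

folder = r"path to your excel files"
os.chdir(folder)  # so read_excel() can open each file by its bare name

the_list = []

# os.listdir() only looks at the folder itself; os.walk() would also list files
# in subfolders, which can't be opened by bare name after os.chdir()
for file in sorted(os.listdir(folder)):
    # skip Excel's temporary "~$..." lock files, which also end in .xlsx
    if file.endswith('.xlsx') and not file.startswith('~$'):
        the_list.append(file)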

def readdataframe(the_list):
    df = pd.DataFrame()  # define df as an empty pandas DataFrame
    for element in the_list:
        # squeeze("columns") turns a one-column DataFrame into a Series
        df[element] = pd.read_excel(element).squeeze("columns")
    return df

print(readdataframe(the_list))

Output:

   file1.xlsx  file2.xlsx  file3.xlsx
0           1           6          11
1           2           7          12
2           3           8          13
3           4           9          14
4           5          10          15

I'm sorry, but it is considered bad practice to upload your files, and I'm not going to download them. Nothing personal, just basic digital hygiene.

Now onto the explanation. As you may have noticed, in this line

df[element] = pd.read_excel(element).squeeze("columns")

I've added

.squeeze("columns")

What this does is convert the returned DataFrame into a pandas Series (a one-dimensional labeled array; think of it as a regular Python list with an index), because I had only one column in each of my files. Older pandas versions did the same thing with the read_excel parameter squeeze=True, but that parameter was deprecated in pandas 1.4 and removed in 2.0.

The

df[element] =

syntax sets element (the file name) as a column name in the DataFrame where you save your data. So this approach only works if the data in each file is one-dimensional (a single column). If not, you should probably look into pandas.concat or DataFrame.join, depending on how uniform the data shapes in your files are and what you need.
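For files with several columns, one option is pandas.concat with a dict of DataFrames, so the dict keys record which file each piece came from. This is a minimal sketch, not the answer's own code: the two small DataFrames stand in for what pd.read_excel would return, and the column names kwh and meter are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for pd.read_excel(...) results with identical columns
frames = {
    "file1.xlsx": pd.DataFrame({"kwh": [1, 2], "meter": ["A", "B"]}),
    "file2.xlsx": pd.DataFrame({"kwh": [6, 7], "meter": ["A", "B"]}),
}

# Stack rows: the dict keys become the outer level of a MultiIndex on the rows
combined = pd.concat(frames, names=["file", "row"])
print(combined)

# Side by side: the file name becomes the outer level of the column MultiIndex
wide = pd.concat(frames, axis=1)
print(wide)
```

Stacking suits files that share the same columns; a single file's rows can then be pulled back out with combined.loc["file2.xlsx"]. The side-by-side form suits files that share the same row layout.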

If you'd rather get multiple DataFrames instead, this is what I'd suggest:

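import pandas as pd
import os

folder = r"path to your excel files"
os.chdir(folder)  # so read_excel() can open each file by its bare name

the_list = []

# os.listdir() only looks at the folder itself; os.walk() would also list files
# in subfolders, which can't be opened by bare name after os.chdir()
for file in sorted(os.listdir(folder)):
    # skip Excel's temporary "~$..." lock files, which also end in .xlsx
    if file.endswith('.xlsx') and not file.startswith('~$'):
        the_list.append(file)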

def readdataframe(the_list):
    df_dict = {}
    for element in the_list:
        df_dict[element] = pd.read_excel(element)
    return df_dict

print(readdataframe(the_list))

This way you get a Python dictionary (a hash table) that holds your DataFrame objects as values, keyed by file name.
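A more compact variant of the same dictionary idea can use pathlib instead of os.chdir, so the working directory is left alone. This is only a sketch: read_excel_folder is a hypothetical helper name, it looks only at .xlsx files directly inside the folder, and skipping "~$" files assumes those are Excel's temporary lock files.

```python
from pathlib import Path

import pandas as pd


def read_excel_folder(folder):
    """Return {file name: DataFrame} for every .xlsx file directly inside folder."""
    return {
        path.name: pd.read_excel(path)
        # sorted() gives a stable order; "~$" files are Excel's temporary lock files
        for path in sorted(Path(folder).glob("*.xlsx"))
        if not path.name.startswith("~$")
    }
```

Usage would then be dfs = read_excel_folder(r"path to your excel files"), after which dfs["24032020_D_KWH.xlsx"] is the DataFrame for that one file.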
