Pandas: read Excel files in a folder and unpivot columns into a DataFrame

I have more than 100 XLSX files in a folder, each with different column names and data types.

File 1:

Id  test  category
1   ab      4
2   cs      3
3   cs      1

File 2:

index  remove  stocks  category
1      dr      4         a
2      as      3         b
3      ae      1         v

File 3: ...

File 4: ...

Here is my attempt, based on another example:

    import os
    import pandas as pd

    # Current directory (contains the Python script and all Excel files)
    mydir = (os.getcwd()).replace('\\','/') + '/'

    # Collect all Excel files, including those in subdirectories
    filelist=[]
    for path, subdirs, files in os.walk(mydir):
        for file in files:
            if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')):
                filelist.append(os.path.join(path, file))
    number_of_files=len(filelist)
    print(filelist)

    # Read all Excel files and save them to dataframes (df[0] - df[x]),
    # where x is the number of Excel files that have been read - 1
    df=[]
    for i in range(number_of_files):
        try:
            df.melt(pd.read_excel(r''+filelist[i]))
        except:
            print('Empty Excel File')
    print(df)

Result:

Empty Excel File
Empty Excel File
Empty Excel File
Empty Excel File
[]

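How can I unpivot the data rather than "append" the data as columns?

I want to convert the data from all of my files into the following dataframe format.

Dataframe: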

Id    1
Id    2
Id    3
test  ab
test  cs
test  cs
category 4
category 3
category 1
index    1
index    2
index    3
remove   dr
remove   as
remove   ae
stocks   4
stocks   3
stocks   1
category a
category b
category v

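I have tested this with your example input:

import pandas as pd

# Sample data standing in for File 1 and File 2
one = {"Id": [1, 2, 3], "test": ["ab", "cs", "cs"], "category": [4, 3, 1]}
two = {"index": [1, 2, 3], "remove": ["dr", "as", "ae"], "stocks": [4, 3, 1], "category": ["a", "b", "v"]}
df1 = pd.DataFrame(one)
df2 = pd.DataFrame(two)
# Melt each frame to long format, then stack the results
final = pd.concat([df1.melt(), df2.melt()])
final: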
    variable value
0         Id     1
1         Id     2
2         Id     3
3       test    ab
4       test    cs
5       test    cs
6   category     4
7   category     3
8   category     1
0      index     1
1      index     2
2      index     3
3     remove    dr
4     remove    as
5     remove    ae
6     stocks     4
7     stocks     3
8     stocks     1
9   category     a
10  category     b
11  category     v
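
As for why the attempt in the question printed "Empty Excel File" for every file: `df` there is a plain Python list, and a list has no `melt` method, so `df.melt(...)` raises an `AttributeError` that the bare `except` hides. The sketch below reproduces this with an in-memory frame instead of a real Excel file; the names `frames`, `sample` and `error_message` are only illustrative.

```python
import pandas as pd

frames = []  # a plain list, like `df` in the question
sample = pd.DataFrame({"Id": [1, 2, 3], "test": ["ab", "cs", "cs"]})

error_message = None
try:
    # Same call shape as in the question: melt is looked up on the list
    frames.melt(sample)
except AttributeError as exc:
    # Catching the specific exception shows the real cause
    error_message = str(exc)

print(error_message)

# melt is a DataFrame method; append its result to the list instead
frames.append(sample.melt())
```

Catching a specific exception, or printing the caught exception, would have surfaced the real problem instead of the misleading "Empty Excel File" message.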

You can use:

import pandas as pd
import pathlib

data = []
for filename in pathlib.Path.cwd().iterdir():
    if filename.suffix.lower().startswith('.xls'):
        data.append(pd.read_excel(filename).melt())
df = pd.concat(data, ignore_index=True)

Output:

>>> df
     variable value
0          Id     1
1          Id     2
2          Id     3
3        test    ab
4        test    cs
5        test    cs
6    category     4
7    category     3
8    category     1
9       index     1
10      index     2
11      index     3
12     remove    dr
13     remove    as
14     remove    ae
15     stocks     4
16     stocks     3
17     stocks     1
18   category     a
19   category     b
20   category     v
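
The loop above only looks at the current directory, whereas the code in the question walks subdirectories too. A possible variant, offered as a sketch rather than a tested solution for your files, uses `Path.rglob` to recurse and adds a `source` column so each row can be traced back to its file. The function name `melt_excel_tree`, the `source` column and the `reader` parameter are assumptions introduced here for illustration; `reader` defaults to `pd.read_excel` and exists only so the function can be tried without real workbooks.

```python
from pathlib import Path

import pandas as pd


def melt_excel_tree(root, reader=pd.read_excel):
    """Melt every .xls* file under root (recursively) into one long frame."""
    pieces = []
    for path in sorted(Path(root).rglob("*")):
        if not (path.is_file() and path.suffix.lower().startswith(".xls")):
            continue
        try:
            wide = reader(path)
        except Exception as exc:
            # Report the real error instead of hiding it
            print(f"Skipping {path.name}: {exc}")
            continue
        # Long format plus the (hypothetical) source column
        pieces.append(wide.melt().assign(source=path.name))
    if not pieces:
        # pd.concat raises on an empty list, so return an empty frame
        return pd.DataFrame(columns=["variable", "value", "source"])
    return pd.concat(pieces, ignore_index=True)
```

Calling `melt_excel_tree(Path.cwd())` would then cover the same files as the `os.walk` loop in the question, assuming an Excel engine such as openpyxl is installed for `.xlsx` files.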

