How do I group multiple files in a folder in Python?

Question

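I have a folder containing 30 CSVs. Apart from one "UNITID" column that they all share, each CSV has its own distinct columns. I'd like to do a groupby on the UNITID column across all of the CSVs.

Ultimately, I want one DataFrame in which all the columns for each UNITID sit side by side.

Any ideas on how I could do this?

Thanks in advance.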

Answer 1

Maybe you could merge the DataFrames together one at a time? Something like this:

import pandas as pd
from pathlib import Path

# get a list of your csv paths somehow; here, every .csv in a hypothetical folder
list_of_csvs = sorted(Path("path/to/folder").glob("*.csv"))

# load the first csv file into a DF to start with
big_df = pd.read_csv(list_of_csvs[0])

# merge the other csvs into the first, one at a time
for csv in list_of_csvs[1:]:
    df = pd.read_csv(csv)
    big_df = big_df.merge(df, how="outer", on="UNITID")

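All the CSVs are merged together on UNITID: each file's columns are added alongside the others, and how="outer" keeps every UNITID that appears in any of the files (missing values become NaN).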
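A self-contained sketch of the same one-at-a-time merge, assuming a folder of CSV files that each carry a UNITID column. To keep it runnable, it first writes three small, made-up CSVs into a temporary directory; `merge_folder` is a hypothetical helper name, and `pathlib.Path.glob` stands in for however you actually collect the file paths.

```python
import tempfile
from pathlib import Path

import pandas as pd


def merge_folder(folder: Path) -> pd.DataFrame:
    """Outer-merge every *.csv in `folder` on the UNITID column, one file at a time."""
    paths = sorted(folder.glob("*.csv"))  # sorted so the column order is deterministic
    big_df = pd.read_csv(paths[0])
    for path in paths[1:]:
        df = pd.read_csv(path)
        big_df = big_df.merge(df, how="outer", on="UNITID")
    return big_df


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Hypothetical sample data: each file has UNITID plus its own columns.
    (folder / "a.csv").write_text("UNITID,name\n1,Alpha\n2,Beta\n")
    (folder / "b.csv").write_text("UNITID,city\n1,Boston\n3,Denver\n")
    (folder / "c.csv").write_text("UNITID,enrollment\n2,500\n3,750\n")
    result = merge_folder(folder)

print(result.sort_values("UNITID").to_string(index=False))
```

Note that an empty folder would make `paths[0]` raise an IndexError, so a real version may want to check for that first.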

Answer 2

Another alternative to dustin's solution is to combine functools' reduce function with DataFrame.merge(), like this:

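from functools import reduce  # standard library, no need to pip install it
from pandas import DataFrame

# make some dfs (the same data as printed below)
df1 = DataFrame({"id": [0, 1, 2], "col_one": ["a", "b", "c"], "col_two": ["d", "e", "f"]})
df2 = DataFrame({"id": [0, 1, 2], "col_three": ["A", "B", "C"], "col_four": ["D", "E", "F"]})
df3 = DataFrame({"id": [0, 1, 2], "col_five": [1, 2, 3], "col_six": [4, 5, 6]})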

df1
   id col_one col_two
0   0       a       d
1   1       b       e
2   2       c       f
df2
   id col_three col_four
0   0         A        D
1   1         B        E
2   2         C        F
df3
   id  col_five  col_six
0   0         1        4
1   1         2        5
2   2         3        6

One-liner:

reduce(lambda x, y: x.merge(y, on="id"), [df1, df2, df3])

   id col_one col_two col_three col_four  col_five  col_six
0   0       a       d         A        D         1        4
1   1       b       e         B        E         2        5
2   2       c       f         C        F         3        6
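One difference between the two answers: `merge` defaults to `how="inner"`, so the one-liner keeps only the ids present in every frame, while Answer 1 passes `how="outer"`. With 30 files whose id sets may not match exactly, passing `how="outer"` through `reduce` keeps every id. A minimal sketch with made-up, partially overlapping frames:

```python
from functools import reduce

import pandas as pd

# Hypothetical frames whose id sets only partially overlap.
df1 = pd.DataFrame({"id": [0, 1, 2], "col_one": ["a", "b", "c"]})
df2 = pd.DataFrame({"id": [1, 2, 3], "col_two": ["d", "e", "f"]})
df3 = pd.DataFrame({"id": [2, 3, 4], "col_three": [1, 2, 3]})
frames = [df1, df2, df3]

# Default how="inner": only id 2 appears in all three frames.
inner = reduce(lambda x, y: x.merge(y, on="id"), frames)

# how="outer": the union of ids, with NaN where a frame has no row for an id.
outer = reduce(lambda x, y: x.merge(y, on="id", how="outer"), frames)

print(inner)
print(outer)
```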

functools.reduce documentation

pandas.DataFrame.merge documentation
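A different technique, not taken from either answer: set UNITID as the index of each frame and call `pd.concat(..., axis=1)` once, which aligns all frames on the index in a single step instead of 29 pairwise merges. This sketch assumes UNITID is unique within each file, since concatenating along columns cannot align on duplicate index values; the frames below are made up for illustration.

```python
import pandas as pd

# Hypothetical frames standing in for the 30 CSVs, each keyed by UNITID.
frames = [
    pd.DataFrame({"UNITID": [1, 2], "name": ["Alpha", "Beta"]}),
    pd.DataFrame({"UNITID": [1, 3], "city": ["Boston", "Denver"]}),
    pd.DataFrame({"UNITID": [2, 3], "enrollment": [500, 750]}),
]

# Index each frame by UNITID, then glue them side by side.
# join="outer" (the default) keeps every UNITID seen in any frame.
combined = pd.concat([df.set_index("UNITID") for df in frames], axis=1, join="outer")

# Order by id, name the index explicitly, and turn it back into a regular column.
combined = combined.sort_index().rename_axis("UNITID").reset_index()
print(combined)
```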

Question description

2 solutions

Solution 1
2 2021-03-30 14:27:33

Solution 2
1 2021-03-30 14:36:44