
How can I groupby over multiple files in a folder in Python?

I have a folder with 30 CSVs. Each file has its own unique columns, except for a single "UNITID" column that they all share. I'm looking to do a groupby on that UNITID column across all the CSVs.

Ultimately I want a single DataFrame with all the columns next to each other for each UNITID.

Any thoughts on how I can do that?

Thanks in advance.

Perhaps you could merge the DataFrames together, one at a time? Something like this:

import pandas as pd

# get a list of your csv paths somehow
# (get_filenames_of_csvs is a placeholder, not a real function)
list_of_csvs = get_filenames_of_csvs()

# load the first csv file into a DF to start with
big_df = pd.read_csv(list_of_csvs[0])

# merge the other csvs into the first, one at a time
for csv in list_of_csvs[1:]:
    df = pd.read_csv(csv)
    big_df = big_df.merge(df, how="outer", on="UNITID")

All the CSVs will be merged together on UNITID, preserving the union of all columns. Because the merge uses how="outer", every UNITID that appears in any file is kept, with NaN where a file has no row for it.
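
For a version that can be run end to end, here is a minimal sketch. It assumes the CSVs sit in a single folder and uses `glob` to collect their paths, which is only one way to do it. The temporary folder, the file names, and every column other than UNITID are made up for illustration.

```python
import glob
import os
import tempfile

import pandas as pd

# Build a throwaway folder with three small CSVs (hypothetical data).
folder = tempfile.mkdtemp()
pd.DataFrame({"UNITID": [1, 2, 3], "enrollment": [100, 200, 300]}).to_csv(
    os.path.join(folder, "a.csv"), index=False)
pd.DataFrame({"UNITID": [2, 3, 4], "tuition": [10, 20, 30]}).to_csv(
    os.path.join(folder, "b.csv"), index=False)
pd.DataFrame({"UNITID": [1, 4], "state": ["NY", "CA"]}).to_csv(
    os.path.join(folder, "c.csv"), index=False)

# Collect the CSV paths; sorting keeps the column order reproducible.
list_of_csvs = sorted(glob.glob(os.path.join(folder, "*.csv")))

# Start from the first file and outer-merge each remaining file on UNITID.
big_df = pd.read_csv(list_of_csvs[0])
for path in list_of_csvs[1:]:
    big_df = big_df.merge(pd.read_csv(path), how="outer", on="UNITID")

# Sort by the key so the rows come out in a predictable order.
big_df = big_df.sort_values("UNITID").reset_index(drop=True)
print(big_df)
```

With real data, replacing `folder` with the path to your own directory should be the only change needed, provided every file really does contain a UNITID column.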

An alternative one-liner to dustin's solution is to combine the reduce function from functools with DataFrame.merge(),

like so:

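from functools import reduce  # standard library, no need to pip install it
from pandas import DataFrame

# make some dfs
df1 = DataFrame({"id": [0, 1, 2], "col_one": ["a", "b", "c"], "col_two": ["d", "e", "f"]})
df2 = DataFrame({"id": [0, 1, 2], "col_three": ["A", "B", "C"], "col_four": ["D", "E", "F"]})
df3 = DataFrame({"id": [0, 1, 2], "col_five": [1, 2, 3], "col_six": [4, 5, 6]})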

df1
   id col_one col_two
0   0       a       d
1   1       b       e
2   2       c       f
df2
   id col_three col_four
0   0         A        D
1   1         B        E
2   2         C        F
df3
   id  col_five  col_six
0   0         1        4
1   1         2        5
2   2         3        6

The one-liner:

reduce(lambda x, y: x.merge(y, on="id"), [df1, df2, df3])

   id col_one col_two col_three col_four  col_five  col_six
0   0       a       d         A        D         1        4
1   1       b       e         B        E         2        5
2   2       c       f         C        F         3        6
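
One caveat: `DataFrame.merge` defaults to `how="inner"`, so the one-liner keeps only the key values present in every frame. In the example that makes no difference, since all three frames share the same ids. If the files cover different UNITIDs, passing `how="outer"` keeps them all. The sketch below contrasts the two; the frames and column names are hypothetical.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames that share the key column but not all key values.
left = pd.DataFrame({"UNITID": [1, 2], "col_one": ["a", "b"]})
middle = pd.DataFrame({"UNITID": [2, 3], "col_two": ["c", "d"]})
right = pd.DataFrame({"UNITID": [1, 3], "col_three": [10, 30]})
frames = [left, middle, right]

# Default (inner) merge: keeps only UNITIDs found in every frame.
inner = reduce(lambda x, y: x.merge(y, on="UNITID"), frames)

# Outer merge: keeps the union of UNITIDs, filling gaps with NaN.
outer = reduce(lambda x, y: x.merge(y, on="UNITID", how="outer"), frames)
outer = outer.sort_values("UNITID").reset_index(drop=True)

print(len(inner), len(outer))
```

Here no UNITID appears in all three frames, so the inner result is empty while the outer result has one row per UNITID.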

functools.reduce docs

pandas.DataFrame.merge docs
