
Split a pandas dataframe into multiple csv files, keeping groups together

I have a dataframe of about 1M rows, structured as shown below. I do not know the full list of keys in advance, so I cannot simply filter on specific keys such as A, B, C, D, E.

Key;Value

A;1
A;2
B;3
B;4
B;5
C;6
C;7
D;8

I would like to split it into 3 dataframes while keeping the groups together, so the result should be:

A;1
A;2

B;3
B;4
B;5
C;6
C;7

D;8

So I would like to split it, but the groups must stay together. For example, you cannot do this:

A;1
A;2
B;3

B;4
B;5
C;6

C;7
D;8

So I would like to split only after a group is finished.

I tried a bit with the pandas groupby function, but I am not really sure how to do the split once all the groups are together.

The split points can be fairly arbitrary, e.g. every 1K lines. That does not really matter; what matters is that the groups stay together.

There are a few ways, for example using groupby. Here's one way.

import pandas as pd

df = pd.DataFrame({"key":["A","A","B","B","C","C","D","D"],
                   "value":[1,2,3,4,5,6,7,8]})

df.loc[df["key"] == "A", :].to_csv("filename_A.csv")
df.loc[(df["key"] == "B") | (df["key"] == "C"), :].to_csv("filename_BC.csv")
df.loc[df["key"] == "D", :].to_csv("filename_D.csv")
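The filters above hard-code the keys, which the question says are not known in advance. A minimal sketch of a groupby-based variant follows, assuming it is acceptable to put a fixed number of keys in each file. The function name `split_by_key_buckets` and the parameter `keys_per_file` are hypothetical, introduced only for illustration.

```python
import pandas as pd

def split_by_key_buckets(df, key_col, keys_per_file):
    """Return a list of frames, each holding all rows of up to keys_per_file keys."""
    # Number each key 0, 1, 2, ... in order of first appearance
    group_id = df.groupby(key_col, sort=False).ngroup()
    # Integer division maps consecutive keys to the same bucket
    bucket = group_id // keys_per_file
    return [part for _, part in df.groupby(bucket)]

df = pd.DataFrame({"key": ["A", "A", "B", "B", "C", "C", "D", "D"],
                   "value": [1, 2, 3, 4, 5, 6, 7, 8]})

parts = split_by_key_buckets(df, "key", keys_per_file=2)
for part in parts:
    print(part["key"].unique())
```

Each returned frame can then be written out with something like part.to_csv(f"filename_{i}.csv", index=False). Because every key maps to exactly one bucket, no group is split across files.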

You can use numpy.array_split to split the unique keys into N chunks, then select the matching rows with Series.isin and boolean indexing:

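import numpy as np

# df is the question's dataframe, with a 'Key' column
N = 3  # number of output files
# Split the unique keys into N chunks, so each key ends up in exactly one file
for i, keys in enumerate(np.array_split(df['Key'].unique(), N)):
    print(keys)
    df[df['Key'].isin(keys)].to_csv(f'file{i}.csv', index=False)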
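The question also mentions splitting roughly every 1K lines. A sketch of that variant follows, assuming the rows of each key are contiguous as in the question's sample, and that a group larger than the limit may get a chunk to itself. The function name `split_keeping_groups` and the parameter `max_rows` are hypothetical, not part of pandas.

```python
import pandas as pd

def split_keeping_groups(df, key_col, max_rows):
    """Return chunks of at most about max_rows rows, never splitting a key's rows."""
    # Row count per key, in order of first appearance
    sizes = df.groupby(key_col, sort=False).size()
    chunk_of_key = {}
    chunk, rows = 0, 0
    for key, n in sizes.items():
        # Start a new chunk if adding this group would exceed the limit
        if rows and rows + n > max_rows:
            chunk += 1
            rows = 0
        chunk_of_key[key] = chunk
        rows += n
    # Map every row to the chunk of its key, then group on that
    chunk_ids = df[key_col].map(chunk_of_key).rename("chunk")
    return [part for _, part in df.groupby(chunk_ids)]

# Sample data from the question
df_q = pd.DataFrame({"Key": list("AABBBCCD"), "Value": range(1, 9)})
for part in split_keeping_groups(df_q, "Key", max_rows=3):
    print(part["Key"].tolist())
```

With the real data, max_rows=1000 would give chunks of about 1K lines each, and each chunk can be saved with part.to_csv(f"file{i}.csv", index=False).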
