How to split/slice a Pandas dataframe into multiple dataframes by column value?

I have the following Pandas DataFrame:

0 | A
1 | B
2 | A
3 | A
4 | B
5 | A
6 | B
7 | B
8 | A
9 | A

I want to slice this single DataFrame into multiple ones at the B values: all B rows are dropped, and each run of consecutive A rows becomes its own DataFrame, like this:

df#1:

0 | A

df#2:

2 | A
3 | A

df#3:

5 | A

df#4:

8 | A
9 | A

The order of the A rows must be kept. How can I do this? (The actual task is a time series in which each run of relevant events must be handled as one event, with the runs separated by irrelevant events.)

You can use itertools.groupby to pick out the runs you want:

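from itertools import groupby
import pandas as pd

# Walk the rows in order; in each record x[0] is the index and
# x[2] is the second data column, which holds the A/B value.
# Keep only the runs whose key is 'A'.
dfs = [pd.DataFrame.from_records(list(g),
       columns=df.reset_index().columns, index='index')
       for k, g in
       groupby(df.to_records(), key=lambda x: x[2])
       if k.strip() == 'A']
df_1, df_2, df_3, df_4 = dfs  # optional: unpack into separate names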

Output

# df_1
       0   1
index       
0      0   A

# df_2
       0   1
index       
2      2   A
3      3   A
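
For readers who want something they can paste and run, here is a minimal sketch of the same run-based idea. It assumes a single column named col (the question does not give a column name) and slices the original frame by position instead of rebuilding each piece from records, so it is a variant of the answer above, not the same code.

```python
from itertools import groupby

import pandas as pd

# Assumed sample data; the column name "col" is a guess.
df = pd.DataFrame({"col": list("ABAABABBAA")})

dfs = []
pos = 0  # position of the first row of the current run
for key, run in groupby(df["col"]):
    length = sum(1 for _ in run)  # number of rows in this run
    if key == "A":
        dfs.append(df.iloc[pos:pos + length])  # keep the A run, index preserved
    pos += length

print([list(part.index) for part in dfs])
# [[0], [2, 3], [5], [8, 9]]
```

Because itertools.groupby only merges adjacent equal keys, the row order is kept and each B row simply ends the current run.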

You can loop over the runs of consecutive A values. First build a boolean mask by comparing column col with 'A'. Then create the group labels by chaining that mask with the inverted, shifted mask, which marks the first row of each A run, and taking the cumulative sum:

m = df.col.eq('A')

for i, g in df[m].groupby((m & ~m.shift(fill_value=False)).cumsum()):
    print (g)
  col
0   A
  col
2   A
3   A
  col
5   A
  col
8   A
9   A

Or build a dictionary of DataFrames:

m = df.col.eq('A')
d = {i: g for i, g in df[m].groupby((m & ~m.shift(fill_value=False)).cumsum())}
    
print (d)
{1:   col
0   A, 2:   col
2   A
3   A, 3:   col
5   A, 4:   col
8   A
9   A}

print (d[1])
  col
0   A
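
To see why the grouping key works, it can help to print the intermediate Series side by side. The sketch below assumes the same single column col as the answer; the names starts, labels and steps are introduced here for illustration only.

```python
import pandas as pd

# Assumed sample data matching the question.
df = pd.DataFrame({"col": list("ABAABABBAA")})

m = df["col"].eq("A")                    # True on A rows
starts = m & ~m.shift(fill_value=False)  # True only on the first row of each A run
labels = starts.cumsum()                 # run number, carried forward to later rows

steps = pd.DataFrame({"col": df["col"], "m": m, "starts": starts, "labels": labels})
print(steps)
print(labels.tolist())
# [1, 1, 2, 2, 2, 3, 3, 3, 4, 4]
```

B rows inherit the label of the preceding A run, but that does no harm because df[m] drops them before grouping.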

Another similar idea:

m = df.col.eq('A')
d = {i: g for i, g in df[m].groupby(m.ne(m.shift()).cumsum())}
    
print (d)
{1:   col
0   A, 3:   col
2   A
3   A, 5:   col
5   A, 7:   col
8   A
9   A}
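
With this variant the keys come out as 1, 3, 5, 7 because every change between A and B starts a new label, so the B runs use up the even numbers. If consecutive keys are preferred, one option is to renumber with enumerate. This is a sketch under the same assumed column name col; run_id is a name introduced here.

```python
import pandas as pd

# Assumed sample data matching the question.
df = pd.DataFrame({"col": list("ABAABABBAA")})

m = df["col"].eq("A")
run_id = m.ne(m.shift()).cumsum()  # new label at every A/B change

# Renumber the A runs 1..n in order of appearance.
d = {n: g for n, (_, g) in enumerate(df[m].groupby(run_id), start=1)}

print(list(d))
# [1, 2, 3, 4]
```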

