How to split/slice a Pandas DataFrame into multiple DataFrames by column value?
I have the following Pandas DataFrame:
0 | A
1 | B
2 | A
3 | A
4 | B
5 | A
6 | B
7 | B
8 | A
9 | A
I want to split this single DataFrame into multiple ones at the B values, dropping every B row and keeping each run of consecutive A rows as its own DataFrame, like this:
df#1:
0 | A
df#2:
2 | A
3 | A
df#3:
5 | A
df#4:
8 | A
9 | A
The order of the A rows must be preserved. How can I do this?
(The actual task is a time series in which consecutive relevant events must be handled as one event, with the runs separated by irrelevant events.)
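For anyone who wants to reproduce the problem, here is a minimal sketch of the sample data. The question does not name the column, so `col` is an assumption (it matches the name used in one of the answers).

```python
import pandas as pd

# Rebuild the ten sample rows from the question.
# The column name "col" is an assumption; the question leaves it unnamed.
df = pd.DataFrame({"col": list("ABAABABBAA")})

print(df)
```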
You can use itertools.groupby to filter out the portions you want:
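from itertools import groupby
import pandas as pd

# df appears to have two columns (0 and 1), so field 2 of each record is the A/B label
dfs = [pd.DataFrame.from_records(list(g),
                                 columns=df.reset_index().columns,
                                 index='index')
       for k, g in groupby(df.to_records(), key=lambda x: x[2])
       if k.strip() == 'A']  # keep only the runs of A rows
df_1, df_2, df_3, df_4 = dfs  # unpacking is probably not necessary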
Output:
# df_1
0 1
index
0 0 A
# df_2
0 1
index
2 2 A
3 3 A
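The same itertools.groupby idea can also be sketched for a frame with a single label column, grouping row positions instead of records. This is a variation on the snippet above, not the answer's exact code; the column name `col` and the sample data are assumptions.

```python
from itertools import groupby

import pandas as pd

# Sample data; the column name "col" is an assumption for this sketch.
df = pd.DataFrame({"col": list("ABAABABBAA")})

dfs = []
# groupby yields one (label, run) pair per stretch of equal consecutive labels.
for label, run in groupby(enumerate(df["col"]), key=lambda pair: pair[1]):
    if label == "A":
        positions = [pos for pos, _ in run]
        # Positions inside a run are contiguous, so iloc keeps the original order.
        dfs.append(df.iloc[positions])

for part in dfs:
    print(part.index.tolist())
```

Because the pieces are selected with iloc, each resulting DataFrame keeps its original index labels.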
You can loop over the runs of consecutive A values. First build a mask by comparing column col with 'A'. Then create the group labels by chaining the mask with its shifted, inverted values (which marks the first row of each run) and taking the cumulative sum:
m = df.col.eq('A')
for i, g in df[m].groupby((m & ~m.shift(fill_value=False)).cumsum()):
    print(g)
col
0 A
col
2 A
3 A
col
5 A
col
8 A
9 A
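To see why this grouping key works, it can help to print the intermediate Series side by side. The sketch below uses the sample data with an assumed column name `col`; the names `starts` and `group_id` are introduced only for illustration.

```python
import pandas as pd

# Sample data; the column name "col" is an assumption.
df = pd.DataFrame({"col": list("ABAABABBAA")})

m = df["col"].eq("A")                     # True on every A row
starts = m & ~m.shift(fill_value=False)   # True only on the first row of each A run
group_id = starts.cumsum()                # counter that increases at each run start

print(pd.DataFrame({"col": df["col"], "m": m,
                    "starts": starts, "group_id": group_id}))
```

B rows inherit the id of the preceding A run, but they never reach the groupby because df[m] drops them first.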
Or build a dictionary of DataFrames:
m = df.col.eq('A')
d = {i: g for i, g in df[m].groupby((m & ~m.shift(fill_value=False)).cumsum())}
print (d)
{1: col
0 A, 2: col
2 A
3 A, 3: col
5 A, 4: col
8 A
9 A}
print (d[1])
col
0 A
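If a plain list of DataFrames is more convenient than a dictionary, the same mask logic can be wrapped in a small helper. The function name `split_runs` and its parameters are hypothetical, chosen for this sketch only.

```python
import pandas as pd


def split_runs(df, column, keep):
    """Return one DataFrame per run of consecutive rows where df[column] == keep.

    Hypothetical helper for illustration; not part of pandas.
    """
    m = df[column].eq(keep)
    # Number the runs: the counter increases on the first row of each run.
    run_id = (m & ~m.shift(fill_value=False)).cumsum()
    # groupby iterates in ascending key order, so the runs stay in original order.
    return [g for _, g in df[m].groupby(run_id[m])]


# Sample data; the column name "col" is an assumption.
df = pd.DataFrame({"col": list("ABAABABBAA")})
parts = split_runs(df, "col", "A")

for part in parts:
    print(part.index.tolist())
```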
Another similar idea, which starts a new group at every change of value:
m = df.col.eq('A')
d = {i: g for i, g in df[m].groupby(m.ne(m.shift()).cumsum())}
print (d)
{1: col
0 A, 3: col
2 A
3 A, 5: col
5 A, 7: col
8 A
9 A}
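The keys come out as 1, 3, 5, 7 because this variant also assigns ids to the B runs, which are then filtered out. If consecutive keys are preferred, one option is to renumber with enumerate, as in this sketch (sample data and the column name `col` are assumptions):

```python
import pandas as pd

# Sample data; the column name "col" is an assumption.
df = pd.DataFrame({"col": list("ABAABABBAA")})

m = df["col"].eq("A")
run_id = m.ne(m.shift()).cumsum()   # new id at every change, for A and B runs alike

# Discard the original keys and number the A runs 1, 2, 3, ...
d = {n: g for n, (_, g) in enumerate(df[m].groupby(run_id), start=1)}

print(list(d))
```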