How to break a pandas dataframe into sub dataframes when a certain value is found in the dataframe column?
I have a dataframe that looks like this:
import pandas as pd

data = pd.DataFrame({"event": ["A", "B", "C", "A", "A", "E", "P", "S", "A", "Y", "A"]})
data.head(15)
event
0 A
1 B
2 C
3 A
4 A
5 E
6 P
7 S
8 A
9 Y
10 A
I want to break this dataframe into smaller dataframes, starting a new one whenever the event "A" is found. In this case that gives five dataframes, which would look like this:
1) event
0 A
1 B
2 C
2) event
0 A
3) event
0 A
1 E
2 P
3 S
4) event
0 A
1 Y
5) event
0 A
Is there an elegant way to do this with Python pandas, and also with PySpark?
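With pandas, use groupby with a helper grouper built from data['event'].eq('A').cumsum(). The comparison eq('A') is True on every "A" row, and cumsum() turns those booleans into a running count, so each row is labelled with the number of the most recent "A" and every "A" starts a new group: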
dfs = [g for _,g in data.groupby(data['event'].eq('A').cumsum())]
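To make the grouper concrete, the sketch below prints the intermediate key for the example data before grouping on it. It only reuses the calls from the answer; the names is_start and group_id are illustrative.

```python
import pandas as pd

data = pd.DataFrame({"event": ["A", "B", "C", "A", "A", "E", "P", "S", "A", "Y", "A"]})

# True on every row whose event is "A"
is_start = data["event"].eq("A")

# Running count of "A"s seen so far: each "A" bumps the counter,
# so every row carries the number of the most recent "A"
group_id = is_start.cumsum()
print(group_id.tolist())  # [1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 5]

# One sub-dataframe per distinct group id
dfs = [g for _, g in data.groupby(group_id)]
print(len(dfs))  # 5
```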
Or, to give each sub-dataframe a fresh 0-based index, add a reset_index:
dfs = [g.reset_index(drop=True)
for _,g in data.groupby(data['event'].eq('A').cumsum())]
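As a quick check, this sketch runs the reset_index variant on the question's data and prints the pieces in the same numbered form the question asks for. Passing drop=True discards the old row labels instead of keeping them as a column.

```python
import pandas as pd

data = pd.DataFrame({"event": ["A", "B", "C", "A", "A", "E", "P", "S", "A", "Y", "A"]})

# Split at every "A" and renumber each piece from 0
dfs = [g.reset_index(drop=True)
       for _, g in data.groupby(data["event"].eq("A").cumsum())]

for i, df in enumerate(dfs, start=1):
    print(f"{i})")
    print(df)
```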
Output (without reset_index):
[ event
0 A
1 B
2 C,
event
3 A,
event
4 A
5 E
6 P
7 S,
event
8 A
9 Y,
event
10 A]
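One case the example data does not cover: if the column does not start with "A", the rows before the first "A" get a cumulative count of 0 and form their own leading group. The sketch below uses a hypothetical input to show this, and keys the pieces by group id so that group 0 can be dropped if it is unwanted.

```python
import pandas as pd

# Hypothetical input that does NOT start with "A"
data = pd.DataFrame({"event": ["X", "Y", "A", "B", "A"]})

key = data["event"].eq("A").cumsum()
print(key.tolist())  # [0, 0, 1, 1, 2]

# Map each group id to its renumbered sub-dataframe
parts = {k: g.reset_index(drop=True) for k, g in data.groupby(key)}

# Group 0 holds the rows preceding the first "A"; remove it if not needed
leading = parts.pop(0, None)
dfs = list(parts.values())
```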