Split a pandas dataframe into multiple dataframes, using a list of lists as a mask

Question

I have a pandas dataframe that looks something like this:

A BB
1 foo.bar
2 foo.bar
3 foo.foo
4 foo.bar
5 foo.bar
6 foo.foo

I basically expect to get two dataframes out of it, based on this list of lists:

[[False, False, True], [False, False, True]]

The output should be:

df1:

A BB
1 foo.bar
2 foo.bar
3 foo.foo

df2:

A BB
4 foo.bar
5 foo.bar
6 foo.foo

Answer 1

You can

get the rows where df.BB equals 'foo.foo',
shift that by one row,
apply a cumulative sum to that, and
group by the resulting indices.

You end up with a groupby object that you can turn into a list of sub-dfs.

>>> groups = df.groupby(df.BB.eq('foo.foo').shift(fill_value=0).cumsum())
>>> frames = [frame for _, frame in groups]
>>> frames # list of sub-dfs
[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]
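
The four steps above can also be spelled out one at a time, which makes the intermediate values easier to inspect. This is a minimal sketch, assuming the sample frame from the question; the variable names (is_end, starts_new, group_id) are made up for illustration, and fill_value=False is used here instead of 0 so the shifted series stays boolean.

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})

# Step 1: True on every row that closes a chunk.
is_end = df["BB"].eq("foo.foo")

# Step 2: shift down by one row, so True marks the first row of the next chunk.
starts_new = is_end.shift(fill_value=False)

# Step 3: the running total gives each row an integer chunk id.
group_id = starts_new.cumsum()

# Step 4: group by that id and collect the sub-frames.
frames = [frame for _, frame in df.groupby(group_id)]
```

Shifting matters because the 'foo.foo' row should stay at the end of its own chunk; without the shift, the cumulative sum would put it at the start of the next one.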

Answer 2

Numpy:

flatnonzero to find where the 'foo.foo' rows are
split to divide the dataframe up accordingly

import numpy as np

np.split(df, np.flatnonzero(df.BB.eq('foo.foo'))[:-1] + 1)

[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]

Addressing @mozway's comment (split after every 'foo.foo' row and drop any empty frame):

list(filter(
    lambda d: not d.empty,
    np.split(df, np.flatnonzero(df.BB.eq('foo.foo')) + 1)
))

[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]
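
The two variants above give the same result on the sample data, but they differ when the frame does not end with a 'foo.foo' row: the [:-1] version keeps any trailing rows in the last chunk, while the filtered version puts them in a chunk of their own. The sketch below shows the second behaviour using positional slicing with iloc rather than passing the dataframe to np.split. The helper name split_after and the extra seventh row are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd


def split_after(df, marker="foo.foo"):
    """Split df after every row whose BB equals marker (hypothetical helper)."""
    # Positions just past each marker row.
    cuts = np.flatnonzero(df["BB"].eq(marker).to_numpy()) + 1
    bounds = [0, *cuts.tolist(), len(df)]
    # Skip the empty slice that appears when the last row is a marker row.
    return [df.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:]) if b > a]


# Sample frame from the question plus one invented trailing row.
df7 = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "BB": ["foo.bar", "foo.bar", "foo.foo",
           "foo.bar", "foo.bar", "foo.foo", "foo.bar"],
})

frames_with_tail = split_after(df7)             # three frames: 3, 3 and 1 rows
frames_original = split_after(df7.iloc[:6])     # two frames: 3 and 3 rows
```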

Answer 3

Is this what you expect?

m = len(df) // 2
df1, df2 = df.iloc[:m], df.iloc[m:]

Output:

>>> df1
   A       BB
0  1  foo.bar
1  2  foo.bar
2  3  foo.foo

>>> df2
   A       BB
3  4  foo.bar
4  5  foo.bar
5  6  foo.foo

Or use np.split:

df1, df2 = np.split(df, 2)
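
Splitting in half only works when the chunks are equal in size. If the list of lists from the question is meant to describe the chunks directly, one possible reading is that each inner list stands for one output frame and its length is the number of rows in that frame. The sketch below follows that reading, which is an assumption and not something the question confirms; the names mask, lengths, starts and stops are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame and mask taken from the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})
mask = [[False, False, True], [False, False, True]]

# Assumption: one inner list per output frame, one entry per row.
lengths = [len(chunk) for chunk in mask]
stops = np.cumsum(lengths)        # end position of each chunk
starts = stops - lengths          # start position of each chunk

# The mask must account for every row, otherwise the reading above does not apply.
assert stops[-1] == len(df)

frames = [df.iloc[a:b] for a, b in zip(starts, stops)]
df1, df2 = frames
```

Because the boundaries come from the inner list lengths, this also handles masks whose chunks are not all the same size.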
