Split pandas dataframe into multiple dataframes with list of lists as mask

I have a pandas dataframe that looks something like this:

A BB
1 foo.bar
2 foo.bar
3 foo.foo
4 foo.bar
5 foo.bar
6 foo.foo

I basically expect to get two dataframes out of it, based on this list of lists:

[[False, False, True], [False, False, True]]

The output should be:

df1:

A BB
1 foo.bar
2 foo.bar
3 foo.foo

df2:

A BB
4 foo.bar
5 foo.bar
6 foo.foo
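
If the mask really is a list of lists whose sublists cover the rows in order, the sublist lengths alone determine the split points. A minimal sketch, assuming the sample data is rebuilt by hand (the construction of `df` and the names `mask` and `frames` are illustrative, not from the question):

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed construction).
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})

mask = [[False, False, True], [False, False, True]]

# Each sublist covers len(sublist) consecutive rows, so slice by a running offset.
frames = []
start = 0
for chunk in mask:
    stop = start + len(chunk)
    frames.append(df.iloc[start:stop])
    start = stop

df1, df2 = frames
```

This ignores the boolean values themselves and only uses how many rows each sublist spans, which may or may not match how the mask was produced.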

You can:

  • get the rows where df.BB equals 'foo.foo'
  • shift that by one row
  • apply a cumulative sum to that, and
  • group by the resulting indices.

You end up with a groupby object that you can turn into a list of sub-dataframes.

>>> groups = df.groupby(df.BB.eq('foo.foo').shift(fill_value=0).cumsum())
>>> frames = [frame for _, frame in groups]
>>> frames # list of sub-dfs
[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]
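
To see why the one-liner works, it may help to look at each intermediate Series. The sketch below rebuilds the sample frame by hand and uses `fill_value=False` rather than `0` so the shifted Series stays boolean; both choices, and the variable names, are assumptions made for illustration:

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed construction).
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})

# Step 1: True on the row that closes each group.
is_end = df["BB"].eq("foo.foo")

# Step 2: move the flag down one row, so it marks the first row of the next group.
starts_new = is_end.shift(fill_value=False)

# Step 3: the running total gives every row the number of its group.
group_id = starts_new.cumsum()

# Step 4: group on that label and collect the pieces.
frames = [frame for _, frame in df.groupby(group_id)]
```

Without the shift, the cumulative sum would increase on the 'foo.foo' row itself, which would put that row into the following group instead of closing the current one.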

NumPy:

  • flatnonzero to find where the 'foo.foo' rows are
  • split to divide the dataframe up accordingly

import numpy as np

np.split(df, np.flatnonzero(df.BB.eq('foo.foo'))[:-1] + 1)

[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]
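
The same split points can also be applied with plain positional slicing. This is a sketch rather than a replacement for the answer: from memory, some pandas versions emit a deprecation warning when a DataFrame is passed to np.split, so treat that motivation as an assumption. The frame construction and the names `ends`, `cuts`, and `bounds` are illustrative:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumed construction).
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})

# Positions of the rows that close a group.
ends = np.flatnonzero(df["BB"].eq("foo.foo").to_numpy())

# Drop the last position (nothing follows it) and cut after each remaining one.
cuts = ends[:-1] + 1

# Turn the cut points into (start, stop) pairs and slice by position.
bounds = [0, *cuts, len(df)]
frames = [df.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
```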

Addressing @mozway's comment:

list(filter(
    lambda d: not d.empty,
    np.split(df, np.flatnonzero(df.BB.eq('foo.foo')) + 1)
))

[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]
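
The comment itself is not reproduced on this page. One case where the two NumPy variants would plausibly differ is a frame with rows after the last 'foo.foo', so the sketch below assumes that is the concern. It uses positional slicing instead of np.split, and `split_after_marker` and the extra seventh row are hypothetical:

```python
import numpy as np
import pandas as pd


def split_after_marker(df, marker="foo.foo"):
    """Cut df after every row whose BB equals marker; drop empty pieces."""
    cuts = np.flatnonzero(df["BB"].eq(marker).to_numpy()) + 1
    bounds = [0, *cuts, len(df)]
    parts = [df.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
    return [part for part in parts if not part.empty]


# Sample frame from the question plus one trailing row (assumed data).
df_trailing = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "BB": ["foo.bar", "foo.bar", "foo.foo",
           "foo.bar", "foo.bar", "foo.foo", "foo.bar"],
})

frames_trailing = split_after_marker(df_trailing)
frames_exact = split_after_marker(df_trailing.iloc[:6])
```

With the trailing row present, the leftover row ends up in its own third piece; when the frame ends on 'foo.foo', the final piece is empty and gets filtered out.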

Is this what you expect?

m = len(df) // 2
df1, df2 = df.iloc[:m], df.iloc[m:]

Output:

>>> df1
   A       BB
0  1  foo.bar
1  2  foo.bar
2  3  foo.foo

>>> df2
   A       BB
3  4  foo.bar
4  5  foo.bar
5  6  foo.foo

Or use np.split:

df1, df2 = np.split(df, 2)
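
Both variants rely on the frame dividing into exactly two equal halves. If the groups simply have a fixed, known number of rows, positional slicing in steps might generalise a little further. A minimal sketch, where `chunk_size` is a hypothetical parameter and the frame is rebuilt by hand:

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed construction).
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})

chunk_size = 3  # assumed number of rows per group

# Step through the frame chunk_size rows at a time; the last piece may be shorter.
frames = [df.iloc[i:i + chunk_size] for i in range(0, len(df), chunk_size)]
```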


 