Pandas - Groupby and create a new DataFrame?

Question

This is my situation:

In[1]: data
Out[1]: 
     Item                    Type
0  Orange           Edible, Fruit
1  Banana           Edible, Fruit
2  Tomato       Edible, Vegetable
3  Laptop  Non Edible, Electronic

In[2]: type(data)
Out[2]: pandas.core.frame.DataFrame

What I want to do is create a data frame of only Fruits, so I need to group by in such a way that Fruit exists in Type.

I've tried doing this:

grouped = data.groupby(lambda x: "Fruit" in x, axis=1)

I don't know if that's the right way of doing it; I'm having a tough time understanding groupby. How do I get a new DataFrame of only Fruits?

Answer 1

You could use:

data[data['Type'].str.contains('Fruit')]

import pandas as pd

data = pd.DataFrame({'Item':['Orange', 'Banana', 'Tomato', 'Laptop'],
                     'Type':['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable', 'Non Edible, Electronic']})
print(data[data['Type'].str.contains('Fruit')])

yields

     Item           Type
0  Orange  Edible, Fruit
1  Banana  Edible, Fruit
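
A minimal sketch of two optional arguments to `str.contains` that may be useful with this approach, assuming the column could contain missing values. The extra `Unknown` row with a missing Type is hypothetical and not part of the question's data. `na=False` treats missing entries as non-matches, and `regex=False` matches 'Fruit' as a plain substring.

```python
import pandas as pd

# Same data as above, plus one hypothetical row with a missing Type.
data = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop', 'Unknown'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic', None],
})

# na=False: rows with a missing Type count as "no match" instead of NaN.
# regex=False: treat 'Fruit' as a literal substring, not a regular expression.
mask = data['Type'].str.contains('Fruit', na=False, regex=False)

fruits = data[mask]
print(fruits)
```

Without `na=False`, the missing entry would produce a missing value in the mask, which boolean indexing may reject.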

Answer 2

groupby does something else entirely. It creates groups for aggregation. Basically, it goes from something like:

['a', 'b', 'a', 'c', 'b', 'b']

to something like:

[['a', 'a'], ['b', 'b', 'b'], ['c']]
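
The grouping described above can be sketched with a small Series. This is an illustration only; the variable names are made up for the example, and grouping a Series by its own values is just one convenient way to show the idea.

```python
import pandas as pd

letters = pd.Series(['a', 'b', 'a', 'c', 'b', 'b'])

# Group the values by themselves: equal values land in the same group.
grouped = letters.groupby(letters)

# Collect each group's members into a plain list.
groups = [group.tolist() for _, group in grouped]
print(groups)

# A typical aggregation: count the members of each group.
counts = grouped.size()
print(counts.to_dict())
```

The result is one group per distinct value, which is useful for aggregating but does not by itself filter rows.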

What you want is df.apply.

In newer versions of pandas there's a query method that makes this a bit more efficient and easier.

However, one way of doing what you want is to make a boolean array by using

mask = df.Type.apply(lambda x: 'Fruit' in x)

And then selecting the relevant portions of the data frame with df[mask]. Or, as a one-liner:

df[df.Type.apply(lambda x: 'Fruit' in x)]

As a full example:

import pandas as pd

# Build the example frame from a list of [Item, Type] rows.
data = [['Orange', 'Edible, Fruit'],
        ['Banana', 'Edible, Fruit'],
        ['Tomato', 'Edible, Vegetable'],
        ['Laptop', 'Non Edible, Electronic']]
df = pd.DataFrame(data, columns=['Item', 'Type'])

# Keep only the rows whose Type string contains 'Fruit'.
print(df[df.Type.apply(lambda x: 'Fruit' in x)])
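
The answer mentions the query method without showing it. A rough sketch of how it might be used for the same filter follows; passing `engine='python'` is an assumption about what the installed pandas version needs to evaluate the `.str` accessor call, and recent versions may work without it.

```python
import pandas as pd

df = pd.DataFrame(
    [['Orange', 'Edible, Fruit'],
     ['Banana', 'Edible, Fruit'],
     ['Tomato', 'Edible, Vegetable'],
     ['Laptop', 'Non Edible, Electronic']],
    columns=['Item', 'Type'],
)

# The expression string is evaluated against the frame's columns.
# engine='python' (an assumption, see above) evaluates the .str call
# with plain Python instead of numexpr.
fruits = df.query("Type.str.contains('Fruit')", engine='python')
print(fruits)
```

This selects the same rows as the boolean mask built with apply.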

2 answers

Answer 1: 6, accepted, 2014-01-06 14:27:42

Answer 2: 5, 2014-01-06 14:27:44
