Pandas - Groupby and create new DataFrame?
This is my situation:
In[1]: data
Out[1]:
Item Type
0 Orange Edible, Fruit
1 Banana Edible, Fruit
2 Tomato Edible, Vegetable
3 Laptop Non Edible, Electronic
In[2]: type(data)
Out[2]: pandas.core.frame.DataFrame
What I want to do is create a data frame of only Fruits, so I need to groupby in such a way that Fruit exists in Type.
I've tried doing this:
grouped = data.groupby(lambda x: "Fruit" in x, axis=1)
I don't know if that's the right way of doing it; I'm having a tough time understanding groupby. How do I get a new DataFrame of only Fruits?
You could use
data[data['Type'].str.contains('Fruit')]
import pandas as pd
data = pd.DataFrame({'Item':['Orange', 'Banana', 'Tomato', 'Laptop'],
'Type':['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable', 'Non Edible, Electronic']})
print(data[data['Type'].str.contains('Fruit')])
yields
Item Type
0 Orange Edible, Fruit
1 Banana Edible, Fruit
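One thing to keep in mind is that str.contains does a substring test. With the data from the question, searching for 'Edible' would also pick up 'Non Edible'. The following is only a sketch of a stricter check, assuming the tags in Type are always separated by a comma and a space; the names loose and strict are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic'],
})

# Substring test: 'Edible' is also found inside 'Non Edible'.
# regex=False treats the pattern as a literal string.
loose = data[data['Type'].str.contains('Edible', regex=False)]

# Stricter test (assumes ', ' is the separator): split each Type
# into a list of tags and look for an exact tag match.
tags = data['Type'].str.split(', ')
strict = data[tags.apply(lambda t: 'Edible' in t)]

print(loose['Item'].tolist())   # ['Orange', 'Banana', 'Tomato', 'Laptop']
print(strict['Item'].tolist())  # ['Orange', 'Banana', 'Tomato']
```

For the 'Fruit' case in the question both approaches return the same two rows, so the simpler str.contains is probably enough there.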
groupby does something else entirely. It creates groups for aggregation. Basically, it goes from something like:
['a', 'b', 'a', 'c', 'b', 'b']
to something like:
[['a', 'a'], ['b', 'b', 'b'], ['c']]
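To make the grouping idea concrete, here is a small sketch. The first part groups the letters above and counts each group. The second part shows that groupby can technically be used on the question's data by grouping on a boolean Series, though that is a roundabout way to filter. The variable names (letters, is_fruit, fruits) are illustrative only.

```python
import pandas as pd

letters = pd.Series(['a', 'b', 'a', 'c', 'b', 'b'])

# Group equal values together, then aggregate each group (count its members).
counts = letters.groupby(letters).size()
print(counts.index.tolist())  # ['a', 'b', 'c']
print(counts.tolist())        # [2, 3, 1]

data = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic'],
})

# Grouping rows by a True/False key splits the frame into two groups.
is_fruit = data['Type'].str.contains('Fruit')
for key, group in data.groupby(is_fruit):
    print(key, group['Item'].tolist())

# Pull out the group whose key is True: the fruit rows.
fruits = data.groupby(is_fruit).get_group(True)
print(fruits)
```

Since no aggregation is needed here, plain boolean indexing with data[is_fruit] gives the same rows with less machinery.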
What you want is df.apply.
In newer versions of pandas there's a query method that makes this a bit more efficient and easier.
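As a rough sketch of the query approach: string methods can be called inside the expression when the Python engine is used. Passing engine='python' here is a cautious assumption so the example does not depend on numexpr being installed. Whether this is actually faster than a boolean mask was not measured and likely depends on the data.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic'],
})

# The expression is evaluated against the frame's columns,
# so Type refers to df['Type'].
fruits = df.query("Type.str.contains('Fruit')", engine='python')
print(fruits)
```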
However, one way of doing what you want is to make a boolean array by using
mask = df.Type.apply(lambda x: 'Fruit' in x)
And then selecting the relevant portions of the data frame with df[mask]. Or, as a one-liner:
df[df.Type.apply(lambda x: 'Fruit' in x)]
As a full example:
import pandas as pd

# Each inner list is one row: [Item, Type]
data = [['Orange', 'Edible, Fruit'],
        ['Banana', 'Edible, Fruit'],
        ['Tomato', 'Edible, Vegetable'],
        ['Laptop', 'Non Edible, Electronic']]
df = pd.DataFrame(data, columns=['Item', 'Type'])

# Keep only the rows whose Type string contains 'Fruit'
print(df[df.Type.apply(lambda x: 'Fruit' in x)])
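Since the question asks for a new DataFrame, it may be worth taking an explicit copy of the filtered rows so that later changes do not touch the original. This is a sketch under that assumption. The name fruits and the upper-casing step are only there to demonstrate the independence of the copy.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic'],
})

# Filter, then copy so the result is an independent DataFrame.
mask = df['Type'].apply(lambda x: 'Fruit' in x)
fruits = df[mask].copy()

# Optional: renumber the rows of the new frame from 0.
fruits = fruits.reset_index(drop=True)

# Modifying the copy leaves df unchanged.
fruits['Item'] = fruits['Item'].str.upper()
print(fruits)
print(df)
```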