Pandas: How to filter a Series
I have a Series like this after doing groupby('name') and calling mean() on another column:
name
383 3.000000
663 1.000000
726 1.000000
737 9.000000
833 8.166667
Could anyone please show me how to filter out the rows with 1.000000 mean values? Thank you, I greatly appreciate your help.
In [5]:
import pandas as pd
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
s = pd.Series(test)
s = s[s != 1]
s
Out[0]:
383 3.000000
737 9.000000
833 8.166667
dtype: float64
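Since group means are floats, an exact `!= 1` test can keep values that are only approximately 1. Here is a minimal sketch, assuming a tolerance-based comparison is acceptable for your data. The value 1.0000000001 is made up for illustration, and the tolerances are the `numpy.isclose` defaults:

```python
import numpy as np
import pandas as pd

# Hypothetical means; 726 is "almost 1" because of rounding noise
s = pd.Series({383: 3.0, 663: 1.0, 726: 1.0000000001, 737: 9.0})

exact = s[s != 1]                          # exact comparison keeps 726
tolerant = s[~np.isclose(s.values, 1.0)]   # tolerance-based comparison drops it
```

Whether the tolerant version is the right choice depends on how the means were computed; for exact integer-valued data the plain `s != 1` mask is fine.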
From pandas 0.18.1 onwards, a Series can also be filtered as shown below:
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
pd.Series(test).where(lambda x : x!=1).dropna()
Check out: http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#method-chaininng-improvements
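To see why the `.dropna()` call is needed, it may help to look at the intermediate result. A small sketch using the same data as above (the variable names `masked` and `filtered` are just for illustration):

```python
import pandas as pd

s = pd.Series({383: 3.0, 663: 1.0, 726: 1.0, 737: 9.0, 833: 8.166667})

# where() keeps the original shape: rows failing the condition become NaN
masked = s.where(lambda x: x != 1)

# dropna() then removes those NaN rows
filtered = masked.dropna()
```

One caveat: `dropna()` also removes any NaN values that were already present in the Series before filtering, so this approach is not equivalent to a boolean mask if your data contains missing values.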
As DACW pointed out, there are method-chaining improvements in pandas 0.18.1 that do what you are looking for very nicely.
Rather than using .where, you can pass your function to either the .loc indexer or the Series indexer [] and avoid the call to .dropna:
test = pd.Series({
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
})
test.loc[lambda x : x!=1]
test[lambda x: x!=1]
Similar behavior is supported on the DataFrame and NDFrame classes.
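For example, a rough sketch of the same idea on a DataFrame. The column names `name` and `mean_value` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "name": [383, 663, 726, 737, 833],
    "mean_value": [3.0, 1.0, 1.0, 9.0, 8.166667],
})

# The callable receives the whole DataFrame and returns a boolean mask
result = df.loc[lambda d: d["mean_value"] != 1]
```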
Another way is to first convert to a DataFrame and use the query method (assuming you have numexpr installed):
import pandas as pd
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
s = pd.Series(test)
s.to_frame(name='x').query("x != 1")
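Note that the result of `query` is a DataFrame, not a Series. If you need a Series back, you can select the column again. The sketch below also passes `engine="python"`, which as far as I know avoids the numexpr dependency; treat that as an optional choice rather than a requirement:

```python
import pandas as pd

s = pd.Series({383: 3.0, 663: 1.0, 726: 1.0, 737: 9.0, 833: 8.166667})

# Round trip: Series -> DataFrame -> query -> back to a Series
filtered = s.to_frame(name="x").query("x != 1", engine="python")["x"]
```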
If you like a chained operation, you can also use the compress function:
test = pd.Series({
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
})
test.compress(lambda x: x != 1)
# 383 3.000000
# 737 9.000000
# 833 8.166667
# dtype: float64
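As far as I know, `Series.compress` was deprecated in pandas 0.24 and removed in 1.0, so on newer versions a callable passed to `.loc` is the closest chained equivalent. A sketch that filters directly in the groupby chain, with a made-up `score` column standing in for the asker's data:

```python
import pandas as pd

# Hypothetical raw data: two names with two rows each, one with a single row
df = pd.DataFrame({
    "name": [383, 383, 663, 663, 737],
    "score": [2.0, 4.0, 1.0, 1.0, 9.0],
})

# groupby -> mean -> filter, without storing the intermediate Series
result = df.groupby("name")["score"].mean().loc[lambda s: s != 1]
```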
In my case I had a pandas Series where the values are tuples of characters:
Out[67]
0 (H, H, H, H)
1 (H, H, H, T)
2 (H, H, T, H)
3 (H, H, T, T)
4 (H, T, H, H)
Therefore I could use indexing to filter the Series, but to create the Boolean index I needed apply. My condition is "find all tuples which have exactly one 'H'".
series_of_tuples[series_of_tuples.apply(lambda x: x.count('H')==1)]
I admit it is not "chainable" (i.e. notice that I repeat series_of_tuples twice; you must store any temporary Series in a variable so you can call apply(...) on it).
There may also be other methods (besides .apply(...)) which can operate elementwise to produce a Boolean index.
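Two such alternatives, as a sketch: `Series.map` and a plain list comprehension. The sample tuples are made up so that two of them match the condition:

```python
import pandas as pd

series_of_tuples = pd.Series([
    ("H", "H", "H", "H"),
    ("H", "T", "T", "T"),
    ("T", "T", "H", "T"),
    ("H", "H", "T", "T"),
])

# Series.map applies the function to each value, like apply does here
mask_map = series_of_tuples.map(lambda t: t.count("H") == 1)

# A list comprehension gives a plain list of booleans, which also works as a mask
mask_list = [t.count("H") == 1 for t in series_of_tuples]

by_map = series_of_tuples[mask_map]
by_list = series_of_tuples[mask_list]
```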
Many other answers (including the accepted answer) use chainable functions such as:
.compress()
.where()
.loc[]
[]
These accept callables (lambdas) which are applied to the Series as a whole, not to the individual values in that Series!
Therefore my Series of tuples behaved strangely when I tried to use my above condition / callable / lambda with any of the chainable functions, like .loc[]:
series_of_tuples.loc[lambda x: x.count('H')==1]
Produces the error:
KeyError: 'Level H must be same as name (None)'
I was very confused, but since the callable receives the whole Series, x.count('H') ends up calling the Series.count function (i.e. series_of_tuples.count(...)), which is not what I wanted.
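One way to keep the chainable form is to do the elementwise work inside the callable, so the outer lambda receives the Series and the inner lambda receives each tuple. A minimal sketch with made-up sample tuples:

```python
import pandas as pd

series_of_tuples = pd.Series([
    ("H", "H", "H", "H"),
    ("H", "T", "T", "T"),
    ("T", "T", "H", "T"),
    ("H", "H", "T", "T"),
])

# Outer lambda: gets the whole Series (s) and must return a boolean mask.
# Inner lambda: gets one tuple (t) at a time via apply.
result = series_of_tuples.loc[lambda s: s.apply(lambda t: t.count("H") == 1)]
```

This avoids naming the Series twice, at the cost of a nested lambda.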
I admit that an alternative data structure may be better:
This creates a Series of strings (i.e. by joining the characters of each tuple into a single string):
series_of_tuples.apply(''.join)
So I can then use the chainable Series.str.count:
series_of_tuples.apply(''.join).str.count('H')==1
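The expression above only produces the Boolean mask. A sketch of how it might be used to actually filter, both on the joined strings (chained) and on the original tuples, with made-up sample data:

```python
import pandas as pd

series_of_tuples = pd.Series([
    ("H", "H", "H", "H"),
    ("H", "T", "T", "T"),
    ("T", "T", "H", "T"),
    ("H", "H", "T", "T"),
])

# Chained: join each tuple into a string, then filter with a callable in .loc
strings = series_of_tuples.apply("".join).loc[lambda s: s.str.count("H") == 1]

# Or use the string-based mask to filter the original tuples
mask = series_of_tuples.apply("".join).str.count("H") == 1
tuples = series_of_tuples[mask]
```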