Is there a query method or similar for pandas Series (pandas.Series.query())?

Question

The pandas.DataFrame.query() method is very useful for (pre/post)-filtering data when loading or plotting. It is particularly handy for method chaining.

I often find myself wanting to apply the same logic to a pandas.Series, e.g. after calling a method such as df.value_counts, which returns a pandas.Series.

Example

Let's assume there is a huge table with the columns Player, Game, Points, and I want to plot a histogram of the players who scored 3 points more than 14 times. I first have to sum the points of each player (groupby -> agg), which returns a Series of ~1000 players and their overall points. Applying the .query logic, it would look something like this:

import random
import pandas as pd

df = pd.DataFrame({
    'Points': [random.choice([1,3]) for x in range(100)], 
    'Player': [random.choice(["A","B","C"]) for x in range(100)]})

# Wished-for syntax: pandas.Series has no .query() method, so this does not run.
(df
     .query("Points == 3")
     .Player.value_counts()
     .query("> 14")
     .hist())

The only solutions I have found force me to do an unnecessary assignment and break the method chain:

points_series = (df
     .query("Points == 3")
     .groupby("Player").size())
points_series[points_series > 100].hist()

Method chaining and the query method both help keep the code legible, whereas subsetting and filtering can get messy quite quickly.

# just to make my point :)
series_bestplayers_under_100[series_prefiltered_under_100 > 0].shape

Please help me out of my dilemma! Thanks

Answer 1

IIUC, you can add query("Points > 100"):

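import numpy as np
import pandas as pd

# np.inf is used here because the np.Inf alias was removed in NumPy 2.0
df = pd.DataFrame({'Points':[50,20,38,90,0, np.inf],
                   'Player':['a','a','a','s','s','s']})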

print (df)
  Player     Points
0      a  50.000000
1      a  20.000000
2      a  38.000000
3      s  90.000000
4      s   0.000000
5      s        inf

points_series = df.query("Points < inf").groupby("Player").agg({"Points": "sum"})['Points']
print (points_series)     
a = points_series[points_series > 100]
print (a)     
Player
a    108.0
Name: Points, dtype: float64


# Parentheses are needed so the chain can span several lines
points_series = (df.query("Points < inf")
                   .groupby("Player")
                   .agg({"Points": "sum"})
                   .query("Points > 100"))

print (points_series)     
        Points
Player        
a        108.0

Another solution is Selection By Callable:

# Parentheses are needed so the chain can span several lines
points_series = (df.query("Points < inf")
                   .groupby("Player")
                   .agg({"Points": "sum"})['Points']
                   .loc[lambda x: x > 100])

print (points_series)     
Player
a    108.0
Name: Points, dtype: float64
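
The callable passed to .loc receives the Series produced by the previous step in the chain, so several conditions can be combined without naming an intermediate variable. A minimal sketch of that idea, with data and thresholds made up for illustration (they are not from the question):

```python
import pandas as pd

# Illustrative data; the values are assumptions for this sketch.
df = pd.DataFrame({
    "Player": ["a", "a", "a", "s", "s", "t"],
    "Points": [50, 20, 38, 90, 0, 150],
})

totals = (
    df.groupby("Player")["Points"]
      .sum()
      # The lambda is called with the summed Series and must return
      # something .loc accepts, here a boolean mask.
      .loc[lambda s: (s > 100) & (s < 150)]
)
print(totals)
```

Only player "a" (total 108) passes both conditions; "s" totals 90 and "t" totals exactly 150.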

Edit, following the edited question:

np.random.seed(1234)
df = pd.DataFrame({
    'Points': [np.random.choice([1,3]) for x in range(100)], 
    'Player': [np.random.choice(["A","B","C"]) for x in range(100)]})

print (df.query("Points == 3").Player.value_counts().loc[lambda x: x > 15])
C    19
B    16
Name: Player, dtype: int64

print (df.query("Points == 3").groupby("Player").size().loc[lambda x: x > 15])
Player
B    16
C    19
dtype: int64

Answer 2

Why not convert from Series to DataFrame, do the querying, and then convert back?

df["Points"] = df["Points"].to_frame().query('Points > 100')["Points"]

Here, .to_frame() converts to a DataFrame, while the trailing ["Points"] converts back to a Series.

The .query() method can then be used consistently, whether the pandas object has one column or more.
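
The same round trip can probably be kept inside a single chain, which is what the question asks for. A sketch under assumptions: the data is made up, and the column name "n" is chosen here only so the unnamed Series returned by .size() has something for query to refer to.

```python
import pandas as pd

# Illustrative data; not taken from the question.
df = pd.DataFrame({
    "Player": ["A", "A", "A", "B", "B", "C"],
    "Points": [3, 3, 1, 3, 3, 3],
})

threes = (
    df.query("Points == 3")
      .groupby("Player")
      .size()              # unnamed Series indexed by Player
      .to_frame(name="n")  # one-column DataFrame; "n" is an arbitrary name
      .query("n >= 2")     # DataFrame.query can now be used
      ["n"]                # select the column to get a Series back
)
print(threes)
```

One thing to keep in mind with the one-liner above: assigning the filtered Series back to df["Points"] aligns on the index, so rows that were filtered out become NaN rather than being dropped.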

Answer 3

Instead of query, you can use pipe:

s.pipe(lambda x: x[x>0]).pipe(lambda x: x[x<10])
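
Since pipe forwards extra positional and keyword arguments to the function, the lambdas can also be replaced by a named helper, which may read better in longer chains. A small sketch; the helper between_exclusive and the sample values are hypothetical and only serve as illustration:

```python
import pandas as pd


def between_exclusive(s, low, high):
    """Hypothetical helper: keep values strictly between low and high."""
    return s[(s > low) & (s < high)]


s = pd.Series([-5, 0, 3, 7, 10, 12])

# Two chained lambdas, as in the answer above.
chained = s.pipe(lambda x: x[x > 0]).pipe(lambda x: x[x < 10])

# The same filter through a named function; 0 and 10 are passed on by pipe.
named = s.pipe(between_exclusive, 0, 10)

print(chained)
print(named)
```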

3 answers

Answer 1: score 9, accepted, 2016-10-21 08:28:15

Answer 2: score 4, 2016-11-15 12:44:21

Answer 3: score 2, 2018-12-05 09:35:34
