Pandas dataframe.query method syntax
Question:
I would like to gain a better understanding of the Pandas DataFrame.query method and what the following expression represents:
match = dfDays.query('index > @x.name & price >= @x.target')
What does @x.name represent?
I understand what the resulting output is for this code (a new column with pandas.tslib.Timestamp data) but don't have a clear understanding of the expression used to get this end result.
Data:
From here: Vectorised way to query date and price data
import numpy as np
import pandas as pd

np.random.seed(seed=1)
rng = pd.date_range('1/1/2000', '2000-07-31', freq='D')
weeks = np.random.uniform(low=1.03, high=3, size=(len(rng),))
ts2 = pd.Series(weeks, index=rng)
dfDays = pd.DataFrame({'price': ts2})
dfWeeks = dfDays.resample('1W-Mon').first()
dfWeeks['target'] = (dfWeeks['price'] + .5).round(2)

def find_match(x):
    # x is one row of dfWeeks; x.name is that row's index label
    match = dfDays.query('index > @x.name & price >= @x.target')
    if not match.empty:
        return match.index[0]

dfWeeks.assign(target_hit=dfWeeks.apply(find_match, axis=1))
@x.name - the @ prefix tells .query() that x is an external object (it doesn't belong to the DataFrame on which the query() method was called), so @x.name is the name attribute of that object. In the question's code, x is a row of dfWeeks (a pd.Series) handed to find_match by apply. The external object could just as well be a DataFrame or a scalar value.
I hope this small demonstration will help you to understand it:
In [79]: d1
Out[79]:
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

In [80]: d2
Out[80]:
   a   x
0  1  10
1  7  11

In [81]: d1.query("a in @d2.a")
Out[81]:
   a  b  c
0  1  2  3
2  7  8  9

In [82]: d1.query("c < @d2.a")
Out[82]:
   a  b  c
1  4  5  6
Scalar x:

In [83]: x = 9

In [84]: d1.query("c == @x")
Out[84]:
   a  b  c
2  7  8  9
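For readers who want to run the demonstration, here is a minimal sketch that rebuilds d1 and d2 from the printed output above and checks two of the queries against plain boolean indexing. The construction of the frames and the names in_d2, in_d2_mask, eq_x and eq_x_mask are assumptions made for illustration; only the query strings come from the session.

```python
import pandas as pd

# Frames rebuilt by hand from the printed output (an assumption about how
# they were originally created).
d1 = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "c": [3, 6, 9]})
d2 = pd.DataFrame({"a": [1, 7], "x": [10, 11]})

# "a" is looked up among d1's columns; "@d2" is the local variable d2.
in_d2 = d1.query("a in @d2.a")
# The same selection written as an explicit boolean mask.
in_d2_mask = d1[d1["a"].isin(d2["a"])]

# A scalar works the same way: "@x" is the local variable x.
x = 9
eq_x = d1.query("c == @x")
eq_x_mask = d1[d1["c"] == x]

print(in_d2)
print(eq_x)
```

In both cases the query string and the mask select the same rows; the @ prefix only changes where the name is looked up.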
Everything @MaxU said is perfect!
I wanted to add some context to the specific problem that this was applied to.
find_match
This is a helper function that is passed to dfWeeks.apply. Two things to note:
1. find_match takes a single argument x. This will be a single row of dfWeeks. Each row is a pd.Series object, and every row is passed through this function in turn. This is the nature of using apply. When apply passes a row to the helper function, the row has a name attribute equal to the index value for that row in the dataframe. In this case, I know that the index value is a pd.Timestamp and I'll use it to do the comparing I need to do.
2. find_match references dfDays, which is outside the scope of find_match itself.
I didn't have to use query... I like using query. It is my opinion that it makes some code prettier. The following function, as provided by the OP, could've been written differently:
def find_match(x):
    """Original"""
    match = dfDays.query('index > @x.name & price >= @x.target')
    if not match.empty:
        return match.index[0]

dfWeeks.assign(target_hit=dfWeeks.apply(find_match, 1))
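To see what apply hands to the helper, the following small sketch records the type, name and target of each row it receives. The frame weeks, its values, and the function inspect_row are hypothetical and exist only for this illustration; they are not part of the original data.

```python
import pandas as pd

# Hypothetical toy frame with a DatetimeIndex, standing in for dfWeeks.
weeks = pd.DataFrame(
    {"price": [1.5, 2.0, 2.5], "target": [2.0, 2.5, 3.0]},
    index=pd.to_datetime(["2000-01-03", "2000-01-10", "2000-01-17"]),
)

seen = []

def inspect_row(row):
    # With axis=1, each row arrives as a Series; row.name is its index label
    # and row.target is the value in the "target" column.
    seen.append((type(row).__name__, row.name, row.target))
    return row.name

names = weeks.apply(inspect_row, axis=1)
print(seen)
```

Each recorded name is the pd.Timestamp that labels the row, which is the value @x.name refers to inside the query string.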
find_match_alt
Or we could've done this, which may help to explain what the query string is doing above:
def find_match_alt(x):
    """Alternative to OP's"""
    date_is_afterwards = dfDays.index > x.name
    price_target_is_met = dfDays.price >= x.target
    both_are_true = price_target_is_met & date_is_afterwards
    if (both_are_true).any():
        return dfDays[both_are_true].index[0]

dfWeeks.assign(target_hit=dfWeeks.apply(find_match_alt, 1))
Comparing these two functions should give a good perspective on what the query string does.
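As a rough check of that comparison, the sketch below runs both functions on a small hand-made data set and confirms they return the same result. The values in dfDays and dfWeeks here are invented for illustration (they are not the random data from the question), and the names via_query and via_mask are likewise assumptions.

```python
import pandas as pd

# Hypothetical daily prices and weekly targets, small enough to check by eye.
dfDays = pd.DataFrame(
    {"price": [1.0, 1.2, 2.1, 1.1, 1.6, 1.3]},
    index=pd.date_range("2000-01-03", periods=6, freq="D"),
)
dfWeeks = pd.DataFrame(
    {"price": [1.0, 1.1, 1.3], "target": [1.5, 1.5, 1.8]},
    index=pd.to_datetime(["2000-01-03", "2000-01-06", "2000-01-08"]),
)

def find_match(x):
    # query version: @x.name and @x.target come from the row passed by apply
    match = dfDays.query('index > @x.name & price >= @x.target')
    if not match.empty:
        return match.index[0]

def find_match_alt(x):
    # boolean-mask version of the same two conditions
    both_are_true = (dfDays.index > x.name) & (dfDays.price >= x.target)
    if both_are_true.any():
        return dfDays[both_are_true].index[0]

via_query = dfWeeks.apply(find_match, axis=1)
via_mask = dfWeeks.apply(find_match_alt, axis=1)
print(via_query.equals(via_mask))
```

Note that the mask version needs parentheses around each comparison, while the query string does not: inside query, & is given the precedence of and.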