Query a data frame by multiple columns?

I can't figure this error out.

df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1048575 entries, 1966-03-31 to 1994-03-31
Data columns (total 24 columns):
gvkey         1048575 non-null int64
tic           1048575 non-null object

df.query('(gvkey==1690) & (mkt_val> 400)')['2015-03-31':]

              gvkey   tic       conm  mkt_val
datadate
2015-03-31    1690  AAPL  APPLE INC      600
.
.
.

As you can see, there is a column 'tic', and it holds the value 'AAPL'. So why does the query below raise an error? Isn't it almost the same as the query above?

df.query('(tic=='AAPL') & (mkt_val> 400)')['2015-03-31':]

  File "<ipython-input-386-34ae806044b9>", line 1
    df.query('(tic=='AAPL') & (mkt_val> 400)')['2015-03-31':]
                        ^
SyntaxError: invalid syntax

In general, I have this large data set, with the dates column as the index. I always need to query different companies (tic) by different criteria (mkt_val > 400 in this case), and I always get confused when indexing on multiple criteria. Should I make the dataset multi-indexed (by date and tic)? Would that make my job easier?

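Try this:

df.query('(tic=="AAPL") & (mkt_val > 400)')['2015-03-31':]

Note the double quotes around AAPL. The whole query is wrapped in single quotes, so a single quote inside it makes Python think the string ends right after tic==, which is what causes the SyntaxError. Using " for the inner string avoids that.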
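To make the quoting fix concrete, here is a minimal sketch on a made-up miniature of the data frame. The rows, the second ticker and the variable names (by_literal, by_variable, by_mask, ticker) are assumptions for illustration only, not data from the question. It also shows two ways around nested quotes altogether: referencing a Python variable with @ inside the query, and a plain boolean mask.

```python
import pandas as pd

# Hypothetical miniature of the data frame in the question; all values are made up.
df = pd.DataFrame(
    {
        "gvkey": [1690, 1690, 6066],
        "tic": ["AAPL", "AAPL", "IBM"],
        "mkt_val": [350, 600, 500],
    },
    index=pd.DatetimeIndex(
        ["2014-03-31", "2015-03-31", "2015-03-31"], name="datadate"
    ),
)

# Single quotes outside, double quotes inside: the string literal no longer ends early.
by_literal = df.query('(tic == "AAPL") & (mkt_val > 400)')

# Alternative: refer to a local Python variable with @, so no nested quotes are needed.
ticker = "AAPL"
by_variable = df.query("(tic == @ticker) & (mkt_val > 400)")

# Equivalent boolean-mask form without query().
by_mask = df[(df["tic"] == "AAPL") & (df["mkt_val"] > 400)]

print(by_literal)
```

The @ form is probably the handiest when the ticker changes from query to query, since only the variable needs to be reassigned.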
