Pandas Dataframe: selecting by values in multiple columns does not work, why?

I build a DataFrame from an SQL query (data from May 2014 to June 2015) and try to build two different data sets:

- train uses all data except June 2015
- test uses only data for June 2015

When I try to use: train = df[(df.month!=6) & (df.year!=2015)]

It seems that I am using OR instead of AND, because I do not get any rows for month=6 (not even for 2014) and no rows for year=2015 at all, so none for the other months of 2015 either.

I do not understand what is wrong with the code.

mycurs.execute("""SELECT day, month, year, cloudcover, moonphase, precipintensity,
precipaccumulation, preciptype2, humidity, pressure, windspeed,
uvindex, visibility, temperaturehigh, weekend_2, crimecount
FROM open.nyc_weatherhist_2""")
f = mycurs.fetchall()  # returns all rows as a list of tuples (row order not guaranteed without ORDER BY)

df = pd.DataFrame(f, columns=feature_names)  # feature_names: the 16 column labels, defined elsewhere
print(df)
     day  month  year  ...  temperaturehigh  weekend  crimecount
0     28      4  2015  ...            20.85        0          56
1     14      4  2015  ...            18.25        0         103
2     13      7  2014  ...            27.44        0          89
3      4      1  2015  ...            12.94        0          99
4     21      9  2014  ...            24.15        0          66
..   ...    ...   ...  ...              ...      ...         ...
390    4      7  2014  ...            23.37        1          84
391    8      8  2014  ...            27.98        1          97
392   26      4  2015  ...            15.78        0          82
393    3      8  2014  ...            24.50        0          80
394    5      6  2015  ...            20.65        1          87

[395 rows x 16 columns]
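To see what `fetchall()` actually hands to `pd.DataFrame`, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the asker's database. This is an assumption: the table name `weather` and its three columns are hypothetical, and the real driver may differ, although DB-API drivers generally return a sequence of row tuples from `fetchall()`.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the real one (assumption);
# the table "weather" and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (day INTEGER, month INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [(28, 4, 2015), (13, 7, 2014), (5, 6, 2015)],
)

cur = conn.cursor()
cur.execute("SELECT day, month, year FROM weather")
rows = cur.fetchall()  # a list with one tuple per row, not only the first row

# cursor.description holds one entry per selected column; the name comes first
columns = [desc[0] for desc in cur.description]
df = pd.DataFrame(rows, columns=columns)
print(df)
conn.close()
```

A hand-written list such as `feature_names` works just as well; reading the names from `cur.description` simply keeps them in sync with the SELECT list.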

train= df[(df.month!=6) & (df.year!=2015)]
print(train)
     day  month  year  ...  temperaturehigh  weekend  crimecount
2     13      7  2014  ...            27.44        0          89
4     21      9  2014  ...            24.15        0          66
8     10     11  2014  ...            16.27        0          76
9      5     11  2014  ...            17.76        0         101
11    10      7  2014  ...            28.06        0          99
..   ...    ...   ...  ...              ...      ...         ...
382   10      8  2014  ...            30.51        0         119
389   21     11  2014  ...             2.65        1         110
390    4      7  2014  ...            23.37        1          84
391    8      8  2014  ...            27.98        1          97
393    3      8  2014  ...            24.50        0          80

[184 rows x 16 columns]
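The 184 rows above are consistent with how `&` combines the two masks: a row survives only if it is outside June and outside 2015 at the same time. A minimal reproduction on a hypothetical four-row frame (the values are made up to cover every month/year combination):

```python
import pandas as pd

# Toy frame (hypothetical values) covering the four month/year combinations
df = pd.DataFrame({
    "year":  [2014, 2014, 2015, 2015],
    "month": [5,    6,    5,    6],
})

not_june = df.month != 6
not_2015 = df.year != 2015

# AND keeps a row only if it is outside June *and* outside 2015,
# so it drops every June row and every 2015 row:
# only the (2014, 5) row survives.
train_and = df[not_june & not_2015]
print(train_and)
```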

Just elaborating on Francis's answer: you're looking at the condition the wrong way.

You want all the data except for a certain month in a certain year. So when is a row "right"? When either its month is not that month or its year is not that year.

So this is exactly the OR condition.

Another way to look at it: you have conditions X (month == 6) and Y (year == 2015). You're "wrong" when both X and Y hold, i.e. X & Y.
So when are you "right"? When !(X & Y), which by De Morgan's law equals !X | !Y.
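The X/Y argument can be checked mechanically. In pandas the element-wise negation operator is `~` rather than `!`, so this sketch builds all four combinations of X and Y as boolean Series (reading X as month == 6 and Y as year == 2015 is an assumption taken from the question) and confirms that `~(X & Y)` and `~X | ~Y` agree row by row.

```python
import pandas as pd

# All four combinations of X (month == 6) and Y (year == 2015)
X = pd.Series([False, False, True, True])
Y = pd.Series([False, True, False, True])

lhs = ~(X & Y)   # "not wrong"
rhs = ~X | ~Y    # De Morgan's form

table = pd.DataFrame({"X": X, "Y": Y, "~(X & Y)": lhs, "~X | ~Y": rhs})
print(table)

# Only the last row (June *and* 2015) is excluded, and both forms agree
assert lhs.equals(rhs)
```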

It's your choice how you tackle this, but you can do something like:

train = df[(df.month != 6) | (df.year != 2015)]
or
train = df[~((df.month == 6) & (df.year == 2015))]
Both are equivalent.
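The question also asks for a test set containing only June 2015. One way to get both sets, sketched here on a small hypothetical frame, is to build the "June 2015" mask once and use it together with its negation; the variable name `is_june_2015` is illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the asker's df; only month/year matter here
df = pd.DataFrame({
    "year":  [2014, 2014, 2015, 2015, 2015],
    "month": [6,    7,    4,    6,    6],
})

# Rows belonging to June 2015
is_june_2015 = (df.month == 6) & (df.year == 2015)

test = df[is_june_2015]     # only June 2015
train = df[~is_june_2015]   # everything else, including June 2014

# The OR form selects exactly the same rows as the negated mask
train_or = df[(df.month != 6) | (df.year != 2015)]
assert train.equals(train_or)

# Every row lands in exactly one of the two sets
assert len(train) + len(test) == len(df)
```

Reusing one mask guarantees that train and test are complementary, so no row can end up in both or in neither.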

You need to use | for OR, as you suspected.

    >>> df = pd.DataFrame({'year':[2014,2015,2015],'month':[6,5,6]})
    >>> df
       year  month
    0  2014      6
    1  2015      5
    2  2015      6
    >>> train = df[(df['year']!=2015) | (df['month']!=6)]
    >>> train
       year  month
    0  2014      6
    1  2015      5
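A related pitfall: Python's `or` keyword cannot stand in for `|` here. `or` asks each Series for a single True/False value, which pandas refuses with a `ValueError`, whereas `|` combines the two boolean Series element by element. A small sketch on the same three-row frame:

```python
import pandas as pd

df = pd.DataFrame({"year": [2014, 2015, 2015], "month": [6, 5, 6]})

raised = False
try:
    # `or` calls bool() on a whole Series, which is ambiguous
    df[(df["year"] != 2015) or (df["month"] != 6)]
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# `|` works element-wise and keeps rows 0 and 1
train = df[(df["year"] != 2015) | (df["month"] != 6)]
print(train)
```

The same applies to `and` versus `&` and `not` versus `~`.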

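Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.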

© 2020-2024 STACKOOM.COM