
How to remove rows in a pandas dataframe based on values in two columns?

I have Excel data here

which was converted to a pandas DataFrame:

         Year  Month   Day  Rain       Date
1.0      1988      1   1.0   0.0 1988-01-01
2.0      1988      1   2.0   0.0 1988-01-02
3.0      1988      1   3.0   0.0 1988-01-03
4.0      1988      1   4.0   0.0 1988-01-04
      ...    ...   ...   ...        ...
11156.0  2017     12  27.0   0.0 2018-06-08
11157.0  2017     12  28.0   0.0 2018-06-09
11158.0  2017     12  29.0   0.0 2018-06-10
11159.0  2017     12  30.0   0.0 2018-06-11
11160.0  2017     12  31.0   0.0 2018-06-12

The problem is that the Month and Day columns run up to 12 and 31 respectively for every year. This produces impossible combinations such as month 2 with day 31, and the Rain column contains a value for those rows as well. How can I remove the rows that contain such impossible combinations?

PS: It's okay if the Date column gets messed up by this, as I can generate the dates again in Excel.

It's not clear what you're doing, or why you're generating the data in Excel if you're going to work with it in Pandas. Excel has built-in functionality for generating date ranges, so it can produce only valid dates in the first place. Unless there's another reason you need Excel, though, I see no point in using it here: if you're planning to work with the data in Pandas, you might as well let Pandas generate the date range, which never contains an impossible date:

import pandas as pd

dates = pd.date_range("1988-01-01", "2017-12-31")
df = pd.DataFrame({"Date": dates, "Year": dates.year, "Month": dates.month, "Day": dates.day})

Output:

In [3]: df
Out[3]:
            Date  Year  Month  Day
0     1988-01-01  1988      1    1
1     1988-01-02  1988      1    2
2     1988-01-03  1988      1    3
3     1988-01-04  1988      1    4
4     1988-01-05  1988      1    5
...          ...   ...    ...  ...
10953 2017-12-27  2017     12   27
10954 2017-12-28  2017     12   28
10955 2017-12-29  2017     12   29
10956 2017-12-30  2017     12   30
10957 2017-12-31  2017     12   31

[10958 rows x 4 columns]

In [4]: df[(df.Month==2) & (df.Day==31)]
Out[4]:
Empty DataFrame
Columns: [Date, Year, Month, Day]
Index: []

In [5]: df[(df.Month==2) & (df.Day==29)]
Out[5]:
            Date  Year  Month  Day
59    1988-02-29  1988      2   29
1520  1992-02-29  1992      2   29
2981  1996-02-29  1996      2   29
4442  2000-02-29  2000      2   29
5903  2004-02-29  2004      2   29
7364  2008-02-29  2008      2   29
8825  2012-02-29  2012      2   29
10286 2016-02-29  2016      2   29
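
If the existing rows have to be filtered rather than regenerated (for example, to keep the Rain values already attached to them), one possible approach is to compare each row's Day with the number of days in its month. The following is only a sketch under assumptions: the sample frame is made up to mimic the layout in the question (every month padded out to day 31), and the names month_start, valid and clean are illustrative, not taken from the original data.

```python
import pandas as pd

# Hypothetical sample mimicking the question's layout: every month runs to day 31.
df = pd.DataFrame(
    [(y, m, d) for y in (1988, 1989) for m in range(1, 13) for d in range(1, 32)],
    columns=["Year", "Month", "Day"],
)
df["Rain"] = 0.0  # placeholder values, not real measurements

# Build the first day of each row's month (always a valid date), then let
# pandas report how many days that month has, leap years included.
month_start = pd.to_datetime(
    df[["Year", "Month"]].rename(columns=str.lower).assign(day=1)
)
valid = df["Day"] <= month_start.dt.days_in_month

# Keep only the rows whose Day actually exists in that month.
clean = df[valid].reset_index(drop=True)

# Rebuild the Date column from the remaining, now valid, combinations.
clean["Date"] = pd.to_datetime(
    clean[["Year", "Month", "Day"]].rename(columns=str.lower)
)
```

On this sample, 1988-02-29 is kept because 1988 is a leap year, while day 29 of February 1989 and day 31 of every 30-day month are dropped.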
