简体   繁体   中英

get DataFrame rows matching dates

Let's say I have the following DataFrame:

df = pd.DataFrame({'item': ['Subway', 'Pasta', 'Chipotle'],
                   'cost': [10, 5, 9],
                   'date': ['2017-12-01', '2017-11-01', '2017-10-01']})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

I'm able to get all items in 2017-10 (only one item in this case):

print(df.set_index('date')['2017-10'])

According to the pandas documentation and this SO answer , I should be able to get all items from 2017-10 to 2017-11 (2 items in this case) with the following command but I'm getting an empty DataFrame:

print(df.set_index('date')['2017-10':'2017-11'])

Any idea what I'm doing wrong here (I'm using pandas version 0.21.0 )?

Moreover, is there an efficient way I can get all items in 2017-10 and 2017-12 (skipping 2017-11 )? I've come up with the following solution but I shouldn't have to create new columns like so:

df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
print(df[((df.month==10) & (df.year==2017) | (df.month==12) & (df.year==2017))])

I reversed the order I was searching for the items so:

import pandas as pd 

df = pd.DataFrame({'item': ['Subway', 'Pasta', 'Chipotle'],
                   'cost': [10, 5, 9],
                   'date': ['2017-12-01', '2017-11-01', '2017-10-01']})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

print(df.set_index('date')['2017-11':'2017-10'])

For your 'date' it went from high to low. By switching them I got this output:

            cost      item
date                      
2017-11-01     5     Pasta
2017-10-01     9  Chipotle

First use set_index() with DatetimeIndex . Then you can use the indexing approach you wanted.

df.set_index(pd.DatetimeIndex(df.date), inplace=True)

df.sort_index().loc['2017-10':'2017-11']

            cost       date      item
date                                 
2017-10-01     9 2017-10-01  Chipotle
2017-11-01     5 2017-11-01     Pasta

With respect to your second question, you can also access the month property once you have a DatetimeIndex .

df.loc[df.index.month.isin([10,12])]

            cost       date      item
date                                 
2017-12-01    10 2017-12-01    Subway
2017-10-01     9 2017-10-01  Chipotle

(For the second part, to index by year as well, add & df.index.year == 2017 )

An alternative approach may be to use boolean indexing.

Here you provide statements that must be true in order for the rows to be returned.

For your second question, this would be:

df_October_and_December = df.ix[((df['date'] >= '2017-10-01') & (df['date'] <= '2017-10-31')) | ((df['date'] >= '2017-12-01') & (df['date'] <= '2017-12-31')) ,:]

The more elegant version of what you want is:

df_October_and_December = df.ix[(df['date'].dt.month.isin([10,12])) ,:]

I tend to use .ix referencing given it's flexibility and refine to .loc or .iloc if the application allows.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM