
How to slice Pandas data frame by column header value when the column header is a date-time value?

I have an Excel file in which the column names are date-time values.

The header values are in date-time format. I have loaded the file into a Pandas dataframe, and the header values are indeed stored as date-time values.

Now, if I need to select from the dataframe with something like "pick all columns later than May-15", how can I do that?

I am aware that df[df.columns[3:]] achieves this, but I want to slice based on the value of the column header, not on the position of the column.

Please help.


Edit: Based on the answers below, I figured out a way to select columns by their header values. Adding it here for future reference.

from datetime import datetime

df[[col for col in df.columns if col not in ("Name", "Location") 
           and col >= datetime(2015,4,1) 
           and col <= datetime(2016,3,1)]]

or

from datetime import datetime

df.loc[:, [col for col in df.columns if col not in ("Name", "Location") 
       and col >= datetime(2015,4,1) 
       and col <= datetime(2016,3,1)]]

The first solution is the more concise of the two. Conceptually, column selection in Pandas works when the intended columns are provided as a list. The list comprehension picks columns based on the column labels (not on the values within the columns). In the examples, I have excluded the "Name" and "Location" columns first, since the remaining labels are the ones compared as datetime values.
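A minimal, self-contained sketch of the same idea follows. The sample data and the helper name `columns_between` are made up for illustration; the `isinstance` check is an assumption that replaces the hard-coded ("Name", "Location") exclusion, so that any non-datetime label is skipped.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample data: two text columns followed by month columns
# whose labels are datetime objects, as described in the question.
df = pd.DataFrame(
    [["John", "London", 1000, 800, 900],
     ["Mary", "Paris", 700, 750, 650]],
    columns=["Name", "Location",
             datetime(2015, 4, 1), datetime(2015, 5, 1), datetime(2015, 6, 1)],
)


def columns_between(frame, start, end):
    """Return the column labels that are datetimes within [start, end]."""
    return [col for col in frame.columns
            if isinstance(col, datetime) and start <= col <= end]


# Select by label value, not by position.
selected = df[columns_between(df, datetime(2015, 5, 1), datetime(2016, 3, 1))]
print(selected)
```

Because the check runs on the labels themselves, the result does not change if the columns are reordered in the Excel file.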

Querying works best for filtering observations (rows) based on one or more variables (columns). The way your data is organized doesn't allow for a natural query: you're trying to filter the columns themselves, as opposed to using them as criteria in the filter. You can read more about tidying dataframes here

Of course, you can come up with a contorted way to do what you want, but I'd strongly suggest you tidy your data like this:

name | location | date   | value
--------------------------------
John | London   | Apr-15 | 1000
John | London   | May-15 | 800
...

Then you can easily query based on the date. Make sure that column has a date type, so you can use e.g.

df.query('20150501 < date')

When you're done, you can always bring the dataframe back to its original format if you have to. (If you can, it is better to avoid that and focus on keeping your data organized; it pays off in the long run.)
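As a rough sketch of that round trip, assuming made-up sample data and the column names from the table above: `melt` produces the tidy layout, `query` filters on the date, and `set_index` plus `unstack` restores the wide layout. The variable names (`wide`, `tidy`, `recent`, `cutoff`, `wide_again`) are illustrative only.

```python
import pandas as pd

# Hypothetical wide frame with datetime column headers.
wide = pd.DataFrame({
    "name": ["John", "Mary"],
    "location": ["London", "Paris"],
    pd.Timestamp("2015-04-01"): [1000, 700],
    pd.Timestamp("2015-05-01"): [800, 750],
    pd.Timestamp("2015-06-01"): [900, 650],
})

# Wide -> tidy: one row per (name, location, date) observation.
tidy = wide.melt(id_vars=["name", "location"], var_name="date", value_name="value")
tidy["date"] = pd.to_datetime(tidy["date"])  # make sure the column has a date type

# Filter rows by date; @cutoff refers to the local Python variable.
cutoff = pd.Timestamp("2015-05-01")
recent = tidy.query("date > @cutoff")

# Tidy -> wide again, only if the original layout is really needed.
wide_again = recent.set_index(["name", "location", "date"])["value"].unstack("date")
print(wide_again)
```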

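One easy-fix method, if the headers are strings such as 'Apr-15', would be to replace the month string with its equivalent number.

dct = {'Jan': 1, 'Feb': 2 ...}   # month abbreviation -> month number

new = []

for item in df.columns:
    try:
        month, year = item.split('-')                  # e.g. 'Apr-15' -> 'Apr', '15'
        b = '%02d.%02d' % (int(year), dct[month])      # -> '15.04'
    except (AttributeError, ValueError, KeyError):     # not a month-year label, e.g. 'name'
        b = str(item)

    new.append(b)

df.columns = new

This should put your dates in the form 15.04, 15.05 .. 16.11 etc.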
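Once the labels have that zero-padded year.month shape, plain string comparison orders them chronologically (for years within the same century), so they can be filtered directly. A small sketch, assuming a hypothetical frame whose headers were already rewritten this way:

```python
import pandas as pd

# Hypothetical frame whose date headers are already 'YY.MM' strings.
df = pd.DataFrame(
    [["John", "London", 1000, 800, 900]],
    columns=["Name", "Location", "15.04", "15.05", "15.06"],
)

id_cols = ["Name", "Location"]  # labels that are not dates

# Zero-padded 'YY.MM' strings compare in chronological order.
wanted = [col for col in df.columns if col not in id_cols and col >= "15.05"]
subset = df[id_cols + wanted]
print(subset)
```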

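Alternatively, you can convert your headers into date-times and filter on them directly:

from datetime import datetime

new = []
for item in df.columns:
    try:
        new.append(datetime.strptime(item, '%b-%y'))   # e.g. 'Apr-15' -> 2015-04-01
    except (TypeError, ValueError):                     # not a month-year string, e.g. 'Name'
        new.append(item)
df.columns = new

# Comparing the whole mixed Index against a datetime fails on the string labels,
# so test only the labels that were actually converted.
df.loc[:, [isinstance(col, datetime) and col <= datetime(2015, 5, 1) for col in df.columns]]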
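A variation on this approach, sketched here under the assumption that the text columns can serve as the row index: once "Name" and "Location" are moved into the index, every remaining header is a date, the columns become a DatetimeIndex, and both boolean masks and label slices work without special-casing. The sample data is made up, and the `%b` format assumes English month abbreviations.

```python
import pandas as pd

# Hypothetical frame with string headers such as 'Apr-15'.
df = pd.DataFrame(
    [["John", "London", 1000, 800, 900],
     ["Mary", "Paris", 700, 750, 650]],
    columns=["Name", "Location", "Apr-15", "May-15", "Jun-15"],
)

# Move the non-date columns into the index so only date headers remain.
dated = df.set_index(["Name", "Location"])
dated.columns = pd.to_datetime(dated.columns, format="%b-%y")

# Boolean mask over the date headers.
up_to_may = dated.loc[:, dated.columns <= pd.Timestamp("2015-05-01")]

# Label-based slice: every column from May 2015 onwards.
from_may = dated.loc[:, "2015-05-01":]
print(from_may)
```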
