
Identifying multiple columns by name in Pandas

Is there a way to select a subset of columns using text matching or regular expressions?

In R it would be like this:

attach(iris) #Load the 'Stairway to Heaven' of R's built-in data sets
iris[grep(names(iris),pattern="Length")] #Prints only columns containing the word "Length"

You can use the DataFrame.filter method for this (pass axis=1 to filter on the column names). It supports more than one way of matching:

  • A substring match, equivalent to 'Length' in col:

     df.filter(like='Length', axis=1) 
  • A regex match (note that it uses re.search rather than re.match, so the pattern may match anywhere in the name and you may need to anchor it yourself):

     df.filter(regex=r'\.Length$', axis=1) 
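
As a minimal sketch of both options, assume a small hand-built frame whose column names mimic R's iris data set. The frame and its values are illustrative stand-ins, not the real data:

```python
import pandas as pd

# Hypothetical stand-in for the iris data; only the column names matter here.
df = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9],
    "Sepal.Width": [3.5, 3.0],
    "Petal.Length": [1.4, 1.4],
    "Petal.Width": [0.2, 0.2],
    "Species": ["setosa", "setosa"],
})

# like= keeps every column whose name contains the given substring.
by_like = df.filter(like="Length", axis=1)

# regex= keeps every column whose name gives a re.search hit;
# the $ anchor restricts the match to names ending in ".Length".
by_regex = df.filter(regex=r"\.Length$", axis=1)

print(list(by_like.columns))
print(list(by_regex.columns))
```

With these column names both calls select the same two columns, and the original column order is preserved.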

Using Python's in operator, it would work like this:

#Assuming iris is already loaded as a df called 'iris' and has a proper header
iris = iris[[col for col in iris.columns if 'Length' in col]]
print(iris.head())

Or, using regular expressions,

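import re
#re.search scans the whole name; re.match only matches at the start, so it would find nothing here
iris = iris[[col for col in iris.columns if re.search(r'\.Length$', col)]]
print(iris.head())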

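The first is simpler and will likely run faster, but the second is more precise: the regex keeps only names that end in .Length, whereas the substring test keeps any name that contains Length anywhere.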
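
To make the difference concrete, here is a small sketch that works on a plain list of column names. The name "Length.Class" is invented purely to show a case where the two tests disagree. The sketch also shows why re.match is the wrong tool for a pattern that is meant to match the end of a name:

```python
import re

# Hypothetical column names; "Length.Class" is made up for illustration.
columns = ["Sepal.Length", "Petal.Length", "Length.Class", "Species"]

# Substring test: keeps any name containing "Length".
by_substring = [col for col in columns if "Length" in col]

# Anchored regex with re.search: keeps only names ending in ".Length".
by_search = [col for col in columns if re.search(r"\.Length$", col)]

# re.match only tries the start of the string, so this pattern never matches.
by_match = [col for col in columns if re.match(r"\.Length$", col)]

print(by_substring)
print(by_search)
print(by_match)
```

Any of these lists can then be used to index the frame, for example iris[by_search].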
