
How to check if a column ends with either a or b in pandas

I need to filter columns by the last character, testing against multiple characters.

import numpy as np
import pandas as pd

df = pd.read_table("F:\\bridges.txt", names = ['IDENTIF','RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH', 'LANES', 
 'CLEAR-G', 'T-OR-D', 'MATERIAL', 'SPAN', 'REL-L', 'TYPE']) 

print(df.columns[df.columns.str.endswith('N' or 'H' or 's') ])

Output:

Index(['LOCATION', 'SPAN'], dtype='object')

Here I am not getting all the columns that end with either N, H or s.

[col for col in df.columns if col[-1] in ['N', 'H', 'S']]

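If I remember correctly, the columns attribute of a DataFrame is not a Series, so you can't treat it exactly like one. You can, however, iterate over it like a list.

To clarify, the columns aren't technically a list. They are a pandas Index. For most purposes they can be iterated over like a list, which is all the comprehension above needs. The point I'm trying to make clear is that they are not a Series and don't have every Series method. That said, an Index does provide the .str accessor, so df.columns.str.endswith is a valid call. The question's code returns too few columns because Python evaluates 'N' or 'H' or 's' to just 'N' before pandas ever sees it.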
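As a minimal sketch of why the question's call only matched names ending in N: `or` returns its first truthy operand, so the expression collapses to a single string before `endswith` is called. The column names are copied from the question; the empty DataFrame stands in for the real file and is an assumption made only for illustration.

```python
import pandas as pd

# Column names copied from the question; no data is needed to test the names.
names = ['IDENTIF', 'RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH', 'LANES',
         'CLEAR-G', 'T-OR-D', 'MATERIAL', 'SPAN', 'REL-L', 'TYPE']
df = pd.DataFrame(columns=names)

# `or` returns the first truthy operand, so this is just the string 'N'.
suffix = 'N' or 'H' or 's'
print(suffix)

# Reproduces the question's result: only names ending in 'N' match.
matched_or = list(df.columns[df.columns.str.endswith(suffix)])
print(matched_or)

# The list comprehension checks the last character against all three letters.
matched_list = [col for col in df.columns if col[-1] in ['N', 'H', 'S']]
print(matched_list)
```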

You can use pd.Index.str.endswith with a tuple, followed by Boolean indexing:

L = ['IDENTIF','RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH',
     'LANES', 'CLEAR-G', 'T-OR-D', 'MATERIAL', 'SPAN', 'REL-L', 'TYPE']

df = pd.DataFrame(columns=L)

cols = df.columns[df.columns.str.endswith(tuple('HNS'))]
print(cols)

Output:

Index(['LOCATION', 'LENGTH', 'LANES', 'SPAN'], dtype='object')

The functionality mimics Python's built-in str.endswith, which allows you to supply a tuple to match against multiple items as alternative conditions.
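For comparison, here is a small sketch of the built-in behaviour that the pandas method mirrors. It uses only plain Python strings; the variable names are illustrative.

```python
# tuple('HNS') splits the string into one-character suffixes.
suffixes = tuple('HNS')
print(suffixes)

# A tuple argument means "ends with any of these".
length_matches = 'LENGTH'.endswith(suffixes)
river_matches = 'RIVER'.endswith(suffixes)
print(length_matches, river_matches)

# The built-in accepts a str or a tuple, but not a list.
try:
    'LENGTH'.endswith(['H', 'N', 'S'])
    list_accepted = True
except TypeError:
    list_accepted = False
print(list_accepted)
```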

# df is the DataFrame from the question
df_serial = df.filter(regex='(?:H|N|S)$', axis=1)
print(df_serial)

A regular expression can do this too: the pattern keeps only the columns whose names end in H, N or S.
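A self-contained sketch of the regex approach, assuming an empty DataFrame with the question's column names in place of the real file. The character class `[HNS]$` is an equivalent, shorter spelling of `(?:H|N|S)$`, introduced here only for comparison.

```python
import pandas as pd

# Column names copied from the question.
names = ['IDENTIF', 'RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH', 'LANES',
         'CLEAR-G', 'T-OR-D', 'MATERIAL', 'SPAN', 'REL-L', 'TYPE']
df = pd.DataFrame(columns=names)

# DataFrame.filter keeps the labels for which the regex search succeeds;
# the trailing $ anchors the match to the end of the column name.
df_serial = df.filter(regex='(?:H|N|S)$', axis=1)
kept = list(df_serial.columns)
print(kept)

# The character-class form selects the same columns.
kept_class = list(df.filter(regex='[HNS]$', axis=1).columns)
print(kept_class)
```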

