I need to filter columns by the last character, testing against multiple characters.
import numpy as np
import pandas as pd
df = pd.read_table("F:\\bridges.txt", names = ['IDENTIF','RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH', 'LANES',
'CLEAR-G', 'T-OR-D', 'MATERIAL', 'SPAN', 'REL-L', 'TYPE'])
print(df.columns[df.columns.str.endswith('N' or 'H' or 's') ])
Output:
Index(['LOCATION', 'SPAN'], dtype='object')
Here I am not getting all the columns ending with either N, H or s.
[col for col in df.columns if col[-1] in ['N', 'H', 'S']]
If I remember correctly, the columns attribute of a DataFrame is not a Series; it is a pandas Index. For most purposes you can iterate over it like a list, which is what the comprehension above does.
Strictly speaking, an Index does expose string methods through its .str accessor, so the original call is valid. The reason it misses columns is that 'N' or 'H' or 's' evaluates to just 'N' in Python, so only that one suffix is tested.
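To see why the original attempt only matched columns ending in N, it can help to evaluate the argument on its own. The following is a minimal sketch, not the asker's exact script: the column names are copied from the question, no file is read, and the variable names (cols, suffix, only_n, wanted) are made up for illustration.

```python
import pandas as pd

# Column names copied from the question; no data is needed for this check.
cols = pd.Index(['IDENTIF', 'RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH',
                 'LANES', 'CLEAR-G', 'T-OR-D', 'MATERIAL', 'SPAN', 'REL-L', 'TYPE'])

# `or` returns its first truthy operand, so this expression is just 'N'.
suffix = 'N' or 'H' or 's'
print(suffix)

# Hence the original call only tests for a trailing 'N'.
only_n = list(cols[cols.str.endswith(suffix)])
print(only_n)

# The list comprehension checks the last character against every candidate.
wanted = [col for col in cols if col[-1] in ('N', 'H', 'S')]
print(wanted)
```

Note that the comparison is case-sensitive, so a lowercase 's' would not match LANES.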
You can use pd.Index.str.endswith with a tuple, followed by Boolean indexing:
L = ['IDENTIF','RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH',
'LANES', 'CLEAR-G', 'T-OR-D', 'MATERIAL', 'SPAN', 'REL-L', 'TYPE']
df = pd.DataFrame(columns=L)
cols = df.columns[df.columns.str.endswith(tuple('HNS'))]
Index(['LOCATION', 'LENGTH', 'LANES', 'SPAN'], dtype='object')
The functionality mimics Python's built-in str.endswith, which allows you to supply a tuple to match against multiple items as alternative conditions.
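The built-in behaviour can be checked without pandas at all. A short sketch, assuming a hypothetical names list that holds a few of the column names from the question:

```python
# A few of the column names from the question (hypothetical subset).
names = ['LOCATION', 'LENGTH', 'LANES', 'TYPE']

# tuple('HNS') splits the string into single-character suffixes.
suffixes = tuple('HNS')
print(suffixes)

# str.endswith returns True if the string ends with any suffix in the tuple.
matches = [name for name in names if name.endswith(suffixes)]
print(matches)
```

The suffixes must be passed as a tuple; the built-in str.endswith raises a TypeError if given a list.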
df_serial = df_copy.filter(regex = '(?:H|N|S)$' , axis=1)
print(df_serial)
We can also do this with a regular expression passed to DataFrame.filter (df_copy here is presumably a copy of df).
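A self-contained variant of the regex approach might look like the sketch below. It assumes an empty DataFrame built from the question's column names instead of the undefined df_copy, and it uses the character class [HNS]$ as an equivalent, shorter spelling of (?:H|N|S)$.

```python
import pandas as pd

# Empty frame with the question's column names; stands in for df_copy.
L = ['IDENTIF', 'RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH',
     'LANES', 'CLEAR-G', 'T-OR-D', 'MATERIAL', 'SPAN', 'REL-L', 'TYPE']
df = pd.DataFrame(columns=L)

# Keep only the columns whose name ends in H, N or S.
filtered = df.filter(regex='[HNS]$', axis=1)
print(list(filtered.columns))
```

Unlike the Boolean-indexing answers, filter returns a DataFrame containing the matching columns, not just their names.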