
Use a regular expression or escape characters with the pandas startswith function

I have a DataFrame containing a column like this:

col_1                
W (2) L / W (1) L 
W (1) D / W (2) L
NaN
W (1) L
W (2) D / W (1) D
W (1) D 

I want to select the rows whose values start with W (1) L, W (2) L, or W (2) D, so the result will be:

col_1                
W (2) L / W (1) L 
W (1) L
W (2) D / W (1) D

I tried this, but it didn't work:

df.loc[df.col_1.str.startswith('W \(1\) L')]

and this didn't work either:

df.loc[df.col_1.str.contains('^W\(L\).+', regex=True)]

Using str.contains:

df.loc[df['col_1'].str.contains(r'^W \([12]\) L|^W \(2\) D', regex=True, na=False), :]

Once the regex is correct, the trick is to pass na=False so that df.loc[] works. Without it, the NaN row produces a missing value in the boolean mask, and indexing with that mask can raise an error.
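
For reference, here is a minimal, self-contained sketch of the str.contains approach. The DataFrame is rebuilt from the sample values in the question; the variable names (pattern, mask, result) are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (row 2 is the NaN value).
df = pd.DataFrame({
    "col_1": [
        "W (2) L / W (1) L",
        "W (1) D / W (2) L",
        np.nan,
        "W (1) L",
        "W (2) D / W (1) D",
        "W (1) D",
    ]
})

# Parentheses are regex metacharacters, so they are escaped.
# Each alternative carries its own ^ anchor so it only matches at the start.
pattern = r"^W \([12]\) L|^W \(2\) D"

# na=False turns the missing value into False, keeping the mask purely boolean.
mask = df["col_1"].str.contains(pattern, regex=True, na=False)

result = df.loc[mask, :]
print(result)
```

The selected rows are those at index 0, 3 and 4, which matches the result requested in the question.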

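An example using str.match:

# Replace NaN with a placeholder string so the boolean mask has no missing values
df['col_1'] = df['col_1'].fillna('none')
# Keep rows that start with "W (1) L", "W (2) L" or "W (2) D"
df[df['col_1'].str.match(r'^W\s\((1\)\sL|2\)\sL|2\)\sD)')]

In preparation, the first statement cleans the data a bit by replacing any NaN values with the string 'none'. The regex in the second statement then takes care of the filtering.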

Output:

               col_1
0  W (2) L / W (1) L
3            W (1) L
4  W (2) D / W (1) D
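
As background on the first attempt in the question: str.startswith compares its argument literally and does not accept regular expressions, so the backslashes in 'W \(1\) L' are treated as ordinary characters. If you would rather list the prefixes as plain strings than hand-write the escapes, one possible sketch is to escape them with re.escape and pass the joined pattern to str.match. This is not taken from the answers above; the prefixes list and the other variable names are assumptions for illustration.

```python
import re

import numpy as np
import pandas as pd

# Sample data rebuilt from the question (row 2 is the NaN value).
df = pd.DataFrame({
    "col_1": [
        "W (2) L / W (1) L",
        "W (1) D / W (2) L",
        np.nan,
        "W (1) L",
        "W (2) D / W (1) D",
        "W (1) D",
    ]
})

# Hypothetical list of literal prefixes to keep.
prefixes = ["W (1) L", "W (2) L", "W (2) D"]

# re.escape escapes the parentheses; the non-capturing group keeps the
# alternation together so the whole thing is anchored as a single unit.
pattern = "(?:" + "|".join(re.escape(p) for p in prefixes) + ")"

# str.match only matches at the start of each string; na=False maps NaN to False.
mask = df["col_1"].str.match(pattern, na=False)

result = df[mask]
print(result)
```

Because na=False handles the missing value, this variant leaves the NaN row in df untouched instead of overwriting it with a placeholder string.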
