Pandas str.contains produces unexpected results

I am trying to search a column in a pandas DataFrame (Python 3.8.8) to find the rows that contain different strings. Here is an example of the column I'm searching:

print(df['fileName'])
0         data/0001_X+0Y-1-0.txt
1         data/0001_X+0Y-1-0.txt
2         data/0001_X+0Y-1-0.txt
3         data/0001_X+0Y-1-0.txt
4         data/0001_X+0Y-1-0.txt
                            ...                   
171721    data/2293_X-1Y-1-0.txt
171722    data/2293_X-1Y-1-0.txt
171723    data/2293_X-1Y-1-0.txt
171724    data/2293_X-1Y-1-0.txt
171725    data/2293_X-1Y-1-0.txt

Does anyone know why I am only able to return results for 1 out of 9 different strings I want to search for? I am certain that there aren't typos in my search strings. I've copy/pasted into my script and interactive python shell to be sure.

Returns df with correct number of rows: contain_values = df[df['fileName'].str.contains("X-1Y-1-0")]

Returns empty df: contain_values2 = df[df['fileName'].str.contains("X+0Y-1-0")]

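You have to disable regex in str.contains. By default the pattern is treated as a regular expression, where + is a quantifier meaning "one or more of the preceding character", so "X+0Y-1-0" looks for one or more X followed by 0Y-1-0 and never for a literal +: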

>>> df[df['fileName'].str.contains("X+0Y-1-0", regex=False)]

                 fileName
0  data/0001_X+0Y-1-0.txt
1  data/0001_X+0Y-1-0.txt
2  data/0001_X+0Y-1-0.txt
3  data/0001_X+0Y-1-0.txt
4  data/0001_X+0Y-1-0.txt
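
To see why the original call returned nothing, it may help to try the pattern directly with Python's re module, which str.contains relies on when regex is enabled. This is only a small sketch; the string "XX0Y-1-0" is a made-up example that is not in the question's data.

```python
import re

name = "data/0001_X+0Y-1-0.txt"  # file name taken from the question

# As a regex, "X+0Y-1-0" means: one or more "X", then the literal text "0Y-1-0".
# It never looks for a literal "+", so it cannot match this file name.
unescaped = re.search("X+0Y-1-0", name)

# The same pattern does match a (made-up) string that has no plus sign at all.
no_plus = re.search("X+0Y-1-0", "XX0Y-1-0")

# Escaping the plus turns it back into an ordinary character.
escaped = re.search(r"X\+0Y-1-0", name)

print(unescaped)        # None
print(no_plus.group())  # XX0Y-1-0
print(escaped.group())  # X+0Y-1-0
```

The search for "X-1Y-1-0" worked only because - has no special meaning outside a character class, so that pattern happened to behave like plain text.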

Or, as suggested by @YusufErtas, escape the + sign as \+ (written "\\+" in a regular Python string, or r"\+" in a raw string):

>>> df[df['fileName'].str.contains("X\\+0Y-1-0")]

                 fileName
0  data/0001_X+0Y-1-0.txt
1  data/0001_X+0Y-1-0.txt
2  data/0001_X+0Y-1-0.txt
3  data/0001_X+0Y-1-0.txt
4  data/0001_X+0Y-1-0.txt
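
Since the question mentions several search strings, escaping each one by hand could get tedious. One possible approach, sketched here under assumptions, is to let re.escape do the escaping. The small DataFrame and the targets list below are stand-ins built from the two file names shown in the question, not the real data.

```python
import re
import pandas as pd

# Small stand-in for the real data, using the two file names from the question.
df = pd.DataFrame(
    {"fileName": ["data/0001_X+0Y-1-0.txt"] * 3 + ["data/2293_X-1Y-1-0.txt"] * 2}
)

# Hypothetical list of search strings; the question has nine of them.
targets = ["X+0Y-1-0", "X-1Y-1-0"]

# Option 1: literal matching, one string at a time.
literal_counts = {
    t: int(df["fileName"].str.contains(t, regex=False).sum()) for t in targets
}

# Option 2: escape every string and join them into one alternation pattern,
# which selects the rows that contain any of the targets.
pattern = "|".join(re.escape(t) for t in targets)
any_match = df[df["fileName"].str.contains(pattern)]

print(literal_counts)  # {'X+0Y-1-0': 3, 'X-1Y-1-0': 2}
print(len(any_match))  # 5
```

regex=False is the simpler choice when each string is searched on its own; re.escape is mainly useful when the literal text has to be combined into a larger pattern.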


 