
Pandas str.contains for exact matches of partial strings

I have a DataFrame (I'll call it test) with a column containing file paths, and I want to filter the rows using a partial path.

                              full_path
0    C:\data\Data Files\BER\figure1.png
1    C:\data\Data Files\BER\figure2.png
2    C:\data\Previous\Error\summary.png
3        C:\data\Data Files\Val\1x2.png
4        C:\data\Data Files\Val\2x2.png
5         C:\data\Microscopy\defect.png

The partial path to find is:

ex = 'C:\\data\\Microscopy'

I've tried str.contains, but every row comes back False:

test.full_path.str.contains(ex)

0    False
1    False
2    False
3    False
4    False
5    False

I would have expected a value of True for index 5. At first I thought the problem might be with the path strings not actually matching due to differences with the escape character, but:

ex in test.full_path.iloc[5]

evaluates to True. After some digging, I think the argument to str.contains is treated as a regular expression, so maybe the "\\"s in the partial path are messing things up?

I also tried:

test.full_path.apply(lambda x: ex in x)

but this gives NameError: name 'ex' is not defined. These DataFrames can have a lot of rows, so I'm also concerned that apply might not be very efficient.

Any suggestions on how to search a DataFrame column for exact partial string matches?

Thanks!

You can pass regex=False so that the argument to str.contains is treated as a literal string rather than as a regular expression (the DataFrame is called df here):

>>> df.full_path.str.contains(ex)
0    False
1    False
2    False
3    False
4    False
5    False
Name: full_path, dtype: bool
>>> df.full_path.str.contains(ex, regex=False)
0    False
1    False
2    False
3    False
4    False
5     True
Name: full_path, dtype: bool
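
As for why the default call fails: with regex=True (the default), the string C:\data\Microscopy is parsed as a regular expression, so \d means "any digit" rather than a backslash followed by d. Recent Python versions also reject \M as an invalid escape, so the unescaped call may raise an error instead of returning all False. Here is a minimal sketch of that behaviour using only the standard re module; the sample path variable is made up for illustration. re.escape is an alternative to regex=False when the literal path has to be embedded in a larger pattern.

```python
import re

ex = "C:\\data\\Microscopy"                 # the literal text C:\data\Microscopy
path = "C:\\data\\Microscopy\\defect.png"   # illustrative sample value

# Plain substring test: backslashes are ordinary characters here.
plain_match = ex in path

# In a regex, "\d" is the digit class, so r"C:\d" wants "C:" followed by a digit.
digit_vs_path = re.search(r"C:\d", path) is not None
digit_vs_digit = re.search(r"C:\d", "C:5") is not None

# re.escape() backslash-escapes the special characters, so the path matches literally.
escaped_match = re.search(re.escape(ex), path) is not None

print(plain_match, digit_vs_path, digit_vs_digit, escaped_match)
```

With pandas, the escaped form would be passed as df.full_path.str.contains(re.escape(ex)).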

(Aside: your lambda x: ex in x should have worked. The NameError means ex was not defined in the session where you ran it.)
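
To check the aside, here is a small sketch that rebuilds the question's DataFrame (the data is retyped from the post; the names via_apply and via_contains are just for illustration) and compares the apply approach with str.contains(..., regex=False). It assumes ex is assigned before the lambda is called.

```python
import pandas as pd

# Sample data retyped from the question; raw strings keep the backslashes literal.
test = pd.DataFrame({"full_path": [
    r"C:\data\Data Files\BER\figure1.png",
    r"C:\data\Data Files\BER\figure2.png",
    r"C:\data\Previous\Error\summary.png",
    r"C:\data\Data Files\Val\1x2.png",
    r"C:\data\Data Files\Val\2x2.png",
    r"C:\data\Microscopy\defect.png",
]})

ex = "C:\\data\\Microscopy"   # must exist before the lambda below is called

# Row-by-row Python substring test.
via_apply = test.full_path.apply(lambda x: ex in x)

# Literal (non-regex) substring test through the .str accessor.
via_contains = test.full_path.str.contains(ex, regex=False)

print(via_apply.tolist())
print(via_contains.tolist())

# Either mask can be used to filter the rows.
print(test[via_contains])
```

Both masks select only the last row. Whether apply is noticeably slower on a large frame would need to be measured on your own data; str.contains with regex=False is the more idiomatic choice.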

