I have a DataFrame (I'll call it test) with a column containing file paths, and I want to filter the data using a partial path.
full_path
0 C:\data\Data Files\BER\figure1.png
1 C:\data\Data Files\BER\figure2.png
2 C:\data\Previous\Error\summary.png
3 C:\data\Data Files\Val\1x2.png
4 C:\data\Data Files\Val\2x2.png
5 C:\data\Microscopy\defect.png
The partial path to find is:
ex = 'C:\\data\\Microscopy'
I've tried str.contains, but it returns False for every row:
test.full_path.str.contains(ex)
0 False
1 False
2 False
3 False
4 False
5 False
I would have expected a value of True for index 5. At first I thought the problem might be that the path strings don't actually match because of differences in the escape character, but:
ex in test.full_path.iloc[5]
equals True. After some digging, I think the argument to str.contains is supposed to be a regular expression, so maybe the "\\"s in the partial path are messing things up?
I also tried:
test.full_path.apply(lambda x: ex in x)
but this gives NameError: name 'ex' is not defined. These DataFrames can have a lot of rows in them, so I'm also concerned that the apply function might not be very efficient.
Any suggestions on how to search a DataFrame column for exact partial string matches?
Thanks!
You can pass regex=False so that str.contains treats its argument as a literal string rather than a regular expression:
>>> df.full_path.str.contains(ex)
0 False
1 False
2 False
3 False
4 False
5 False
Name: full_path, dtype: bool
>>> df.full_path.str.contains(ex, regex=False)
0 False
1 False
2 False
3 False
4 False
5 True
Name: full_path, dtype: bool
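To make the difference concrete, here is a minimal, self-contained sketch that rebuilds the sample data from the question and compares two ways of getting a literal match. In the default regex mode a sequence such as \d in the pattern is read as "any digit", not as a backslash followed by d, which is why the unescaped pattern does not find the path. The variable names literal_mask, escaped_mask and matches are my own; re.escape is shown only as an alternative for cases where you need to keep regex matching for the rest of the pattern.

```python
import re

import pandas as pd

# Sample data reconstructed from the question; raw strings keep the backslashes literal.
test = pd.DataFrame({
    "full_path": [
        r"C:\data\Data Files\BER\figure1.png",
        r"C:\data\Data Files\BER\figure2.png",
        r"C:\data\Previous\Error\summary.png",
        r"C:\data\Data Files\Val\1x2.png",
        r"C:\data\Data Files\Val\2x2.png",
        r"C:\data\Microscopy\defect.png",
    ]
})
ex = "C:\\data\\Microscopy"

# Option 1: treat the pattern as a plain substring.
literal_mask = test["full_path"].str.contains(ex, regex=False)

# Option 2: keep regex matching, but escape the special characters first.
escaped_mask = test["full_path"].str.contains(re.escape(ex))

# Either boolean mask can be used to filter the rows.
matches = test[literal_mask]
print(matches)
```

Both masks should select only the row at index 5. If the search string is always a fixed piece of text, regex=False is probably the simpler of the two.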
(Aside: your lambda x: ex in x should have worked. The NameError suggests that ex was not defined in the session where you ran it.)
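As a quick check of that aside, the sketch below defines ex before calling apply and confirms that the lambda gives the same result as the literal str.contains call. The two-row frame is a shortened version of the question's data, and the names apply_mask and contains_mask are illustrative only.

```python
import pandas as pd

# Shortened sample: one path that should not match and one that should.
test = pd.DataFrame({
    "full_path": [
        r"C:\data\Data Files\BER\figure1.png",
        r"C:\data\Microscopy\defect.png",
    ]
})

# ex must exist before the lambda is called, otherwise Python raises NameError.
ex = "C:\\data\\Microscopy"

# Plain Python substring test applied to each row.
apply_mask = test["full_path"].apply(lambda path: ex in path)

# Same test through the string accessor, with regex interpretation disabled.
contains_mask = test["full_path"].str.contains(ex, regex=False)

print(apply_mask.equals(contains_mask))
```

Since both approaches do a plain substring test, they should agree row for row; which one is faster on a large frame is something you would need to time on your own data.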