
Pandas str.contains for exact matches of partial strings

I have a DataFrame (I'll call it test) with a column containing file paths, and I want to filter the data using a partial path.

                              full_path
0    C:\data\Data Files\BER\figure1.png
1    C:\data\Data Files\BER\figure2.png
2    C:\data\Previous\Error\summary.png
3        C:\data\Data Files\Val\1x2.png
4        C:\data\Data Files\Val\2x2.png
5         C:\data\Microscopy\defect.png

The partial path to find is:

ex = 'C:\\data\\Microscopy'

I've tried str.contains, but:

test.full_path.str.contains(ex)

0    False
1    False
2    False
3    False
4    False
5    False

I would have expected a value of True for index 5. At first I thought the problem might be that the path strings don't actually match because of differences in the escape characters, but:

ex in test.full_path.iloc[5]

equals True. After some digging, I think the argument to str.contains is interpreted as a regular expression, so maybe the "\\"s in the partial path are messing things up?

I also tried:

test.full_path.apply(lambda x: ex in x)

but this gives NameError: name 'ex' is not defined. These DataFrames can have a lot of rows, so I'm also concerned that apply might not be very efficient.

Any suggestions on how to search a DataFrame column for exact partial string matches?

Thanks!

You can pass regex=False so that the argument to str.contains is treated as a literal string rather than a regular expression:

>>> test.full_path.str.contains(ex)
0    False
1    False
2    False
3    False
4    False
5    False
Name: full_path, dtype: bool
>>> test.full_path.str.contains(ex, regex=False)
0    False
1    False
2    False
3    False
4    False
5     True
Name: full_path, dtype: bool
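
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame by hand from the sample rows in the question. The names literal_mask and escaped_mask are illustrative only and do not come from the original post. Read as a regex, the \d in the pattern denotes a digit class rather than a backslash followed by "d", so the unescaped pattern cannot match the literal path (and newer Python versions may reject such a pattern outright as a bad escape). Passing regex=False, or escaping the pattern with re.escape, avoids both problems.

```python
import re

import pandas as pd

# Sample rows copied from the question; raw strings keep the backslashes literal.
test = pd.DataFrame({
    "full_path": [
        r"C:\data\Data Files\BER\figure1.png",
        r"C:\data\Data Files\BER\figure2.png",
        r"C:\data\Previous\Error\summary.png",
        r"C:\data\Data Files\Val\1x2.png",
        r"C:\data\Data Files\Val\2x2.png",
        r"C:\data\Microscopy\defect.png",
    ]
})

ex = "C:\\data\\Microscopy"  # the literal text C:\data\Microscopy

# Option 1: plain substring search, the pattern is never parsed as a regex.
literal_mask = test["full_path"].str.contains(ex, regex=False)

# Option 2: keep regex matching, but escape the special characters first.
escaped_mask = test["full_path"].str.contains(re.escape(ex))

# Either mask can be used to filter the rows.
print(test[literal_mask])
```

Both masks select only the row at index 5. The regex=False form is the simpler choice when no regex features are needed; re.escape is useful when the literal path has to be combined with other regex pieces.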

(Aside: your lambda x: ex in x should have worked. The NameError is a sign that ex was not defined when the lambda ran, for some reason.)
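
To illustrate the aside, here is a small sketch (using a shortened, hand-built two-row frame, which is an assumption for brevity) showing that the apply approach works as long as ex is defined before the lambda is called. The name apply_mask is illustrative.

```python
import pandas as pd

# Two of the sample paths from the question, enough to show a hit and a miss.
test = pd.DataFrame({
    "full_path": [
        r"C:\data\Previous\Error\summary.png",
        r"C:\data\Microscopy\defect.png",
    ]
})

# ex must exist in the enclosing scope when the lambda runs,
# otherwise Python raises NameError.
ex = "C:\\data\\Microscopy"

# Plain Python substring test applied to each element.
apply_mask = test["full_path"].apply(lambda x: ex in x)
print(apply_mask)
```

This gives the same kind of boolean Series as str.contains(ex, regex=False). Unlike the str accessor, a bare `ex in x` will fail on missing values such as NaN, so the str.contains form is usually the safer one for real data.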
