
Sanitizing input to be used in pandas query with the str.contains method

How can I sanitize the input in the code below so that it can be used to query a dataset df?

query = f"`{field_name}`.str.contains('''{input()}''', case=False)"
df.query(query)

The main problem with the code above is that it throws an error when the input contains triple quotes or backslashes. Also keep in mind that some cells in the dataframe contain backslashes, so I would like the query to support that search as well (e.g. if the input is a\s, the query should return rows that contain a\s; for example, aaaa\saaaaa would be a match).

Assume that field_name is given and will not cause trouble.

If I understand you correctly, you want this:

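import pandas as pd
import numpy as np


# Raw strings keep the backslashes literal and avoid invalid-escape warnings.
s1 = pd.Series(['Mouse', r'dog  a\s', 'house and parrot', '23', np.nan, r'aaaa\saaaaa', r' \  """   '])
# regex=False treats the input as a literal substring, so quotes and backslashes need no escaping.
s2 = s1.str.contains(input('input: '), regex=False)
print(s2)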
input: a\s
0    False
1     True
2    False
3    False
4      NaN
5     True
6    False
dtype: object

Process finished with exit code 0

input: """
0    False
1    False
2    False
3    False
4      NaN
5    False
6     True
dtype: object

Process finished with exit code 0
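
If you want to stay with df.query as in the question, one option is to pass the input by reference with the @ prefix instead of formatting it into the expression string. The following is only a sketch: the helper name search, the column name "some field", and the sample data are made up for illustration, and na=False is added so that missing cells count as non-matches.

```python
import pandas as pd


def search(df: pd.DataFrame, field_name: str, needle: str) -> pd.DataFrame:
    """Return rows whose `field_name` cell contains `needle` as a literal substring."""
    # `@needle` makes query() read the local variable directly, so the user's
    # text is never spliced into the expression string and needs no escaping.
    expr = f"`{field_name}`.str.contains(@needle, case=False, regex=False, na=False)"
    # engine="python" is the safer choice when calling .str methods inside query().
    return df.query(expr, engine="python")


# Hypothetical sample data; raw strings keep the backslashes literal.
df = pd.DataFrame(
    {"some field": ["Mouse", r"dog  a\s", None, r"aaaa\saaaaa", r' \  """   ']}
)

by_backslash = search(df, "some field", r"A\S")  # case-insensitive literal match
by_quotes = search(df, "some field", '"""')
print(by_backslash)
print(by_quotes)
```

If query is not a hard requirement, plain boolean indexing such as df[df[field_name].str.contains(needle, case=False, regex=False, na=False)] should give the same rows without building an expression string at all.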
