
(Pandas): What is the difference between isin() and contains()?

I want to know whether a specific string is present in some columns of my dataframe (a different string for each column). From what I understand, isin is written for dataframes but can work for Series as well, while str.contains works better for Series. I don't actually understand how I should choose between the two.

Thanks a lot in advance for the answer. I have searched for similar questions but didn't find any explanation of how to choose between the two.

.isin checks whether each value in the column is contained in a list of arbitrary values. Roughly equivalent to value in [value1, value2].

.str.contains checks whether an arbitrary value is contained in each value in the column. Roughly equivalent to substring in large_string.

In other words, .isin compares each whole value in the column against the given list and is available for all data types. .str.contains searches inside each value and makes sense only when dealing with strings (or values that can be represented as strings).
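To make the distinction concrete, here is a minimal sketch that puts the plain-Python analogy next to the pandas calls. The Series contents and variable names are made up for illustration.

```python
import pandas as pd

# Plain-Python analogy for the two checks
assert "ba" in ["aa", "ba"]   # whole-value membership, like .isin
assert "b" in "ba"            # substring search, like .str.contains

s = pd.Series(["aa", "ba", "ca"])

# .isin compares whole values: "a" on its own equals none of the entries
whole_match = s.isin(["a", "ba"])

# .str.contains looks inside each string: every entry contains "a"
substring_match = s.str.contains("a", regex=False)

# .isin is not limited to strings
nums = pd.Series([1, 2, 3])
num_match = nums.isin([2, 3])

print(whole_match.tolist())      # [False, True, False]
print(substring_match.tolist())  # [True, True, True]
print(num_match.tolist())        # [False, True, True]
```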

From the official documentation:

Series.isin(values)

Check whether values are contained in Series. Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.


Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)

Test if pattern or regex is contained within a string of a Series or Index.

Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
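Two parameters in that signature are easy to overlook: regex defaults to True, so characters such as "." are treated as pattern syntax, and na controls what is returned for missing values. The following is a small sketch with made-up data showing both.

```python
import pandas as pd

s = pd.Series(["a.b", "axb", None])

# With the default regex=True, "." matches any character,
# so the pattern "a.b" also matches "axb".
as_regex = s.str.contains("a.b", na=False)

# With regex=False the pattern is treated as a literal substring.
as_literal = s.str.contains("a.b", regex=False, na=False)

# na=False makes the missing entry count as "no match", which keeps
# the result usable as a boolean mask for filtering.
print(as_regex.tolist())    # [True, True, False]
print(as_literal.tolist())  # [True, False, False]
```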

Examples:

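import pandas as pd

# Sample frame matching the output shown below
df = pd.DataFrame({'a': ['aa', 'ba', 'ca']})

print(df)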
#     a
# 0  aa
# 1  ba
# 2  ca

print(df[df['a'].isin(['aa', 'ca'])])
#     a
# 0  aa
# 2  ca

print(df[df['a'].str.contains('b')])
#     a
# 1  ba
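For the case in the question (a different string for each column), one possible approach is to build one boolean mask per column and then combine them. This is only a sketch: the frame, the column names, and the needles mapping are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical data and search strings, one substring per column
df = pd.DataFrame({
    "name": ["alice", "bob", "carol"],
    "city": ["Paris", "Rome", "Parma"],
})
needles = {"name": "o", "city": "Par"}

# Substring search: one str.contains mask per column
masks = pd.DataFrame({
    col: df[col].str.contains(sub, regex=False, na=False)
    for col, sub in needles.items()
})
any_hit = masks.any(axis=1)   # row matches in at least one column
all_hit = masks.all(axis=1)   # row matches in every column

# Exact-value matching: DataFrame.isin accepts a dict keyed by column name
exact = df.isin({"name": ["bob"], "city": ["Paris"]})

print(any_hit.tolist())        # [True, True, True]
print(all_hit.tolist())        # [False, False, True]
print(exact["name"].tolist())  # [False, True, False]
print(exact["city"].tolist())  # [True, False, False]
```

If whole values have to match, the dict form of isin is enough; str.contains is only needed when the string may appear inside a longer value.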

