
str.contains using the na=NaN flag

I still can't wrap my head around what exactly the na flag does in df.str.contains(string, na=True/False).

The documentation says: Fill value for missing values.

But what does it replace those missing values with?

Also, what happens if you set it to True, and what happens if you set it to False?

Can someone please provide some examples of both scenarios?

Your df.str.contains() call returns:

A Series or Index of boolean values indicating whether the given pattern is contained within the string of each element of the Series or Index.

So you will get a series of boolean values (True/False), one for each element in your df series, based on whether or not the substring is present in that element.

Here is an example:

import numpy as np
import pandas as pd

sr = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.nan])
sr.str.contains('og', na=False)


0    False
1     True
2    False
3    False
4    False
dtype: bool

So here I was checking whether the substring og is present in the elements of my series. It returned a boolean value for each element of the input series.

Also, notice that I had a NaN value in my original series.

Now, what should happen in case the element is NaN? What should we consider as the output of .str.contains() in this case?

Ans. - This is where the na flag comes into play. We can specify what to use as the boolean outcome for elements that are NaN.

In the above example I set na=False, which returns False wherever an element in the series is NaN.
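To see what na changes, it may help to compare the call with and without it. The sketch below is only illustrative: it reuses the series from the example, and dtype=object is an assumption added here so that the series behaves like a classic object-dtype series (other string dtypes may report missing values differently).

```python
import numpy as np
import pandas as pd

# dtype=object is an assumption made for this sketch, not part of the
# original example.
sr = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.nan],
               dtype=object)

# na not given: the result for the missing element is itself missing.
default_mask = sr.str.contains('og')
print(pd.isna(default_mask.iloc[4]))

# na=False: the missing element is reported as "no match" instead.
filled_mask = sr.str.contains('og', na=False)
print(filled_mask.tolist())  # [False, True, False, False, False]
```

With na=False the result holds only True/False values, which is usually what you want when the result is used as a mask.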

Hope this helps:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html

First, str.contains is only available through the .str accessor of a pandas Series (or Index), not on a whole DataFrame, so call it on a single column. Whatever value you give for na is placed in the output wherever the input value is missing.

import numpy as np
import pandas as pd
df = pd.DataFrame({'v1':['dog','cat','cog',np.nan],'v2':['23','zip',np.nan,'4']})

df['v1'].str.contains('g',na=False)
0     True
1    False
2     True
3    False
Name: v1, dtype: bool

df['v1'].str.contains('g',na=True)
0     True
1    False
2     True
3     True
Name: v1, dtype: bool

df['v1'].str.contains('g',na=2)
0     True
1    False
2     True
3        2
Name: v1, dtype: object

As you can see, the result for the last element, which is missing in the input, is filled with the given value.
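In practice the choice between na=True and na=False mostly matters when the result is used to select rows. A minimal sketch, reusing the same df as above (the names without_missing and with_missing are made up for this illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'v1': ['dog', 'cat', 'cog', np.nan],
                   'v2': ['23', 'zip', np.nan, '4']})

# na=False: the row whose v1 is missing is left out of the selection.
without_missing = df[df['v1'].str.contains('g', na=False)]
print(without_missing.index.tolist())  # [0, 2]

# na=True: the row whose v1 is missing is kept in the selection.
with_missing = df[df['v1'].str.contains('g', na=True)]
print(with_missing.index.tolist())  # [0, 2, 3]
```

So na=False treats a missing value as "does not contain the pattern", and na=True treats it as "contains the pattern".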
