How to replace an entire cell with NaN on pandas DataFrame
I want to replace every cell that contains the word circled in the picture with a blank or NaN. However, when I try to replace, for example, '1.25 Dividend', the result is '1.25 NaN'. I want the whole cell to become NaN instead. Any idea how to do this?
Option 1
Use a regular expression in your replace:
df.replace('^.*Dividend.*$', np.nan, regex=True)
From comments:
Passing regex=True means that replace will interpret the pattern as a regular expression. You still need an appropriate pattern. '^' says to start at the beginning of the string, so '^.*' matches all characters from the beginning of the string. '$' says to end the match at the end of the string, so '.*$' matches all characters up to the end of the string. Finally, '^.*Dividend.*$' matches any characters from the beginning, then 'Dividend' somewhere in the middle, then any characters after it. The whole match is then replaced with np.nan.
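To see what the anchors contribute, the pattern can be tried on a single string with Python's built-in re module. This is only a sketch of the regex behaviour, not of what pandas does internally; the sample string comes from the question and the variable names are made up for illustration.

```python
import re

cell = '1.25 Dividend'

# Matching only the word replaces just that word and keeps the rest of the cell.
partial = re.sub('Dividend', 'NaN', cell)

# With ^.* in front and .*$ behind, the match spans the whole string,
# so the replacement takes the place of the entire cell.
whole = re.sub('^.*Dividend.*$', 'NaN', cell)

# A string without the word does not match and comes back unchanged.
untouched = re.sub('^.*Dividend.*$', 'NaN', '4')

print(partial)    # 1.25 NaN
print(whole)      # NaN
print(untouched)  # 4
```

The first call reproduces the '1.25 NaN' result described in the question; the anchored pattern is what makes the whole cell go.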
Consider the dataframe df:
df = pd.DataFrame([[1, '2 Dividend'], [3, 4], [5, '6 Dividend']])
df
   0           1
0  1  2 Dividend
1  3           4
2  5  6 Dividend
Then the proposed solution yields:
   0    1
0  1  NaN
1  3  4.0
2  5  NaN
Option 2
Another alternative is to use pd.DataFrame.mask in conjunction with applymap. Pass applymap a lambda that identifies whether each cell is a string with 'Dividend' in it; mask then replaces the flagged cells with NaN.
df.mask(df.applymap(lambda s: 'Dividend' in s if isinstance(s, str) else False))
   0    1
0  1  NaN
1  3    4
2  5  NaN
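For reference, a self-contained sketch of Option 2 follows. It assumes pandas is installed, and the helper name has_dividend is made up for illustration. Since pandas 2.1, DataFrame.applymap is deprecated in favor of DataFrame.map, so the sketch picks whichever one the installed version provides.

```python
import pandas as pd

df = pd.DataFrame([[1, '2 Dividend'], [3, 4], [5, '6 Dividend']])

def has_dividend(cell):
    # Only strings can contain the word; every other type is left alone.
    return isinstance(cell, str) and 'Dividend' in cell

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

flags = elementwise(has_dividend)  # boolean frame, True where the word occurs
result = df.mask(flags)            # flagged cells become NaN

print(result)
```

Keeping the boolean frame in its own variable makes it easy to inspect which cells will be blanked before applying the mask.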
Option 3
Similar in concept, but using stack / unstack + pd.Series.str.contains:
df.mask(df.stack().astype(str).str.contains('Dividend').unstack())
   0    1
0  1  NaN
1  3    4
2  5  NaN
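The one-liner above can be unpacked step by step. The following is a sketch under the assumption that the frame has no missing values to begin with; the intermediate variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, '2 Dividend'], [3, 4], [5, '6 Dividend']])

# stack() turns the frame into one long Series indexed by (row, column).
stacked = df.stack()

# Cast to str so that .str.contains also works on the numeric cells.
flags_long = stacked.astype(str).str.contains('Dividend')

# unstack() restores the original row/column layout as a boolean frame.
flags = flags_long.unstack()

# Cells flagged True are replaced with NaN.
result = df.mask(flags)
print(result)
```

Working on a single stacked Series is what makes the vectorised .str accessor available for the whole frame at once.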
Replace all strings:
df.apply(lambda x: pd.to_numeric(x, errors='coerce'))
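Note that this approach does not look for the word at all: errors='coerce' turns anything that cannot be parsed as a number into NaN. A small sketch, using sample data invented for illustration (the column names and the 'n/a' and '7' cells are not from the question):

```python
import pandas as pd

# Invented sample: 'n/a' has no 'Dividend' in it,
# and '7' is a string that happens to be numeric.
df = pd.DataFrame({'a': [1, 3, 5, 9], 'b': ['2 Dividend', 4, 'n/a', '7']})

# Each column is converted to numbers; unparseable cells become NaN.
result = df.apply(lambda col: pd.to_numeric(col, errors='coerce'))

print(result)
```

Here both '2 Dividend' and 'n/a' end up as NaN, while '7' is converted to a number, so this only fits when every non-numeric cell should be dropped.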
I would use applymap like this:
df.applymap(lambda x: np.nan if isinstance(x, str) and 'Dividend' in x else x)