Pandas series.map changes values to NaN

Question

I am working on an SMS dataset that has two columns: a "label" column, which consists of "ham"/"spam" values, and a "messages" column, which consists of strings.

I converted the "label" column to numeric labels, with ham=1 and spam=0:

#Converting our labels to numeric labels
# ham = 1 and spam = 0
dfcat = dataset['label']=dataset.label.map({'ham':1,'spam':0})
dfcat.head()

When I ran the above code the first time, it gave me exactly what I was looking for, but after I ran it again it started giving me NaN:

Out[108]: 
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
Name: label, dtype: float64

Please, I need a way to fix this.

Answer 1

@G. Anderson gave the reason why you are seeing those NaN values the second time you run the code.

As for a way to handle categorical variables in Python, one could use one-hot encoding. Toy example below:
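The effect can be reproduced in a few lines. This is only a minimal sketch: the `labels` Series is a hypothetical stand-in for the question's `label` column, not the actual dataset.

```python
import pandas as pd

# Hypothetical stand-in for the question's `label` column.
labels = pd.Series(["ham", "spam", "ham"], name="label")
mapping = {"ham": 1, "spam": 0}

# First run: every string is a key of the mapping, so it is replaced.
first = labels.map(mapping)

# Second run: the column now holds the integers 1 and 0, which are
# not keys of the mapping, so every value becomes NaN.
second = first.map(mapping)

print(first.tolist())
print(second.isna().all())
```

Because the question's code assigns the result back to `dataset['label']`, rerunning the cell is equivalent to the second `map` call here.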

import pandas as pd

df = pd.DataFrame({"col1": ["a", "b", "c"], "label": ["ham", "spam", "ham"]})
df_ohe = pd.get_dummies(df, prefix="ohe", drop_first=True, columns=["label"])
df_ohe

However, it also depends on the number of categorical variables and their cardinality (if the cardinality is high, one-hot encoding might not be the best approach).

Answer 2

The behavior of the series.map() function is to replace the values that appear as keys in the provided dictionary and to change all other values to NaN. If you want to run the same line of code multiple times, then all values need to be accounted for. You can either use a defaultdict, which allows a default value to be set, or include the results of the first run as keys in case you run it a second time. Change

dfcat = dataset['label']=dataset.label.map({'ham':1,'spam':0})

to

dfcat = dataset['label']=dataset.label.map({'ham':1,'spam':0,1:1,0:0})
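The default-value route mentioned above can be sketched as follows. pandas uses the `__missing__` method of a dict subclass (the mechanism `defaultdict` relies on) instead of inserting NaN. Since a `defaultdict` factory does not receive the missing key, this sketch uses a small hypothetical subclass, `PassThroughDict`, that returns unmapped values unchanged; the class name and the toy `labels` Series are assumptions for illustration only.

```python
import pandas as pd


class PassThroughDict(dict):
    """Hypothetical helper: return keys that are not mapped unchanged."""

    def __missing__(self, key):
        return key


mapping = PassThroughDict({"ham": 1, "spam": 0})

# Toy stand-in for the question's `label` column.
labels = pd.Series(["ham", "spam", "ham"], name="label")

once = labels.map(mapping)   # strings are replaced by 1 / 0
twice = once.map(mapping)    # 1 / 0 are not keys, so they pass through

print(once.tolist())
print(twice.tolist())
```

With this mapping the line can be rerun any number of times without producing NaN, at the cost of silently passing through unexpected values such as misspelled labels.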

Answer 1
0 2020-01-03 20:37:45

Answer 2
0 2020-01-03 21:26:30
