Pandas series.map changes values to NaN

I am working on an SMS dataset that has two columns: a "label" column, which consists of "ham"/"spam" values, and a "messages" column, which consists of a bunch of strings.

I converted the "label" column to numeric labels, ham=1 and spam=0:

# Convert the labels to numeric labels
# ham = 1 and spam = 0
dfcat = dataset['label']=dataset.label.map({'ham':1,'spam':0})
dfcat.head()

When I ran the above code the first time, it gave me exactly what I was looking for, but after I ran it again it started giving me NaN:

Out[108]: 
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
Name: label, dtype: float64

Please, I need a way to fix this.

@G. Anderson gave the reason why you are seeing those NaN values the second time you run it.

As for a way to handle categorical variables in Python, one could use one-hot encoding. Toy example below:

import pandas as pd

df = pd.DataFrame({"col1": ["a", "b", "c"], "label": ["ham", "spam", "ham"]})
df_ohe = pd.get_dummies(df, prefix="ohe", drop_first=True, columns=["label"])
df_ohe

However, it also depends on the number of categorical variables and their cardinality (if it is high, one-hot encoding might not be the best approach).
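As a rough sketch of the point about cardinality, one could count the distinct values per column before encoding, using the same toy frame as above. The variable names `cardinality` and `encoded_columns` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "b", "c"], "label": ["ham", "spam", "ham"]})

# Distinct values per column; a large count would mean many dummy columns
cardinality = df.nunique()

# With drop_first=True the first category ("ham") is dropped,
# so a single dummy column remains for "label"
df_ohe = pd.get_dummies(df, prefix="ohe", drop_first=True, columns=["label"])
encoded_columns = list(df_ohe.columns)
```

Here "label" has only two distinct values, so one-hot encoding adds just one column.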

The behavior of Series.map() is to replace the values that appear as keys in the provided dictionary and to change all other values to NaN. After the first run the column holds 1 and 0 instead of 'ham' and 'spam', so on the second run none of the values match a key and everything becomes NaN. If you want to run the same line of code multiple times, then all values need to be accounted for. You can either use a defaultdict, which allows a default value to be set, or just include the results of the first run as keys in case you run it a second time.

Change

dfcat = dataset['label']=dataset.label.map({'ham':1,'spam':0})

to

dfcat = dataset['label']=dataset.label.map({'ham':1,'spam':0,1:1,0:0})
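A minimal sketch of the behavior described in this answer, on hypothetical toy data rather than the asker's real dataset. It shows the NaN on the second run, the extended dictionary from above, and a variation on the default-value idea: pandas uses a dict subclass's `__missing__` method instead of NaN, so a small subclass (here called `KeepUnmapped`, a made-up name) can return unmapped values unchanged.

```python
import pandas as pd

# Hypothetical toy data standing in for the SMS dataset
dataset = pd.DataFrame({"label": ["ham", "spam", "ham"]})
mapping = {"ham": 1, "spam": 0}

# First run: every value is a key in the mapping
first = dataset["label"].map(mapping)

# Second run on the already-mapped values: 1 and 0 are not keys, so all NaN
second = first.map(mapping)

# Option 1: also list the already-mapped values as keys
extended = {"ham": 1, "spam": 0, 1: 1, 0: 0}
extended_second = dataset["label"].map(extended).map(extended)


# Option 2: a dict subclass (hypothetical name) whose __missing__
# returns the key unchanged instead of raising KeyError
class KeepUnmapped(dict):
    def __missing__(self, key):
        return key


keep = KeepUnmapped(mapping)
kept_second = dataset["label"].map(keep).map(keep)
```

Note that a plain defaultdict with a constant default would replace the 1s and 0s with that default on a rerun, which is why this sketch returns the key itself.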
