pandas replace command unable to change categorical data to numerical data
I am working on a toy dataset with 3 columns and 9 rows. Every column holds categorical values, and I am trying to replace those values with numbers using pandas.
Instance_data
Q1 Q3 Q25
2 '14 years old' 'Ungraded or other grade' No
3 '13 years old' 'Ungraded or other grade' No
4 '14 years old' 'Ungraded or other grade' No
5 '15 years old' 'Ungraded or other grade' No
6 '15 years old' 'Ungraded or other grade' No
7 '14 years old' 'Ungraded or other grade' No
8 '14 years old' 'Ungraded or other grade' No
9 '14 years old' 'Ungraded or other grade' No
10 '15 years old' 'Ungraded or other grade' No
Instance_data['Q1'].replace({
'13 years old': 1,
'14 years old': 2,
'15 years old' : 3,
}, inplace=True)
The name of the dataset is Instance_data. The output of the above code is:
Q1 Q3 Q25
2 '14 years old' 'Ungraded or other grade' No
3 '13 years old' 'Ungraded or other grade' No
4 '14 years old' 'Ungraded or other grade' No
5 '15 years old' 'Ungraded or other grade' No
6 '15 years old' 'Ungraded or other grade' No
7 '14 years old' 'Ungraded or other grade' No
8 '14 years old' 'Ungraded or other grade' No
9 '14 years old' 'Ungraded or other grade' No
10 '15 years old' 'Ungraded or other grade' No
Why were the values in Q1 not changed to 1, 2, 3?
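The strings in your column contain literal single-quote characters, so the keys of the replacement dictionary must contain them too. Wrap each key in double quotes so the single quotes become part of the string: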
Instance_data['Q1'].replace({
"'13 years old'": 1,
"'14 years old'": 2,
"'15 years old'" : 3,
}, inplace=True)
print(Instance_data)
# Output:
Q1 Q3 Q25
2 2 'Ungraded or other grade' No
3 1 'Ungraded or other grade' No
4 2 'Ungraded or other grade' No
5 3 'Ungraded or other grade' No
6 3 'Ungraded or other grade' No
7 2 'Ungraded or other grade' No
8 2 'Ungraded or other grade' No
9 2 'Ungraded or other grade' No
10 3 'Ungraded or other grade' No
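As a variation on the fix above, here is a minimal sketch that first makes the embedded quotes visible and then strips them before mapping. The small `df` frame and the `mapping` dictionary are made up for illustration and only mimic the Q1 column from the question. The result is assigned back to the column rather than using `inplace=True` on a column selection, since that pattern may not update the original frame in newer pandas versions.

```python
import pandas as pd

# Hypothetical reconstruction of a few Q1 values from the question;
# each string contains literal single-quote characters.
df = pd.DataFrame({"Q1": ["'14 years old'", "'13 years old'", "'15 years old'"]})

# Applying repr() makes the embedded quotes explicit.
print(df["Q1"].map(repr).tolist())

# Keys written without the embedded quotes (illustrative mapping).
mapping = {"13 years old": 1, "14 years old": 2, "15 years old": 3}

# Strip the surrounding single quotes, map to numbers, and assign back.
df["Q1"] = df["Q1"].str.strip("'").map(mapping)
print(df["Q1"].tolist())
```

Values that are missing from `mapping` would become NaN with `map`, which can be a useful signal that a category was overlooked.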
Or you can use pd.factorize, although the result is not the same: the codes start at 0 and are assigned in order of first appearance.
Instance_data['Q1'] = pd.factorize(Instance_data['Q1'])[0]
print(Instance_data)
# Output:
Q1 Q3 Q25
2 0 'Ungraded or other grade' No
3 1 'Ungraded or other grade' No
4 0 'Ungraded or other grade' No
5 2 'Ungraded or other grade' No
6 2 'Ungraded or other grade' No
7 0 'Ungraded or other grade' No
8 0 'Ungraded or other grade' No
9 0 'Ungraded or other grade' No
10 2 'Ungraded or other grade' No
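If the factorize codes should match the 1, 2, 3 numbering from the replace approach, one possible sketch is to pass `sort=True` and add 1 to the codes. The `q1` series here is a made-up stand-in for the Q1 column. Note that the sort is lexicographic on the strings, which happens to agree with age order here only because all ages have two digits.

```python
import pandas as pd

# Hypothetical stand-in for the Q1 column, quotes included.
q1 = pd.Series(["'14 years old'", "'13 years old'", "'15 years old'", "'14 years old'"])

# sort=True orders the unique values before assigning codes 0, 1, 2, ...
codes, uniques = pd.factorize(q1, sort=True)

# Shift the codes by one so that numbering starts at 1 instead of 0.
q1_numeric = pd.Series(codes + 1, index=q1.index)

print(list(uniques))
print(q1_numeric.tolist())
```

The `uniques` value keeps the code-to-label correspondence, so the numbers can be translated back to the original categories later.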