How to fill NaNs in a dataframe column with its most frequent item?
I have a pandas DataFrame with two columns: toy and color. The color column includes missing values.
How do I fill the missing color values with the most frequent color for that particular toy?
Here's the code to create a sample dataset:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'toy':['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
'color':['red', 'blue', 'blue', nan, 'green', nan,
'red', 'red', np, 'blue', 'red', nan, 'green']
})
Instead of nan and np, you have to use np.nan:
>>> df = pd.DataFrame({
'toy':['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
'color':['red', 'blue', 'blue', np.nan, 'green', np.nan,
'red', 'red', np.nan, 'blue', 'red', np.nan, 'green']
})
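>>> df['color'] = df.groupby('toy')['color'].transform(lambda s: s.fillna(s.mode().iloc[0]))
>>> df
      toy  color
0     car    red
1     car   blue
2     car   blue
3     car   blue
4   train  green
5   train    red
6   train    red
7   train    red
8   train    red
9    ball   blue
10   ball    red
11   ball   blue
12  truck  green
Here each toy's missing colors are filled with that toy's most frequent color. For ball, blue and red tie; mode() returns ties in sorted order, so blue is used.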
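One way to get the per-toy fill the question asks for is to group by toy and fill each group's NaNs with that group's mode. The following is a minimal sketch rather than the only approach: the helper name fill_with_group_mode is hypothetical, and leaving a group untouched when it has no observed values at all is an assumption made here.

```python
import numpy as np
import pandas as pd


def fill_with_group_mode(frame, group_col, value_col):
    """Return a copy of frame with NaNs in value_col filled per group.

    fill_with_group_mode is a hypothetical helper, not a pandas function.
    """
    def _fill(series):
        modes = series.mode()  # NaNs are ignored; ties come back sorted
        if modes.empty:
            # Assumption: a group with no observed values stays NaN.
            return series
        return series.fillna(modes.iloc[0])

    result = frame.copy()
    result[value_col] = frame.groupby(group_col)[value_col].transform(_fill)
    return result


df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
              'red', 'red', np.nan, 'blue', 'red', np.nan, 'green'],
})

filled = fill_with_group_mode(df, 'toy', 'color')
print(filled)
```

The original df is left unchanged, so the filled result can be compared against it before overwriting anything.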
To create a DataFrame, we first need to import pandas. A DataFrame is created with the pd.DataFrame() constructor. Its first parameter, data, is the data to fill the table with; the remaining parameters, such as index and columns, are optional.
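As a small illustration of the constructor, this sketch passes a dict as the data and then, optionally, explicit row labels through index. The column values and the labels 'a' and 'b' are made up for the example.

```python
import pandas as pd

# data: a dict mapping column names to lists of equal length
data = {'toy': ['car', 'train'], 'color': ['red', 'green']}

# Only data is given, so pandas assigns the default integer row index
df_default = pd.DataFrame(data)

# The optional index argument supplies custom row labels (hypothetical labels)
df_labeled = pd.DataFrame(data, index=['a', 'b'])

print(df_default)
print(df_labeled)
```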