
How to fill NaNs in a dataframe column with its most frequent item?

I have a pandas DataFrame with two columns: toy and color. The color column includes missing values.

How do I fill the missing color values with the most frequent color for that particular toy?

Here's the code to create a sample dataset:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'toy':['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color':['red', 'blue', 'blue', nan, 'green', nan,
             'red', 'red', np, 'blue', 'red', nan, 'green']
    })

Instead of nan and np, you have to use np.nan:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
...     'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
...               'red', 'red', np.nan, 'blue', 'red', np.nan, 'green']
... })
>>> # fill each toy's missing colors with that toy's most frequent color
>>> df['color'] = df.groupby('toy')['color'].transform(
...     lambda s: s.fillna(s.mode().iloc[0]))
>>> df
      toy  color
0     car    red
1     car   blue
2     car   blue
3     car   blue
4   train  green
5   train    red
6   train    red
7   train    red
8   train    red
9    ball   blue
10   ball    red
11   ball   blue
12  truck  green
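
A minimal sketch of a per-toy fill wrapped in a helper, in case some group has no known color at all. The name fill_with_group_mode and its parameters are hypothetical, chosen only for illustration; the pandas calls it relies on (groupby, transform, Series.mode, fillna) are standard. It assumes that leaving an all-missing group untouched is acceptable, which the question does not specify.

```python
import numpy as np
import pandas as pd


def fill_with_group_mode(df, group_col, value_col):
    """Return a copy of df where NaNs in value_col are replaced by the
    most frequent value_col value within the same group_col group.

    Hypothetical helper for illustration, not part of pandas.
    """

    def _fill(s):
        modes = s.mode()  # NaNs are ignored by default (dropna=True)
        if modes.empty:
            # Assumption: a group with no known values is left as-is.
            return s
        return s.fillna(modes.iloc[0])

    out = df.copy()
    out[value_col] = out.groupby(group_col)[value_col].transform(_fill)
    return out


df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
              'red', 'red', np.nan, 'blue', 'red', np.nan, 'green'],
})

filled = fill_with_group_mode(df, 'toy', 'color')
print(filled)
```

For 'ball', 'blue' and 'red' each appear once; Series.mode() returns tied values in sorted order, so taking the first one picks 'blue'. If a different tie-breaking rule is wanted, that line is the place to change it.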

To create a DataFrame, we first need to import pandas. A DataFrame is created with the pd.DataFrame() constructor, which is typically called with one or two arguments. The first is the data used to fill the table.
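
As a rough illustration of calling the constructor with one and with two arguments: the text above does not say what the second argument is, so this sketch assumes it is the optional index of row labels. The variable names and sample values are made up for the example.

```python
import pandas as pd

# One argument: a dict mapping column names to equal-length lists.
toys = pd.DataFrame({'toy': ['car', 'train'], 'color': ['red', 'green']})

# Two arguments: the same data plus explicit row labels via `index`
# (assumed here to be the second parameter the text refers to).
labelled = pd.DataFrame(
    {'toy': ['car', 'train'], 'color': ['red', 'green']},
    index=['a', 'b'],
)

print(toys)
print(labelled)
```

Without an index, pandas numbers the rows from 0; with one, rows can be looked up by label, for example labelled.loc['b', 'color'].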

