Fill missing values in one column based on another column

Question

I have two columns like this:

What I want to do: for rows where 'age' is between 30 and 39, I want to fill the missing age_band with 30. Likewise, for rows where 'age' is between 80 and 89, I want to fill the missing age_band with 80. How can I do this in a pandas DataFrame?

I tried it like this, but the loop seems to run forever:

for ages in data['age']:
    if 0<=ages<=9:
        data['age_band']= data['age_band'].fillna(0)
    elif 10<=ages<=19:
        data['age_band']= data['age_band'].fillna(10)
    elif 20<=ages<=29:
        data['age_band']= data['age_band'].fillna(20)
    elif 30<=ages<=39:
        data['age_band']= data['age_band'].fillna(30)
    elif 40<=ages<=49:
        data['age_band']= data['age_band'].fillna(40)
    elif 50<=ages<=59:
        data['age_band']= data['age_band'].fillna(50)
    elif 60<=ages<=69:
        data['age_band']= data['age_band'].fillna(60)
    elif 70<=ages<=79:
        data['age_band']= data['age_band'].fillna(70)
    elif 80<=ages<=89:
        data['age_band']= data['age_band'].fillna(80)
    elif 90<=ages<=99:
        data['age_band']= data['age_band'].fillna(90)
    elif 100<=ages<=109:
        data['age_band']= data['age_band'].fillna(100)

Please help me.

Answer 1

Try this shortcut:

data['age_band'] = data['age_band'].fillna(data['age'] // 10 * 10).astype(int)
print(data)

# Output
   age  age_band
0   93        90
1   46        40
2   50        50
3   56        50
4   89        80
5   19        10
6   25        20
7   17        10
8   54        50
9   42        40

Setup:

import pandas as pd
import numpy as np

np.random.seed(2022)
data = pd.DataFrame({'age': np.random.randint(1, 111, 10), 'age_band': np.nan})
print(data)

# Output
   age  age_band
0   93       NaN
1   46       NaN
2   50       NaN
3   56       NaN
4   89       NaN
5   19       NaN
6   25       NaN
7   17       NaN
8   54       NaN
9   42       NaN
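
The setup above starts with every age_band missing. To check that only the gaps are filled when some bands are already present, here is a minimal sketch. The values in the frame are illustrative, not taken from the question.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): two bands are already known
data = pd.DataFrame({
    'age':      [34, 87, 5, 61, 109],
    'age_band': [30, np.nan, np.nan, 60, np.nan],
})

# age // 10 * 10 floors each age to the start of its decade (87 -> 80).
# fillna only takes that value where age_band is missing.
data['age_band'] = data['age_band'].fillna(data['age'] // 10 * 10).astype(int)
print(data)
```

Note that astype(int) assumes no missing values remain, so a row with a missing age would need to be handled first.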

Answer 2

The above answer only works when the age bins are equally sized. You can try pd.cut, which works for arbitrary bin edges.

You can pass labels to pd.cut() as well. The following example bins ages from 0 upward into ten-year bands and assigns the resulting category to the 'age_band' column. Note that this overwrites the whole column rather than filling only the missing entries.

bins holds the interval edges: 0-9 is one interval, 10-19 is one interval, and so on. The corresponding labels are "0-9", "10-19", etc. pd.cut needs exactly one label per interval, i.e. one fewer label than bin edges.

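# 13 edges define 12 intervals, one per label; float('inf') catches ages above 109
bins = [0, 9, 19, 29, 39, 49, 59, 69, 79, 89, 99, 109, float('inf')]
labels = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90-99", "100-109", ">109"]
# include_lowest=True puts age 0 in the first interval (bins are right-closed by default)
data['age_band'] = pd.cut(data['age'], bins=bins, labels=labels, include_lowest=True)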
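
If only the missing entries should be filled, and the bands are not all the same width, the pd.cut result can be passed to fillna instead of being assigned directly. A minimal sketch, assuming made-up data and band edges (edges, starts, and band are illustrative names, not from the question):

```python
import numpy as np
import pandas as pd

# Illustrative data and band edges (assumptions, not from the question)
data = pd.DataFrame({
    'age':      [4, 15, 42, 70, 95],
    'age_band': [np.nan, 13, np.nan, 65, np.nan],
})

edges = [0, 13, 18, 65, 200]   # unequal bins: [0, 13), [13, 18), [18, 65), [65, 200)
starts = edges[:-1]            # label each bin with its lower edge

# right=False makes each bin include its lower edge, so age 13 falls in [13, 18)
band = pd.cut(data['age'], bins=edges, labels=starts, right=False)

# Convert the categorical labels to numbers, then fill only the gaps
data['age_band'] = data['age_band'].fillna(band.astype(float)).astype(int)
print(data)
```

Labelling each bin by its lower edge keeps age_band numeric, which matches the 30/80 style asked for in the question.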

Answer 1
3, accepted, 2022-01-18 16:42:06

Answer 2
0, 2022-01-18 16:54:48
