
Fill missing values in one column based on another column

I have two columns like this:

[image]

What I want to do is this: for rows where the 'age' value is between 30 and 39, I want to fill the missing value of age_band with 30. Likewise, for rows where 'age' is between 80 and 89, I want to fill the missing value of age_band with 80. How can I do this in a pandas DataFrame?

I tried it like this, but the loop seems to run forever:

for ages in data['age']:
    if 0<=ages<=9:
        data['age_band']= data['age_band'].fillna(0)
    elif 10<=ages<=19:
        data['age_band']= data['age_band'].fillna(10)
    elif 20<=ages<=29:
        data['age_band']= data['age_band'].fillna(20)
    elif 30<=ages<=39:
        data['age_band']= data['age_band'].fillna(30)
    elif 40<=ages<=49:
        data['age_band']= data['age_band'].fillna(40)
    elif 50<=ages<=59:
        data['age_band']= data['age_band'].fillna(50)
    elif 60<=ages<=69:
        data['age_band']= data['age_band'].fillna(60)
    elif 70<=ages<=79:
        data['age_band']= data['age_band'].fillna(70)
    elif 80<=ages<=89:
        data['age_band']= data['age_band'].fillna(80)
    elif 90<=ages<=99:
        data['age_band']= data['age_band'].fillna(90)
    elif 100<=ages<=109:
        data['age_band']= data['age_band'].fillna(100)

Please help me.

Try this shortcut:

data['age_band'] = data['age_band'].fillna(data['age'] // 10 * 10).astype(int)
print(data)

# Output
   age  age_band
0   93        90
1   46        40
2   50        50
3   56        50
4   89        80
5   19        10
6   25        20
7   17        10
8   54        50
9   42        40

Setup:

import pandas as pd
import numpy as np

np.random.seed(2022)
data = pd.DataFrame({'age': np.random.randint(1, 111, 10), 'age_band': np.nan})
print(data)

# Output
   age  age_band
0   93       NaN
1   46       NaN
2   50       NaN
3   56       NaN
4   89       NaN
5   19       NaN
6   25       NaN
7   17       NaN
8   54       NaN
9   42       NaN
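
The loop in the question calls fillna with a single constant, so the first pass fills every missing entry with the same value and the remaining passes change nothing. Passing a Series to fillna instead fills each missing entry from the row with the same index and leaves existing values alone. A minimal sketch of that behavior, assuming a small made-up frame in which some age_band values are already present (the sample values are hypothetical, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two age_band values are known, two are missing.
data = pd.DataFrame({
    "age": [34, 87, 5, 61],
    "age_band": [30, np.nan, np.nan, 60],
})

# Floor each age to the start of its decade (34 -> 30, 87 -> 80, 5 -> 0).
decade = data["age"] // 10 * 10

# fillna aligns on the index and replaces only the missing entries.
data["age_band"] = data["age_band"].fillna(decade).astype(int)
print(data)
```

Note that the final astype(int) assumes no 'age' value is missing; otherwise some NaN would remain and the cast would fail.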

The answers above only work when the age bins all have the same width. For the general case you can try pd.cut, which also handles bins of unequal width.

You can also pass labels to pd.cut(). The following example groups ages into ten-year ranges such as 0-9 and writes the result to the 'age_band' column to categorize each age.

bins defines the intervals: 0-9 is one interval, 10-19 is the next, and so on. The corresponding labels are "0-9", "10-19", etc.

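# 13 edges define 12 intervals, one per label; the last edge catches ages above 109
bins = [0, 9, 19, 29, 39, 49, 59, 69, 79, 89, 99, 109, float("inf")]
labels = ["0-9","10-19","20-29","30-39","40-49","50-59","60-69","70-79","80-89","90-99","100-109",">109"]
# include_lowest=True keeps age 0 in the first interval
data['age_band'] = pd.cut(data['age'], bins=bins, labels=labels, include_lowest=True)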
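
Since the question asks to fill only the missing values rather than overwrite the whole column, pd.cut can be combined with fillna. The sketch below is one possible way to do that, not the answerer's method: the unequal bin edges and the sample data are hypothetical, and each band is labelled by its lower edge so that it stays numeric like the age_band column in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with some age_band values already filled in.
data = pd.DataFrame({
    "age": [3, 18, 39, 40, 70],
    "age_band": [np.nan, 18, np.nan, np.nan, 65],
})

# Hypothetical unequal-width bins; each band is labelled by its lower edge.
edges = [0, 18, 40, 65, np.inf]
lower = edges[:-1]  # [0, 18, 40, 65]

# right=False makes the intervals closed on the left: [0, 18), [18, 40), ...
band = pd.cut(data["age"], bins=edges, labels=lower, right=False)

# Convert the categorical result to numbers, then fill only the gaps.
data["age_band"] = data["age_band"].fillna(band.astype(float)).astype(int)
print(data)
```

With right=False an age that sits exactly on an edge, such as 18 or 40, falls into the band that starts there, which matches the 30-39 → 30 convention in the question.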
