Fill NaN values within groups with numbers in pandas

Question

I have a dataframe such as:

Groups NAME VALUES 
G1     A    1
G1     B    2
G1     C    3
G1     C    3
G2     D    NaN
G2     E    NaN
G2     D    NaN 
G3     F    NaN
G3     G    NaN 
G3     H    NaN 
G4     I    8
G4     I    8
G4     J    89
G4     K    65

I would like to fill only the Groups whose VALUES are NaN, giving each distinct NAME within a group a number starting from 1.

I should then get:

Groups NAME VALUES 
G1     A    1
G1     B    2
G1     C    3
G1     C    3
G2     D    1
G2     E    2
G2     D    1
G3     F    1
G3     G    2
G3     H    3
G4     I    8
G4     I    8
G4     J    89
G4     K    65

Here is the data:

{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4', 13: 'G4'}, 'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F', 8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'}, 'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan, 9: nan, 10: 8.0, 11: 8.0, 12: 89.0, 13: 65.0}}

Answer 1

I would first select the unique names of the NaN rows:

m = df['VALUES'].isna()
names = df.loc[m, 'NAME'].unique()

Then create a mapping for these:

mapping = dict(zip(names, list(range(1,len(names)+1))))

Then use the mapping to fill the values of the NaN rows:

df.loc[m, 'VALUES'] = df.loc[m, 'NAME'].map(mapping)
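
Put together, the three steps above can be sketched as one runnable script. It assumes the sample data from the question, rebuilt here in a compact form for illustration, and the column name `NAME` as it appears in the data. Note that this first version numbers the names across the whole frame rather than restarting in each group.

```python
import numpy as np
import pandas as pd

# Sample data from the question, written compactly (same rows, same order).
df = pd.DataFrame({
    'Groups': ['G1'] * 4 + ['G2'] * 3 + ['G3'] * 3 + ['G4'] * 4,
    'NAME': list('ABCCDEDFGHIIJK'),
    'VALUES': [1, 2, 3, 3] + [np.nan] * 6 + [8, 8, 89, 65],
})

# Rows whose VALUES is missing, and the distinct names found in them.
m = df['VALUES'].isna()
names = df.loc[m, 'NAME'].unique()

# One number per distinct name, in order of first appearance.
mapping = dict(zip(names, range(1, len(names) + 1)))

# Fill only the NaN rows from the mapping.
df.loc[m, 'VALUES'] = df.loc[m, 'NAME'].map(mapping)
print(df)
```

With this data the names in G3 continue from where G2 stopped (F, G and H get 3, 4 and 5), which is what the update below addresses.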

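Update, to fill VALUES per Groups, based on what I understood from your comment:

So again we select the rows with NaN VALUES. Now we do a groupby and use transform, which preserves the original df index. To add the list of numbers we need to know the length of each group, so I added a Size column.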

import numpy as np
import pandas as pd

df = pd.DataFrame({'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4', 13: 'G4'}, 'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F', 8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'}, 'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: np.nan, 5: np.nan, 6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: 8.0, 11: 8.0, 12: 89.0, 13: 65.0}})
sizes = df.groupby(['Groups']).size()
df['Size'] = df['Groups'].map(sizes)
m = df['VALUES'].isna()

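Next, you want the same number wherever Groups and NAME repeat (as with G2 and D), hence the groupby on Groups and NAME. So we number the NaN rows within each group, select the first occurrence of each such pair, and map that number back onto the rows through Groups and NAME: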

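# Number the NaN rows 1..n within each group (transform keeps the original index)
df.loc[m, 'VALUES_new']  = df.loc[m].groupby(['Groups'])['Size'].transform(lambda x:list(range(1,len(x)+1)))
# Keep the first number seen for each (Groups, NAME) pair
mapping = df.loc[m].groupby(['Groups', 'NAME'])['VALUES_new'].first().copy()
# Index by (Groups, NAME) so the mapping can be looked up per row
df.set_index(['Groups', 'NAME'], inplace=True)
m = df['VALUES'].isna()
df.loc[m,'VALUES'] = df.loc[m].index.map(mapping)
# Restore the original layout and drop the helper columns
df.reset_index(inplace=True)
df.drop(columns=['Size', 'VALUES_new'], inplace=True)
df['VALUES']=df['VALUES'].astype(int)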
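
A shorter route to the same result, offered as a sketch rather than as part of the original answer: `pd.factorize` numbers values in order of first appearance, so applying it to `NAME` inside each group through `transform` gives every distinct name its own number and repeated names the same number. It also avoids a gap that numbering rows by position can leave when a repeated name comes before a new one (D, D, E would get 1 and 3 that way). The compact construction of the sample frame is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question, written compactly (same rows, same order).
df = pd.DataFrame({
    'Groups': ['G1'] * 4 + ['G2'] * 3 + ['G3'] * 3 + ['G4'] * 4,
    'NAME': list('ABCCDEDFGHIIJK'),
    'VALUES': [1, 2, 3, 3] + [np.nan] * 6 + [8, 8, 89, 65],
})

m = df['VALUES'].isna()

# Number the distinct names within each group, in order of first appearance.
# factorize returns 0-based codes, so add 1 to start at 1.
codes = df.loc[m].groupby('Groups')['NAME'].transform(
    lambda s: pd.factorize(s)[0] + 1
)

# transform keeps the original index, so this assignment aligns row by row.
df.loc[m, 'VALUES'] = codes
df['VALUES'] = df['VALUES'].astype(int)
print(df)
```

Because the result is aligned on the index, this sketch does not rely on the rows of a group being adjacent in the frame.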

Just to see what happens in the individual groups, you can run the following:

df = pd.DataFrame({'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4', 13: 'G4'}, 'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F', 8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'}, 'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: np.nan, 5: np.nan, 6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: 8.0, 11: 8.0, 12: 89.0, 13: 65.0}})
m = df['VALUES'].isna()
grouped = df.loc[m].groupby('Groups')  # groupby object

for name, dfgroup in grouped:
    print(name)  # str with the group name
    dfgroup = dfgroup.copy()  # work on a copy so df itself is not modified
    dfgroup['VALUES'] = list(range(1, len(dfgroup) + 1))
    print(dfgroup)
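
The per-group row numbering printed by the loop can also be produced without an explicit loop. As a sketch on the same sample data (rebuilt compactly here, which is an assumption for illustration), `GroupBy.cumcount` counts the rows of each group from 0, so adding 1 gives the same 1, 2, 3 sequence:

```python
import numpy as np
import pandas as pd

# Sample data from the question, written compactly (same rows, same order).
df = pd.DataFrame({
    'Groups': ['G1'] * 4 + ['G2'] * 3 + ['G3'] * 3 + ['G4'] * 4,
    'NAME': list('ABCCDEDFGHIIJK'),
    'VALUES': [1, 2, 3, 3] + [np.nan] * 6 + [8, 8, 89, 65],
})

m = df['VALUES'].isna()

# Position of each NaN row inside its group, starting at 1.
row_numbers = df.loc[m].groupby('Groups').cumcount() + 1
print(row_numbers)
```

These are row positions, not name numbers: the second D in G2 gets 3 here, which is why the answer then takes the first number per Groups and NAME pair.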

Answer 2

Try converting NAME to a category type within each group, then take the cat codes and add 1:

import numpy as np
import pandas as pd

d = {'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2',
                7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4',
                13: 'G4'},
     'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F',
              8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'},
     'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: np.nan, 5: np.nan,
                6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: 8.0,
                11: 8.0, 12: 89.0, 13: 65.0}}

df = pd.DataFrame(d)

# Mask for rows where VALUES is NaN
m = df['VALUES'].isna()
# Groupby 'Groups'
df.loc[m, 'VALUES'] = df[m].groupby('Groups', as_index=False, sort=False).apply(
    # Convert 'NAME' to a category and grab the cat codes
    # add 1 to start with 1 instead of 0
    lambda g: g['NAME'].astype('category').cat.codes + 1
).values

# Convert to int to match output
df['VALUES'] = df['VALUES'].astype(int)

print(df)

Resulting df:

   Groups NAME  VALUES
0      G1    A       1
1      G1    B       2
2      G1    C       3
3      G1    C       3
4      G2    D       1
5      G2    E       2
6      G2    D       1
7      G3    F       1
8      G3    G       2
9      G3    H       3
10     G4    I       8
11     G4    I       8
12     G4    J      89
13     G4    K      65
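
One detail worth knowing about this approach: category codes follow the sorted order of the categories, not the order in which names first appear. For the sample data the two coincide. The small sketch below uses a made-up series (an assumption, not data from the question) to show the difference, with `pd.factorize` as an order-of-appearance alternative:

```python
import pandas as pd

# Hypothetical names where the first one seen ('E') sorts after the second ('D').
s = pd.Series(['E', 'D', 'E'])

# Category codes are assigned in sorted order: D -> 0, E -> 1.
sorted_codes = (s.astype('category').cat.codes + 1).tolist()

# factorize assigns codes in order of first appearance: E -> 0, D -> 1.
appearance_codes = (pd.factorize(s)[0] + 1).tolist()

print(sorted_codes)      # [2, 1, 2]
print(appearance_codes)  # [1, 2, 1]
```

Which numbering is wanted depends on the use case; the question's expected output is satisfied by either one.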

Answer 1
2 Accepted 2021-05-15 10:24:12

Answer 2
1 2021-05-15 12:09:55
