Fill NaN values within groups by number in pandas
I have a dataframe such as:
Groups NAME VALUES
G1 A 1
G1 B 2
G1 C 3
G1 C 3
G2 D NaN
G2 E NaN
G2 D NaN
G3 F NaN
G3 G NaN
G3 H NaN
G4 I 8
G4 I 8
G4 J 89
G4 K 65
I would simply like to fill the Groups that contain only NaN VALUES, assigning a number to each distinct NAME within the group, starting from 1.
Then I should get:
Groups NAME VALUES
G1 A 1
G1 B 2
G1 C 3
G1 C 3
G2 D 1
G2 E 2
G2 D 1
G3 F 1
G3 G 2
G3 H 3
G4 I 8
G4 I 8
G4 J 89
G4 K 65
Here is the data:
{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4', 13: 'G4'}, 'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F', 8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'}, 'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan, 9: nan, 10: 8.0, 11: 8.0, 12: 89.0, 13: 65.0}}
I would first select the unique NAMEs for the NaN rows:
m = df['VALUES'].isna()
names = df.loc[m, 'NAME'].unique()
then create a mapping for these:
mapping = dict(zip(names, list(range(1,len(names)+1))))
then fill your VALUES for the NaN rows with the mapping:
df.loc[m, 'VALUES'] = df.loc[m, 'NAME'].map(mapping)
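For reference, here is a minimal, self-contained sketch of the three steps above run on the question's data. The compact way of building `df` and the `filled` variable are only illustrative choices. Note that the mapping is built over all NaN rows at once, so the numbering is global rather than restarting in each group.

```python
import numpy as np
import pandas as pd

# Same data as in the question, written compactly (illustrative construction)
df = pd.DataFrame({
    'Groups': ['G1'] * 4 + ['G2'] * 3 + ['G3'] * 3 + ['G4'] * 4,
    'NAME': list('ABCCDEDFGHIIJK'),
    'VALUES': [1, 2, 3, 3] + [np.nan] * 6 + [8, 8, 89, 65],
})

m = df['VALUES'].isna()                               # rows with missing VALUES
names = df.loc[m, 'NAME'].unique()                    # D, E, F, G, H
mapping = dict(zip(names, range(1, len(names) + 1)))  # one number per NAME
df.loc[m, 'VALUES'] = df.loc[m, 'NAME'].map(mapping)

filled = df.loc[m, 'VALUES'].astype(int).tolist()
print(filled)  # [1, 2, 1, 3, 4, 5]
```

Because F, G and H continue at 3, 4 and 5 instead of restarting at 1, this version does not yet produce the per-group output asked for in the question.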
Update to fill the VALUES based on the Groups, as I understand from your comment:
So we select the rows with NaN VALUES again. Now we do a groupby and keep the original df index using transform. To add a list, we need to know the length of each group, so I added a Size column.
import numpy as np
import pandas as pd
df = pd.DataFrame({'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4', 13: 'G4'}, 'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F', 8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'}, 'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: np.nan, 5: np.nan, 6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: 8.0, 11: 8.0, 12: 89.0, 13: 65.0}})
sizes = df.groupby(['Groups']).size()
df['Size']=df['Groups'].map(sizes)
m = df['VALUES'].isna()
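As a side note on "keeping the original df index": the small sketch below, on a made-up five-row frame, suggests that mapping the group sizes back onto the rows and calling `transform('size')` give the same row-aligned result. The names `via_map` and `via_transform` are only illustrative.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame, not the question's full data
df = pd.DataFrame({
    'Groups': ['G1', 'G1', 'G2', 'G2', 'G2'],
    'VALUES': [1.0, 2.0, np.nan, np.nan, np.nan],
})

# Reduce to one size per group, then map it back onto every row
sizes = df.groupby('Groups').size()
via_map = df['Groups'].map(sizes)

# transform returns one value per original row, aligned on df's index
via_transform = df.groupby('Groups')['VALUES'].transform('size')

print(via_map.tolist())        # [2, 2, 3, 3, 3]
print(via_transform.tolist())  # [2, 2, 3, 3, 3]
```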
Next, you want a duplicate occurrence of the same Groups and NAME combination (such as G2 and D) to get the same number. We therefore take the first occurrence of each combination (a groupby on Groups and NAME) and map its number back onto every row with that combination:
df.loc[m, 'VALUES_new'] = df.loc[m].groupby(['Groups'])['Size'].transform(lambda x:list(range(1,len(x)+1)))
mapping = df.loc[m].groupby(['Groups', 'NAME'])['VALUES_new'].first().copy()
df.set_index(['Groups', 'NAME'], inplace=True)
m = df['VALUES'].isna()
df.loc[m,'VALUES'] = df.loc[m].index.map(mapping)
df.reset_index(inplace=True)
df.drop(columns=['Size', 'VALUES_new'], inplace=True)
df['VALUES']=df['VALUES'].astype(int)
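The numbering above is a running row counter per group, so a group ordered D, D, E would get D=1 and E=3. If consecutive numbers per distinct NAME are wanted regardless of row order, one possible alternative (a sketch, not the method used in this answer) is `pd.factorize` inside a per-group `transform`. The `all_nan` mask is an added assumption that restricts the fill to groups whose VALUES are entirely NaN, as the question words it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Groups': ['G1'] * 4 + ['G2'] * 3 + ['G3'] * 3 + ['G4'] * 4,
    'NAME': list('ABCCDEDFGHIIJK'),
    'VALUES': [1, 2, 3, 3] + [np.nan] * 6 + [8, 8, 89, 65],
})

# True for every row that belongs to a group whose VALUES are all NaN
all_nan = df['VALUES'].isna().groupby(df['Groups']).transform('all')

# Within each such group, number the NAMEs by order of first appearance
codes = (
    df.loc[all_nan]
      .groupby('Groups')['NAME']
      .transform(lambda s: pd.factorize(s)[0] + 1)
)

df.loc[all_nan, 'VALUES'] = codes
df['VALUES'] = df['VALUES'].astype(int)
print(df)
```

On the question's data this gives 1, 2, 1 for G2 and 1, 2, 3 for G3, leaving G1 and G4 unchanged.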
Just to see what happens with the individual groups, you could run this:
df = pd.DataFrame({'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4', 13: 'G4'}, 'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F', 8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'}, 'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: np.nan, 5: np.nan, 6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: 8.0, 11: 8.0, 12: 89.0, 13: 65.0}})
m = df['VALUES'].isna()
grouped = df.loc[m].groupby('Groups')  # groupby object
for name, dfgroup in grouped:
    print(name)  # str with the group name
    dfgroup = dfgroup.copy()  # copy of the group's dataframe, safe to modify
    dfgroup['VALUES'] = list(range(1, len(dfgroup) + 1))
    print(dfgroup)
Try converting each group's names to category type, then grab the cat codes and add 1:
import numpy as np
import pandas as pd
d = {'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2',
                7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4',
                13: 'G4'},
     'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F',
              8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'},
     'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: np.nan, 5: np.nan,
                6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: 8.0,
                11: 8.0, 12: 89.0, 13: 65.0}}
df = pd.DataFrame(d)
# Mask for rows where VALUES is NaN
m = df['VALUES'].isna()
# Groupby 'Groups'
df.loc[m, 'VALUES'] = df[m].groupby('Groups', as_index=False, sort=False).apply(
    # Convert 'NAME' to a category and grab the cat codes
    # add 1 to start with 1 instead of 0
    lambda g: g['NAME'].astype('category').cat.codes + 1
).values
# Convert to int to match output
df['VALUES'] = df['VALUES'].astype(int)
print(df)
df:
Groups NAME VALUES
0 G1 A 1
1 G1 B 2
2 G1 C 3
3 G1 C 3
4 G2 D 1
5 G2 E 2
6 G2 D 1
7 G3 F 1
8 G3 G 2
9 G3 H 3
10 G4 I 8
11 G4 I 8
12 G4 J 89
13 G4 K 65
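One thing to keep in mind with category codes: for plain strings the categories are sorted, so the codes follow alphabetical order rather than order of first appearance. The sketch below uses a made-up three-element series to show the difference, with `pd.factorize` as a possible substitute when appearance order matters; `by_sorted` and `by_appearance` are only illustrative names.

```python
import pandas as pd

# Made-up names where the first one seen is not the first alphabetically
names = pd.Series(['E', 'D', 'E'])

# Category codes: categories are sorted ('D', 'E'), so 'D' gets the lowest code
by_sorted = (names.astype('category').cat.codes + 1).tolist()

# factorize: codes follow order of first appearance, so 'E' gets the lowest code
by_appearance = (pd.factorize(names)[0] + 1).tolist()

print(by_sorted)      # [2, 1, 2]
print(by_appearance)  # [1, 2, 1]
```

On the question's data both give the same numbers, because the names in G2 and G3 already first appear in alphabetical order.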