Fill NaN values within groups by number in pandas
I have a dataframe such as:
Groups NAME VALUES
G1 A 1
G1 B 2
G1 C 3
G1 C 3
G2 D NaN
G2 E NaN
G2 D NaN
G3 F NaN
G3 G NaN
G3 H NaN
G4 I 8
G4 I 8
G4 J 89
G4 K 65
I would like to fill only the Groups whose VALUES are NaN, assigning each distinct NAME within the group a number starting from 1.
I should then get:
Groups NAME VALUES
G1 A 1
G1 B 2
G1 C 3
G1 C 3
G2 D 1
G2 E 2
G2 D 1
G3 F 1
G3 G 2
G3 H 3
G4 I 8
G4 I 8
G4 J 89
G4 K 65
Here is the data:
{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4', 13: 'G4'}, 'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F', 8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'}, 'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan, 9: nan, 10: 8.0, 11: 8.0, 12: 89.0, 13: 65.0}}
I would first select the unique names of the NaN rows:
m = df['VALUES'].isna()
names = df.loc[m, 'NAME'].unique()
Then create a mapping for them:
mapping = dict(zip(names, list(range(1,len(names)+1))))
Then use the mapping to fill the values of the NaN rows:
df.loc[m, 'VALUES'] = df.loc[m, 'NAME'].map(mapping)
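Run on the sample data, the three steps above number the names across all NaN rows at once rather than restarting in each group. The following is a minimal self-contained sketch of that behaviour; the compact DataFrame construction and the variable `filled` are my own additions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built compactly (illustrative construction)
df = pd.DataFrame({
    "Groups": ["G1"] * 4 + ["G2"] * 3 + ["G3"] * 3 + ["G4"] * 4,
    "NAME": list("ABCCDEDFGHIIJK"),
    "VALUES": [1, 2, 3, 3] + [np.nan] * 6 + [8, 8, 89, 65],
})

m = df["VALUES"].isna()                      # mask of rows to fill
names = df.loc[m, "NAME"].unique()           # unique names, in order of first appearance
mapping = dict(zip(names, range(1, len(names) + 1)))
df.loc[m, "VALUES"] = df.loc[m, "NAME"].map(mapping)

# Values written into the former NaN rows (hypothetical helper variable)
filled = df.loc[m, "VALUES"].astype(int).tolist()
print(filled)  # [1, 2, 1, 3, 4, 5]
```

The numbering continues from G2 into G3 (F, G, H get 3, 4, 5), so this version does not yet produce the per-group result asked for in the question.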
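Update, based on what I understood from your comment, to fill VALUES per group:
So again we select the rows with NaN VALUES. Now we do a groupby and use transform, which preserves the original df index. To add the list of numbers we need to know the length of each group, so I added a Size column.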
import numpy as np
import pandas as pd
df = pd.DataFrame({'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4', 13: 'G4'}, 'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F', 8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'}, 'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: np.nan, 5: np.nan, 6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: 8.0, 11: 8.0, 12: 89.0, 13: 65.0}})
sizes = df.groupby(['Groups']).size()
df['Size']=df['Groups'].map(sizes)
m = df['VALUES'].isna()
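Next, you want rows where Groups and NAME repeat (hence a groupby on Groups and NAME) to get the same number (e.g. G2 and D), so we select the first occurrence of each such row and map it back using Groups and NAME: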
df.loc[m, 'VALUES_new'] = df.loc[m].groupby(['Groups'])['Size'].transform(lambda x:list(range(1,len(x)+1)))
mapping = df.loc[m].groupby(['Groups', 'NAME'])['VALUES_new'].first().copy()
df.set_index(['Groups', 'NAME'], inplace=True)
m = df['VALUES'].isna()
df.loc[m,'VALUES'] = df.loc[m].index.map(mapping)
df.reset_index(inplace=True)
df.drop(columns=['Size', 'VALUES_new'], inplace=True)
df['VALUES']=df['VALUES'].astype(int)
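The same idea (number the first occurrence of each Groups/NAME pair, then copy that number to the repeats) can also be sketched without the helper columns. This is an alternative I am suggesting, not the answer's own code; the names `firsts`, `nan_rows`, `numbered` and `num` are hypothetical. Because duplicates are dropped before counting, repeated names do not use up a number, so the numbering should stay consecutive even when a repeat appears before a new name.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built compactly (illustrative construction)
df = pd.DataFrame({
    "Groups": ["G1"] * 4 + ["G2"] * 3 + ["G3"] * 3 + ["G4"] * 4,
    "NAME": list("ABCCDEDFGHIIJK"),
    "VALUES": [1, 2, 3, 3] + [np.nan] * 6 + [8, 8, 89, 65],
})

m = df["VALUES"].isna()

# One row per distinct (Groups, NAME) pair among the NaN rows (first occurrence kept)
firsts = df.loc[m, ["Groups", "NAME"]].drop_duplicates()
# Number those pairs 1, 2, 3, ... within each group
firsts["num"] = firsts.groupby("Groups").cumcount() + 1

# Copy the numbers back to every NaN row while keeping the original row labels
nan_rows = df.loc[m, ["Groups", "NAME"]].reset_index()
numbered = nan_rows.merge(firsts, on=["Groups", "NAME"], how="left").set_index("index")

df.loc[m, "VALUES"] = numbered["num"]
df["VALUES"] = df["VALUES"].astype(int)
print(df)
```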
Just to see what happens within the individual groups, you can run the following:
df = pd.DataFrame({'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4', 13: 'G4'}, 'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F', 8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'}, 'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: np.nan, 5: np.nan, 6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: 8.0, 11: 8.0, 12: 89.0, 13: 65.0}})
m = df['VALUES'].isna()
grouped = df.loc[m].groupby('Groups')  # groupby object
for name, dfgroup in grouped:
    print(name)  # str with the group name
    dfgroup = dfgroup.copy()  # copy of the group's dataframe, safe to modify
    values = list(range(1, len(dfgroup) + 1))
    dfgroup['VALUES'] = values
    print(dfgroup)
Try converting NAME to a category type within each group, then take the cat codes and add 1:
import numpy as np
import pandas as pd
d = {'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2',
                7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4',
                13: 'G4'},
     'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F',
              8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'},
     'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: np.nan, 5: np.nan,
                6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: 8.0,
                11: 8.0, 12: 89.0, 13: 65.0}}
df = pd.DataFrame(d)
# Mask for rows where VALUES is NaN
m = df['VALUES'].isna()
# Groupby 'Groups'
df.loc[m, 'VALUES'] = df[m].groupby('Groups', as_index=False, sort=False).apply(
    # Convert 'NAME' to a category and grab the cat codes
    # add 1 to start with 1 instead of 0
    lambda g: g['NAME'].astype('category').cat.codes + 1
).values
# Convert to int to match output
df['VALUES'] = df['VALUES'].astype(int)
print(df)
df:
Groups NAME VALUES
0 G1 A 1
1 G1 B 2
2 G1 C 3
3 G1 C 3
4 G2 D 1
5 G2 E 2
6 G2 D 1
7 G3 F 1
8 G3 G 2
9 G3 H 3
10 G4 I 8
11 G4 I 8
12 G4 J 89
13 G4 K 65
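One caveat with category codes: they follow the sorted order of the names, not the order in which the names appear. If numbering by first appearance matters, a variant using `pd.factorize` inside `groupby(...).transform` may be worth trying. This is a sketch of an alternative, not the answer's own method; the small series `s` is a made-up example to show the difference.

```python
import numpy as np
import pandas as pd

# Difference between the two encodings on a made-up series
s = pd.Series(["E", "D", "E"])
by_sorted_order = (s.astype("category").cat.codes + 1).tolist()  # [2, 1, 2]
by_appearance = (pd.factorize(s)[0] + 1).tolist()                # [1, 2, 1]

# Sample data from the question, built compactly (illustrative construction)
df = pd.DataFrame({
    "Groups": ["G1"] * 4 + ["G2"] * 3 + ["G3"] * 3 + ["G4"] * 4,
    "NAME": list("ABCCDEDFGHIIJK"),
    "VALUES": [1, 2, 3, 3] + [np.nan] * 6 + [8, 8, 89, 65],
})

m = df["VALUES"].isna()
# pd.factorize returns (codes, uniques); codes start at 0 in order of first appearance.
# transform keeps the original row labels, so the assignment aligns on the index.
df.loc[m, "VALUES"] = (
    df.loc[m]
    .groupby("Groups")["NAME"]
    .transform(lambda names: pd.factorize(names)[0] + 1)
)
df["VALUES"] = df["VALUES"].astype(int)
print(df)
```

On the question's data both encodings give the same result, because the names in each NaN group already first appear in alphabetical order.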