Conditionally fill a pandas column with data from a different DataFrame

Question

I have a data frame (df1) that has a column, 'unit', populated with either NaN values or strings.

 df1
    id s_type s_name s_unit
     1     t1     n1     m2
     2     t1     n5     m2
     3     t2     n2    NaN
     4     t6     n3   each

I have a second data frame (df2) that has similar information, though without ids.

 df2
    type name unit
      t3   n4  cm2
      t4   n2   m3
      t2   n2   kg
      t6   n0  NaN

I am struggling to develop an expression to

identify rows in df1 where unit is null AND
insert the unit value from df2 into the unit column of df1 WHERE
df1['type'] matches df2['type'] AND df1['name'] matches df2['name']

In the above frames, the expression would populate the empty 'unit' cell of df1 (id 3) with the value 'kg', as 'type' and 'name' both match.

Something similar to:

 df1.loc[df1['unit'].isnull(), 'unit'] = df2['unit'].where( (df1['name'] == df2['name']) & (df1['type'] == df2['type']))

However, the above line produces "ValueError: Can only compare identically-labeled Series objects."

I have looked through the documentation and other SO questions and am at a loss. Any help would be much appreciated.
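
For context on the error itself: comparing two Series with == requires both to carry exactly the same index labels, so two frames with different row counts cannot be compared element by element. The sketch below reproduces that with two made-up Series (a and b are illustrative names, not from the question).

```python
import pandas as pd

# Two Series with different index labels (0..2 versus 0..1).
a = pd.Series(["n1", "n2", "n3"])
b = pd.Series(["n2", "n1"])

try:
    a == b  # element-wise comparison needs identical labels
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

This is why the answers match rows by key with a merge instead of comparing the columns directly.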

Answer 1

You can use merge with a left join and then combine_first or fillna:

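import pandas as pd

# Left join keeps every row of df1; the overlapping 'unit' columns
# come back as unit_x (from df1) and unit_y (from df2).
df = pd.merge(df1, df2, on=['type','name'], how='left')

# Keep df1's unit where present, otherwise take the matched unit from df2.
df1['unit'] = df1['unit'].combine_first(df['unit_y'])
print(df1)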
   id type name  unit
0   1   t1   n1    m2
1   2   t1   n5    m2
2   3   t2   n2    kg
3   4   t6   n3  each

# Alternatively, fillna gives the same result here.
df1['unit'] = df1['unit'].fillna(df['unit_y'])
print(df1)
   id type name  unit
0   1   t1   n1    m2
1   2   t1   n5    m2
2   3   t2   n2    kg
3   4   t6   n3  each
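
To try this end to end, here is a minimal self-contained sketch. It assumes the columns are named type, name and unit, as the answer's code uses (the question's df1 table shows s_-prefixed headers), and it rebuilds both frames from the sample data; merged is an illustrative variable name.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question (column names are an assumption).
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "type": ["t1", "t1", "t2", "t6"],
    "name": ["n1", "n5", "n2", "n3"],
    "unit": ["m2", "m2", np.nan, "each"],
})
df2 = pd.DataFrame({
    "type": ["t3", "t4", "t2", "t6"],
    "name": ["n4", "n2", "n2", "n0"],
    "unit": ["cm2", "m3", "kg", np.nan],
})

# Left join keeps every df1 row in its original order; both frames have a
# 'unit' column, so pandas suffixes them as unit_x (df1) and unit_y (df2).
merged = pd.merge(df1, df2, on=["type", "name"], how="left")

# fillna aligns on the index. merged shares df1's 0..n-1 index only while
# each (type, name) pair appears at most once in df2.
df1["unit"] = df1["unit"].fillna(merged["unit_y"])
print(df1)
```

If df2 could contain the same (type, name) pair more than once, the left join would produce extra rows and the index alignment would no longer line up with df1, so it may be worth de-duplicating df2 first.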

Answer 2

You can merge first and then fill the NaN values in unit with the values from df2.

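(
     pd.merge(df1, df2, on=['type','name'], how='left', suffixes=('','_y'))
         # fill missing units in df1 with the matched value from df2
         .assign(unit=lambda x: x.unit.combine_first(x.unit_y))
         # drop the helper column (the positional axis argument, drop('unit_y', 1), was removed in pandas 2.0)
         .drop(columns='unit_y')
)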
Out[301]: 
   id type name  unit
0   1   t1   n1    m2
1   2   t1   n5    m2
2   3   t2   n2    kg
3   4   t6   n3  each
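
The same chain could be wrapped in a small reusable function. The sketch below is only one way to do it: fill_from_lookup is a hypothetical helper name, the frames are rebuilt from the question's sample data with the column names the answer assumes, and validate="many_to_one" is added as a guard so the merge raises if a key pair repeats in the lookup frame.

```python
import numpy as np
import pandas as pd

def fill_from_lookup(left, lookup, keys, col):
    """Hypothetical helper: fill NaN in left[col] from lookup[col] on matching keys."""
    merged = left.merge(
        lookup[keys + [col]],
        on=keys,
        how="left",
        suffixes=("", "_y"),       # left keeps 'unit', lookup becomes 'unit_y'
        validate="many_to_one",    # raise if a key combination repeats in lookup
    )
    # Prefer the existing value; fall back to the looked-up one.
    merged[col] = merged[col].combine_first(merged[col + "_y"])
    return merged.drop(columns=col + "_y")

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "type": ["t1", "t1", "t2", "t6"],
    "name": ["n1", "n5", "n2", "n3"],
    "unit": ["m2", "m2", np.nan, "each"],
})
df2 = pd.DataFrame({
    "type": ["t3", "t4", "t2", "t6"],
    "name": ["n4", "n2", "n2", "n0"],
    "unit": ["cm2", "m3", "kg", np.nan],
})

result = fill_from_lookup(df1, df2, keys=["type", "name"], col="unit")
print(result)
```

Unlike Answer 1, this returns a new frame and leaves df1 untouched, which suits method-chaining style.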

Answer 1: 1, accepted, 2017-06-22 06:22:08

Answer 2: 1, 2017-06-22 06:27:20
