Fill empty columns with values from another column of another row based on an identifier
I am trying to fill a DataFrame that contains repeated elements, based on an identifier. My DataFrame is as follows:
  Code Value
0 SJHV
1 SJIO 96B
2 SJHV 33C
3 CPO3 22A
4 CPO3 22A
5 SJHV 33C    # <-- numbers stored as strings
6 TOY
7 TOY         # <-- these aren't NaN, they are empty strings
I would like to remove the rows with an empty 'Value' only if a row with a non-empty 'Value' exists for the same 'Code'. To be clear, I would like my output to look like:
  Code Value
0 SJHV 33C
1 SJIO 96B
2 CPO3 22A
3 TOY
My attempt was as follows:
import numpy as np
# Turn empty strings into NaN so that dropna() can remove them
df['Value'] = df['Value'].replace('', np.nan)
df2 = df.dropna(subset=['Value']).drop_duplicates('Code')
As expected, this code also drops the 'TOY' Code, because every one of its 'Value' entries is empty. Any suggestions?
If you sort 'Value' in descending order, the empty strings go to the bottom (an empty string sorts before any non-empty string), so you can then just drop duplicates on 'Code' and keep the first row of each group.
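import pandas as pd

df = pd.DataFrame({'Code': ['SJHV', 'SJIO', 'SJHV', 'CPO3', 'CPO3', 'SJHV', 'TOY', 'TOY'],
                   'Value': ['', '96B', '33C', '22A', '22A', '33C', '', '']})

# Sort descending so the empty strings end up last, keep the first row
# for each Code, then restore the original row order.
df = (
    df.sort_values(by=['Value'], ascending=False)
      .drop_duplicates(subset=['Code'], keep='first')
      .sort_index()
)
print(df)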
Output:
  Code Value
1 SJIO 96B
2 SJHV 33C
3 CPO3 22A
6 TOY
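An alternative sketch that expresses the rule "drop empty rows only if the same Code has a non-empty row" directly, instead of relying on how empty strings sort. It assumes, as in the question, that missing values are empty strings ('') rather than NaN. `groupby(...).transform('any')` marks every row whose Code has at least one non-empty 'Value'. One difference from the sort-based version: if a Code had several different non-empty values, this keeps the first one in row order rather than the lexicographically largest.

```python
import pandas as pd

df = pd.DataFrame({
    'Code': ['SJHV', 'SJIO', 'SJHV', 'CPO3', 'CPO3', 'SJHV', 'TOY', 'TOY'],
    'Value': ['', '96B', '33C', '22A', '22A', '33C', '', ''],
})

# True for rows whose Value is a non-empty string
has_value = df['Value'].ne('')

# True for every row whose Code has at least one non-empty Value somewhere
code_has_value = has_value.groupby(df['Code']).transform('any')

# Keep a row if it has a value, or if its Code has no value at all (e.g. TOY);
# then keep one row per Code and renumber the index like the desired output.
result = (
    df[has_value | ~code_has_value]
    .drop_duplicates(subset=['Code'], keep='first')
    .reset_index(drop=True)
)
print(result)
```

The rows come out in the same order as the sort-based output (SJIO, SJHV, CPO3, TOY), with the index renumbered from 0 by `reset_index(drop=True)`.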