Create a new row in a dataframe only for non-NaN values from a column
Let's say I have a dataframe like this one:
col1 col2 col3
0 data1 Completed Fail
1 data2 Completed NaN
2 data3 Completed Completed
3 data4 Completed NaN
4 data5 NaN NaN
How can I add an extra row each time the value in col3 is not NaN, so that I end up with a dataframe like this:
col1 status
0 data1 Completed
1 data1 Fail
2 data2 Completed
3 data3 Completed
4 data3 Completed
5 data4 Completed
6 data5 NaN
I tried this, but I'm not getting the desired output:
df = df.melt(id_vars=['col1'],
value_name="status")
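For reference, here is a minimal sketch of what that attempt returns on the sample data. The dataframe is rebuilt by hand from the table above, and the name melted is only illustrative. The result keeps an extra variable column and every NaN entry, so it has 10 rows instead of the 7 that are wanted:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "col1": ["data1", "data2", "data3", "data4", "data5"],
    "col2": ["Completed", "Completed", "Completed", "Completed", np.nan],
    "col3": ["Fail", np.nan, "Completed", np.nan, np.nan],
})

# The attempt from the question, stored under a new name so df stays intact.
melted = df.melt(id_vars=["col1"], value_name="status")

print(melted.columns.tolist())          # column names after melting
print(len(melted))                      # one row per (col1, original column) pair
print(melted["status"].isna().sum())    # NaN entries are still present
```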
IIUC, you can start by using pd.melt() as you already did, but also drop all the null values by chaining dropna(). This will get you close, but not exactly where you want to be:
new = df.melt(id_vars='col1',value_name='status').sort_values(by='col1').dropna().drop('variable',axis=1)
>>> print(new)
col1 status
0 data1 Completed
5 data1 Fail
1 data2 Completed
2 data3 Completed
7 data3 Completed
3 data4 Completed
At this point, you will need to bring over the rows from your original df that were NaN in col2. You can do that using isnull() and pd.concat() respectively:
col2_nan = df.loc[df.col2.isnull()].drop('col3',axis=1).rename(columns = {'col2':'status'})
>>> print(pd.concat([new,col2_nan]).reset_index(drop=True))
col1 status
0 data1 Completed
1 data1 Fail
2 data2 Completed
3 data3 Completed
4 data3 Completed
5 data4 Completed
6 data5 NaN
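Putting the two steps together, a self-contained sketch could look like the following. The sample dataframe is rebuilt by hand from the question, and expand_status is a hypothetical helper name, not part of pandas. It sorts once at the end with a stable sort (kind="mergesort") so that, within each col1 value, the col2 entry stays ahead of the col3 entry. It also assumes, as in the sample, that col2 is NaN only on rows where col3 is NaN too:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "col1": ["data1", "data2", "data3", "data4", "data5"],
    "col2": ["Completed", "Completed", "Completed", "Completed", np.nan],
    "col3": ["Fail", np.nan, "Completed", np.nan, np.nan],
})


def expand_status(frame):
    """Hypothetical helper: one row per non-NaN status, plus rows whose col2 is NaN."""
    # Step 1: melt col2/col3 into a single 'status' column and drop the NaN entries.
    melted = (
        frame.melt(id_vars=["col1"], value_name="status")
        .dropna(subset=["status"])
        .drop(columns="variable")
    )
    # Step 2: rows whose col2 is NaN were removed by dropna(); bring them back.
    col2_nan = (
        frame.loc[frame["col2"].isnull(), ["col1", "col2"]]
        .rename(columns={"col2": "status"})
    )
    combined = pd.concat([melted, col2_nan])
    # mergesort is stable, so the relative order within each col1 group is preserved.
    return combined.sort_values("col1", kind="mergesort").reset_index(drop=True)


result = expand_status(df)
print(result)
```

If col2 could be NaN while col3 holds a value, the NaN row would end up after the col3 row for that col1, so the final ordering step would need adjusting for that case.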