Replace NaN values with specific text based on an adjacent column in a pandas DataFrame

Question

My data has some NaN values in its first column. To replace each NaN, I want to look at the adjacent value in the next column and fill in a specific value based on it. Here is the CSV data:

,Release,SubRelease,ReleaseDate,TypeofRelease,Package
0,LTE,TL101,2017-09-27,Major Update,2.0.4
1,LTE,TL101,2017-09-26,Normal Update,3.1
2,NaN,TL209,2017-09-25,Major Update,3.2
3,5G,5GS,2017-09-25,Delivery,1.1
4,NaN,5GM,2017-09-24,Release,1.0
5,LTE,FL18A,2017-09-23,Normal Update,3.0

For example, there is a NaN value in the 3rd row of Release. I want to look at the SubRelease column of the same row and, since the value there is "TL209", replace the NaN with "LTE". Similarly, if the value in the SubRelease column is "5G19", I want to replace the NaN in Release with "5G".

The first thing that comes to mind is using a regex to check whether the value in the SubRelease column contains or begins with the text "5G", but I don't know how to implement this. Is there a better approach?


Answer 1

You can just do:

df["Release"] = df["Release"].fillna(df["SubRelease"])
df
>>>   Release SubRelease ReleaseDate  TypeofRelease Package
0     LTE      TL101  2017-09-27   Major Update   2.0.4
1     LTE      TL101  2017-09-26  Normal Update     3.1
2   TL209      TL209  2017-09-25   Major Update     3.2
3      5G        5GS  2017-09-25       Delivery     1.1
4     5GM        5GM  2017-09-24        Release     1.0
5     LTE      FL18A  2017-09-23  Normal Update     3.0

Edit: I misread the question and forgot this last step:

df = df.replace({"Release":{"TL209":"LTE", "5GM":"5G", "5GS":"5G", ...}})
>>>   Release SubRelease ReleaseDate  TypeofRelease Package
0     LTE      TL101  2017-09-27   Major Update   2.0.4
1     LTE      TL101  2017-09-26  Normal Update     3.1
2     LTE      TL209  2017-09-25   Major Update     3.2
3      5G        5GS  2017-09-25       Delivery     1.1
4      5G        5GM  2017-09-24        Release     1.0
5     LTE      FL18A  2017-09-23  Normal Update     3.0
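
As a self-contained variation on the two steps above, here is a minimal sketch that maps SubRelease to a release name first and only then fills the gaps, so SubRelease values are never written into Release as an intermediate step. The name `sub_to_release` and its contents are assumptions for illustration, covering only the two rows that are missing in the sample data; a real mapping would need an entry for every sub-release that can appear next to a NaN.

```python
import io
import pandas as pd

# Sample data from the question; read_csv parses the literal "NaN" as missing.
csv_text = """,Release,SubRelease,ReleaseDate,TypeofRelease,Package
0,LTE,TL101,2017-09-27,Major Update,2.0.4
1,LTE,TL101,2017-09-26,Normal Update,3.1
2,NaN,TL209,2017-09-25,Major Update,3.2
3,5G,5GS,2017-09-25,Delivery,1.1
4,NaN,5GM,2017-09-24,Release,1.0
5,LTE,FL18A,2017-09-23,Normal Update,3.0
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Hypothetical lookup table: which release each sub-release belongs to.
sub_to_release = {"TL209": "LTE", "5GM": "5G"}

# Translate SubRelease row by row; sub-releases not in the table become NaN.
fallback = df["SubRelease"].map(sub_to_release)

# Fill only the missing Release values, aligned on the index.
df["Release"] = df["Release"].fillna(fallback)

print(df["Release"].tolist())
```

Rows whose sub-release is absent from the mapping stay NaN, which makes unmapped cases easy to spot with `df["Release"].isna()`.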

Answer 2

To replace all NaNs in the Release column depending on the values in SubRelease, you can first find, e.g., all '5G' sub-releases and replace those NaNs. Any further conditions can be handled the same way. Finally, replace any remaining NaNs with the default value (here 'LTE').

This can be done using loc together with an appropriate mask:

df.loc[df['Release'].isna() & df['SubRelease'].str.contains('5G'), 'Release'] = '5G'
df = df.fillna('LTE')

Result:

  Release SubRelease ReleaseDate  TypeofRelease Package
0     LTE      TL101  2017-09-27   Major Update   2.0.4
1     LTE      TL101  2017-09-26  Normal Update     3.1
2     LTE      TL209  2017-09-25   Major Update     3.2
3      5G        5GS  2017-09-25       Delivery     1.1
4      5G        5GM  2017-09-24        Release     1.0
5     LTE      FL18A  2017-09-23  Normal Update     3.0
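
The mask approach can also be sketched as a runnable example with two small changes, both of which are assumptions about what is wanted rather than part of the answer above: the prefix test uses `str.startswith` (the question asks for "begins with") with `na=False` so a missing SubRelease cannot poison the mask, and the final `fillna` is limited to the Release column so NaNs in other columns are left alone. Only the two relevant columns are reproduced here.

```python
import numpy as np
import pandas as pd

# The two relevant columns from the question's data.
df = pd.DataFrame({
    "Release": ["LTE", "LTE", np.nan, "5G", np.nan, "LTE"],
    "SubRelease": ["TL101", "TL101", "TL209", "5GS", "5GM", "FL18A"],
})

# Rows where Release is missing.
missing = df["Release"].isna()

# Rows whose SubRelease begins with "5G"; a missing SubRelease counts as False.
is_5g = df["SubRelease"].str.startswith("5G", na=False)

# Fill the 5G rows first, then fall back to the default for the rest.
df.loc[missing & is_5g, "Release"] = "5G"
df["Release"] = df["Release"].fillna("LTE")

print(df)
```

Further prefixes can be handled by adding one mask and one `loc` assignment per rule before the final fallback line.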
