Replace NaN values with specific text based on adjacent column in pandas dataframe
My data has some NaN values in its first column. To replace them, I want to look at the adjacent value in the next column and fill in a specific value based on it. Here is the csv data:
,Release,SubRelease,ReleaseDate,TypeofRelease,Package
0,LTE,TL101,2017-09-27,Major Update,2.0.4
1,LTE,TL101,2017-09-26,Normal Update,3.1
2,NaN,TL209,2017-09-25,Major Update,3.2
3,5G,5GS,2017-09-25,Delivery,1.1
4,NaN,5GM,2017-09-24,Release,1.0
5,LTE,FL18A,2017-09-23,Normal Update,3.0
For example, there is a NaN value in the 3rd row of Release. I want to look at the SubRelease column of the same row and, since the value there is "TL209", replace the NaN with "LTE". Similarly, if the value in the SubRelease column is "5G19", I want to replace the NaN in Release with "5G".
The first thing that comes to mind is a regex that checks whether the value in the SubRelease column contains or begins with the text "5G", but I don't know how to implement this. Is there a better approach?
You can just do:
df["Release"] = df["Release"].fillna(df["SubRelease"])
df
  Release SubRelease ReleaseDate  TypeofRelease Package
0     LTE      TL101  2017-09-27   Major Update   2.0.4
1     LTE      TL101  2017-09-26  Normal Update     3.1
2   TL209      TL209  2017-09-25   Major Update     3.2
3      5G        5GS  2017-09-25       Delivery     1.1
4     5GM        5GM  2017-09-24        Release     1.0
5     LTE      FL18A  2017-09-23  Normal Update     3.0
Edit: I misread the question, so I forgot this last step:
df = df.replace({"Release":{"TL209":"LTE", "5GM":"5G", "5GS":"5G", ...}})
  Release SubRelease ReleaseDate  TypeofRelease Package
0     LTE      TL101  2017-09-27   Major Update   2.0.4
1     LTE      TL101  2017-09-26  Normal Update     3.1
2     LTE      TL209  2017-09-25   Major Update     3.2
3      5G        5GS  2017-09-25       Delivery     1.1
4      5G        5GM  2017-09-24        Release     1.0
5     LTE      FL18A  2017-09-23  Normal Update     3.0
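A variation on the two steps above, sketched under the assumption that the SubRelease-to-Release mapping is known in advance: map SubRelease through a dictionary and use the result only where Release is missing, so values that are already present in Release are never touched. The name sub_to_release and its contents are illustrative and cover only the sub-releases seen in the sample data.

```python
import io

import pandas as pd

# Sample data from the question; read_csv parses the literal "NaN" as missing.
csv_text = """,Release,SubRelease,ReleaseDate,TypeofRelease,Package
0,LTE,TL101,2017-09-27,Major Update,2.0.4
1,LTE,TL101,2017-09-26,Normal Update,3.1
2,NaN,TL209,2017-09-25,Major Update,3.2
3,5G,5GS,2017-09-25,Delivery,1.1
4,NaN,5GM,2017-09-24,Release,1.0
5,LTE,FL18A,2017-09-23,Normal Update,3.0
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Hypothetical lookup table: SubRelease value -> Release value.
sub_to_release = {"TL209": "LTE", "5GM": "5G", "5GS": "5G"}

# map() yields NaN for sub-releases missing from the table, so those rows
# simply stay NaN; fillna() only writes where Release is currently NaN.
df["Release"] = df["Release"].fillna(df["SubRelease"].map(sub_to_release))

print(df)
```

Because the fill value is computed per row, a row whose Release already holds a value is left alone even if that value happens to equal a key in the dictionary.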
To replace all NaNs in the Release column depending on the values in SubRelease, you can first find, for example, all '5G' sub-releases and replace the NaNs in those rows. If there are more conditions, they can be handled in the same way. At the end, replace any remaining NaNs with the default value (here 'LTE').
This can be done using loc together with an appropriate mask:
df.loc[df['Release'].isna() & df['SubRelease'].str.contains('5G'), 'Release'] = '5G'
# fill only the Release column so NaNs in other columns are left alone
df['Release'] = df['Release'].fillna('LTE')
Result:
  Release SubRelease ReleaseDate  TypeofRelease Package
0     LTE      TL101  2017-09-27   Major Update   2.0.4
1     LTE      TL101  2017-09-26  Normal Update     3.1
2     LTE      TL209  2017-09-25   Major Update     3.2
3      5G        5GS  2017-09-25       Delivery     1.1
4      5G        5GM  2017-09-24        Release     1.0
5     LTE      FL18A  2017-09-23  Normal Update     3.0
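If there are several conditions, the same mask idea can be driven by a small table of prefixes instead of one line per condition. The following is only a sketch: the prefix_rules dictionary is an assumption made for illustration (the question only states that sub-releases beginning with "5G" belong to "5G"), and it uses str.startswith rather than str.contains so that only the beginning of SubRelease is checked.

```python
import numpy as np
import pandas as pd

# Only the two columns involved, with the values from the question.
df = pd.DataFrame({
    "Release": ["LTE", "LTE", np.nan, "5G", np.nan, "LTE"],
    "SubRelease": ["TL101", "TL101", "TL209", "5GS", "5GM", "FL18A"],
})

# Hypothetical rules: SubRelease prefix -> Release value.
prefix_rules = {"5G": "5G", "TL": "LTE", "FL": "LTE"}

# Compute the missing rows once, before any of them are filled.
missing = df["Release"].isna()

for prefix, release in prefix_rules.items():
    # na=False makes a missing SubRelease count as "no match".
    mask = missing & df["SubRelease"].str.startswith(prefix, na=False)
    df.loc[mask, "Release"] = release

print(df)
```

Rows whose SubRelease matches none of the prefixes stay NaN, which makes unexpected sub-releases easy to spot before a default value is applied.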