How to fill NaN values in a DataFrame column with a value from the same column when both rows share the same value in another column? (Like a WHERE clause)
I am a beginner in Python and need your help with my problem. I have a dataset on coronavirus mortality. It has two columns: the neighbourhood name (column name: Neighbourhood Name), which is based on the postal code, and the postal code (column name: FSA), which is filled in based on the neighbourhood name.
I am trying to fill the NaN values in both columns.
Here is what I tried.
1 - Getting the data into Jupyter
covid_df.head(5)
covid_df.isnull().sum().to_frame()
covid_sub_df = covid_df.loc[:, ['Neighbourhood Name', 'FSA']]
covid_sub_df
covid_sub_df_2 = covid_sub_df.drop_duplicates()
covid_sub_df_2
Then I tried this:
val = ""
for i, j in covid_df['Neighbourhood Name'], covid_df['FSA']:
    for k,l in covid_sub_df_2['Neighbourhood Name'], covid_sub_df_2['FSA']:
        if k == val and j == l:
            covid_df['Neighbourhood Name'] = covid_sub_df['Neighbourhood Name']
        if j == val and k == i:
            covid_df['FSA'] = covid_sub_df['FSA']
I get this error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
      1 val = ""
----> 2 for i, j in covid_df['Neighbourhood Name'], covid_df['FSA']:
      3     for k,l in covid_sub_df_2['Neighbourhood Name'], covid_sub_df_2['FSA']:
      4         if k == val and j == l:
      5             covid_df['Neighbourhood Name'] = covid_sub_df['Neighbourhood Name']

ValueError: too many values to unpack (expected 2)
Thank you all.
So what you need is to get rid of the following error?
ValueError: too many values to unpack (expected 2)
The question isn't posed very specifically, because the title asks how to fill NaN values. Also, you should try to provide some dummy data if possible.
However, assuming you want to get rid of the error, you probably wanted to loop over both columns simultaneously. The built-in function zip() does that. So the following modification should hopefully work:
val = ""
for i, j in zip(covid_df['Neighbourhood Name'], covid_df['FSA']):
    for k,l in zip(covid_sub_df_2['Neighbourhood Name'], covid_sub_df_2['FSA']):
        if k == val and j == l:
            covid_df['Neighbourhood Name'] = covid_sub_df['Neighbourhood Name']
        if j == val and k == i:
            covid_df['FSA'] = covid_sub_df['FSA']
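To see where the error comes from: `for i, j in a, b` iterates over the two-element tuple `(a, b)`, so the first iteration tries to unpack the whole first Series into `i, j`. Here is a minimal sketch with made-up data (the names and codes below are invented for illustration and are not from the real dataset):

```python
import pandas as pd

# Hypothetical stand-in data; the real covid_df is not shown in the question.
names = pd.Series(["Annex", "Riverdale", "Annex"])
fsas = pd.Series(["M5R", "M4K", "M5R"])

# Without zip: the loop walks over the tuple (names, fsas), and unpacking
# the three-element Series `names` into two variables raises ValueError.
try:
    for i, j in names, fsas:
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With zip: each iteration yields one (name, fsa) pair, row by row.
pairs = list(zip(names, fsas))
```

Note that zip only removes the unpacking error. Whether the loop body then fills the values you want is a separate question.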
It is not clear which values you want to fill your NaN values with. One option is to use the pandas DataFrame replace method:
covid_df.replace({np.nan : new_value})
This replaces every NaN value with new_value. It works because pandas is built on top of numpy, a widely used Python library, and by default represents missing values as np.nan. Note that replace returns a new DataFrame, so assign the result if you want to keep it. You need to import numpy first for this to work:
import numpy as np
Be aware that every NaN value will be replaced with the exact same value, the one held in new_value.
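If each gap should instead be filled from another row that shares the same value in the other column, which is what the question title seems to ask for, one possible approach is to build a lookup from the rows where both columns are known and use it with map and fillna. This is only a sketch under assumptions: the sample data is invented, the names `known`, `fsa_to_name`, `name_to_fsa` and `filled` are hypothetical, and when one key maps to several values the first one seen is kept.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real covid_df is not shown in the question.
covid_df = pd.DataFrame({
    "Neighbourhood Name": ["Annex", np.nan, "Riverdale", "Riverdale", "Annex"],
    "FSA": ["M5R", "M5R", "M4K", np.nan, "M5R"],
})

# Rows where both columns are known, one row per distinct pair.
known = covid_df.dropna(subset=["Neighbourhood Name", "FSA"]).drop_duplicates()

# Lookup tables with unique keys; on conflicts the first pair seen wins.
fsa_to_name = known.drop_duplicates("FSA").set_index("FSA")["Neighbourhood Name"]
name_to_fsa = (
    known.drop_duplicates("Neighbourhood Name")
    .set_index("Neighbourhood Name")["FSA"]
)

# Fill each column's gaps from the lookup keyed on the other column.
filled = covid_df.copy()
filled["Neighbourhood Name"] = covid_df["Neighbourhood Name"].fillna(
    covid_df["FSA"].map(fsa_to_name)
)
filled["FSA"] = covid_df["FSA"].fillna(
    covid_df["Neighbourhood Name"].map(name_to_fsa)
)
```

Rows where both columns are missing, or whose key never appears alongside a known value, stay NaN. If the missing entries in your data are empty strings rather than NaN (the loop in the question compares against ""), convert them first, for example with covid_df.replace("", np.nan).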