Pandas converting data to NaN when adding to a new data set

I've been trying to extract specific data from a given data set and add it to a new one in a specific set of organized columns. I'm doing this by reading a CSV file and filtering with the string function str.contains. The problem is that, although the data is extracted correctly, pandas fills the second column with NaN even though the variable I assign to it holds data. Please see my code below. Any idea how to fix this?

processor=pd.DataFrame()
Hospital_beds="SH.MED.BEDS.ZS"
Mask1=data["IndicatorCode"].str.contains(Hospital_beds)
stage=data[Mask1]
Hospital_Data=stage["Value"]
Birth_Rate="SP.DYN.CBRT.IN"
Mask=data["IndicatorCode"].str.contains(Birth_Rate)
stage=data[Mask]
Birth_Data=stage["Value"]
processor["Countries"]=stage["CountryCode"]
processor["Birth Rate per 1000 people"]=Birth_Data
processor["Hospital beds per 100 people"]=Hospital_Data
processor.head(10)

The problem here is that the indices do not match up. When you initially populate the processor data frame, you are using the rows of the original data frame that contain birth rate data. These rows are different from the ones that contain the hospital beds data, so when you do

processor["Hospital beds per 100 people"] = Hospital_Data

pandas will create the new column, but since none of the index labels of Hospital_Data exist in processor, the column will contain only null values.
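To see the alignment behaviour in isolation, here is a minimal sketch on a toy frame. The country codes and values are invented for illustration and are not taken from the asker's CSV; only the column names and indicator codes follow the question.

```python
import pandas as pd

# Hypothetical toy data: rows 0-1 hold birth rates, rows 2-3 hold hospital beds.
data = pd.DataFrame({
    "CountryCode": ["ABW", "AFG", "ABW", "AFG"],
    "IndicatorCode": ["SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN",
                      "SH.MED.BEDS.ZS", "SH.MED.BEDS.ZS"],
    "Value": [35.7, 51.3, 1.2, 0.2],  # invented numbers
})

# Each selection keeps the row labels of the original frame.
birth = data.loc[data["IndicatorCode"] == "SP.DYN.CBRT.IN", "Value"]  # labels 0, 1
beds = data.loc[data["IndicatorCode"] == "SH.MED.BEDS.ZS", "Value"]   # labels 2, 3

processor = pd.DataFrame()
processor["Birth Rate"] = birth     # processor takes over the labels 0, 1
processor["Hospital beds"] = beds   # aligned by label: 2 and 3 are not in processor

# Every entry of the new column is NaN because no label matched.
all_nan = processor["Hospital beds"].isna().all()
```

Assigning a Series to a column aligns it on index labels rather than on position, which is why the values are silently dropped instead of raising an error.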

Probably what you first want to do is re-index the original data using the country code and the year:

data.set_index(['CountryCode','Year'], inplace=True)

You can then create a view of just the indicators you are interested in:

indicators = ['SH.MED.BEDS.ZS', 'SP.DYN.CBRT.IN']
dview = data[data.IndicatorCode.isin(indicators)]

Finally, you can pivot on the indicator code to see each indicator on the same row:

dview.pivot(columns='IndicatorCode')['Value']
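The reshaping steps can be tried end to end on a toy frame. This is only a sketch under assumptions: the rows and values are invented, and it uses set_index followed by unstack, which moves IndicatorCode into the columns in the same way as the pivot, rather than calling pivot itself.

```python
import pandas as pd

# Hypothetical long-format data, one row per country, year and indicator.
data = pd.DataFrame({
    "CountryCode": ["ABW", "ABW", "ABW", "AFG", "AFG"],
    "Year": [1960, 1960, 1960, 1960, 1960],
    "IndicatorCode": ["SP.DYN.CBRT.IN", "SH.MED.BEDS.ZS", "SP.POP.TOTL",
                      "SP.DYN.CBRT.IN", "SH.MED.BEDS.ZS"],
    "Value": [35.7, 1.2, 54211.0, 51.3, 0.2],  # invented numbers
})

indicators = ["SH.MED.BEDS.ZS", "SP.DYN.CBRT.IN"]

# Keep only the rows for the indicators of interest.
subset = data[data["IndicatorCode"].isin(indicators)]

# Index by country, year and indicator, then move the indicator level
# into the columns so each (country, year) pair becomes a single row.
wide = (
    subset.set_index(["CountryCode", "Year", "IndicatorCode"])["Value"]
    .unstack("IndicatorCode")
)
```

Because both indicators now share the (CountryCode, Year) index, they line up by country and year instead of by the original row numbers.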

But note that this will still contain a lot of NaNs. This is just because the hospital bed data is updated very infrequently (or, e.g. in Aruba, not at all). But you can filter these out as needed.
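One possible way to do that filtering is dropna. The sketch below assumes a pivoted frame shaped like the one described above; the index entries and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical pivoted frame: one row per (country, year), one column per indicator.
wide = pd.DataFrame(
    {
        "SH.MED.BEDS.ZS": [np.nan, 0.2, np.nan],  # invented numbers
        "SP.DYN.CBRT.IN": [35.7, 51.3, 51.2],
    },
    index=pd.MultiIndex.from_tuples(
        [("ABW", 1960), ("AFG", 1960), ("AFG", 1961)],
        names=["CountryCode", "Year"],
    ),
)

# Keep only the rows where every indicator has a value.
complete = wide.dropna()

# Or require a value for the hospital beds column only.
has_beds = wide.dropna(subset=["SH.MED.BEDS.ZS"])
```

Whether to drop incomplete rows or keep them depends on the analysis; dropping them is only one of several reasonable choices.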
