If column value is “foo”, append dataframe with new values on the same row?
I have a dataframe containing country names, and I would like to append to it the coordinates of each country's capital.
I created a dict with all the coordinates, formatted like this:
{'Czech Republic': (14.4212535, 50.0874654), 'Zimbabwe': (31.045686, -17.831773),
'Hungary': (19.0404707, 47.4983815), 'Nigeria': (7.4892974, 9.0643305)}
I have a dataframe where one column is "COUNTRY", and I want two new columns, "LAT" and "LON", to store the coordinates. I tried converting the dict to a dataframe directly, but it didn't work the way I wanted.
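Here is a sketch of why the direct conversion is awkward, using the dict from the question. pd.DataFrame(d) turns every key into a column. DataFrame.from_dict with orient='index' gives one row per country instead. The column names LON and LAT are an assumption: the values look like (longitude, latitude) pairs, since Prague is at roughly 50.1°N, 14.4°E.

```python
import pandas as pd

# The coordinate dict from the question; the tuples appear to be (longitude, latitude).
d = {'Czech Republic': (14.4212535, 50.0874654), 'Zimbabwe': (31.045686, -17.831773),
     'Hungary': (19.0404707, 47.4983815), 'Nigeria': (7.4892974, 9.0643305)}

# Direct conversion: every dict key becomes a column, the tuple items become rows 0 and 1.
wide = pd.DataFrame(d)
print(wide.shape)  # (2, 4)

# orient='index' makes every key a row instead, with one column per tuple element
# (column names are assumed from the apparent (longitude, latitude) order).
coords = pd.DataFrame.from_dict(d, orient='index', columns=['LON', 'LAT'])
print(coords)
```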
Is it viable to create an empty df with two columns, "LAT" and "LON", merge it with the original df, and then iterate through it, checking the country and adding the coordinates one by one? Or is there a better way of doing it?
A country can appear many, many times in the df, which has about 30k entries, so I'm afraid this will cause a fair amount of overhead. I'm new to Pandas, so I might be missing a built-in feature that would work well here.
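The row-by-row approach would work, but it runs Python code for every row. Below is a minimal sketch comparing it with Series.map on a made-up 30,000-row frame. The repeated three-country list is hypothetical test data, and the tuples are again assumed to be (longitude, latitude).

```python
import pandas as pd

d = {'Czech Republic': (14.4212535, 50.0874654), 'Zimbabwe': (31.045686, -17.831773),
     'Hungary': (19.0404707, 47.4983815), 'Nigeria': (7.4892974, 9.0643305)}

# Hypothetical test frame: 30,000 rows cycling through three countries (made-up data).
df = pd.DataFrame({'COUNTRY': ['Zimbabwe', 'Hungary', 'Nigeria'] * 10_000})

# Row-by-row lookup in a Python loop, as proposed in the question.
lat_loop, lon_loop = [], []
for country in df['COUNTRY']:
    lon, lat = d.get(country, (float('nan'), float('nan')))
    lat_loop.append(lat)
    lon_loop.append(lon)

# Vectorised lookup: map the whole column against one dict per coordinate.
lat_map = df['COUNTRY'].map({k: v[1] for k, v in d.items()})
lon_map = df['COUNTRY'].map({k: v[0] for k, v in d.items()})

print(lat_map.tolist() == lat_loop)  # True
```

Both produce the same values. map does the lookup for the whole column in one call, which is generally faster than a Python loop at this size (no timings are claimed here).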
Do you have any thoughts on the best way to approach this?
Thanks in advance
Use map with two dict comprehensions that select the first and second value of each tuple by indexing with [0] and [1]:
import pandas as pd

# the tuples are (longitude, latitude): Harare, Zimbabwe is at about 17.8°S, 31.0°E
d = {'Czech Republic': (14.4212535, 50.0874654), 'Zimbabwe': (31.045686, -17.831773),
     'Hungary': (19.0404707, 47.4983815), 'Nigeria': (7.4892974, 9.0643305)}
df = pd.DataFrame({'COUNTRY':['Zimbabwe','Hungary', 'Slovakia']})
# second tuple value -> LAT, first tuple value -> LON; countries missing from d become NaN
df['LAT'] = df['COUNTRY'].map({k: v[1] for k, v in d.items()})
df['LON'] = df['COUNTRY'].map({k: v[0] for k, v in d.items()})
print(df)
    COUNTRY        LAT        LON
0  Zimbabwe -17.831773  31.045686
1   Hungary  47.498382  19.040471
2  Slovakia        NaN        NaN
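map returns NaN for any country that is not a key in the dict, as with Slovakia above. Here is a small sketch for listing those countries so their coordinates can be added. It assumes the same (longitude, latitude) tuple order.

```python
import pandas as pd

d = {'Czech Republic': (14.4212535, 50.0874654), 'Zimbabwe': (31.045686, -17.831773),
     'Hungary': (19.0404707, 47.4983815), 'Nigeria': (7.4892974, 9.0643305)}
df = pd.DataFrame({'COUNTRY': ['Zimbabwe', 'Hungary', 'Slovakia']})
df['LAT'] = df['COUNTRY'].map({k: v[1] for k, v in d.items()})
df['LON'] = df['COUNTRY'].map({k: v[0] for k, v in d.items()})

# Rows whose country was not found in d have NaN coordinates; collect those country names.
missing = df.loc[df['LAT'].isna(), 'COUNTRY'].unique().tolist()
print(missing)  # ['Slovakia']
```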
Adding to the solution above, you can also convert the dict to a DataFrame and pick the coordinate rows with iloc:
import pandas as pd

d = {'Czech Republic': (14.4212535, 50.0874654), 'Zimbabwe': (31.045686, -17.831773), 'Hungary': (19.0404707, 47.4983815), 'Nigeria': (7.4892974, 9.0643305)}
# one column per country; row 0 holds the first tuple value (longitude), row 1 the second (latitude)
d = pd.DataFrame(d)
print(d)
   Czech Republic   Zimbabwe    Hungary   Nigeria
0       14.421254  31.045686  19.040471  7.489297
1       50.087465 -17.831773  47.498382  9.064331
df = pd.DataFrame({'COUNTRY':['Zimbabwe','Hungary', 'Slovakia']})
# d.iloc[1] and d.iloc[0] are Series indexed by country name, so map() looks up each COUNTRY
df['LAT'] = df['COUNTRY'].map(d.iloc[1])
df['LON'] = df['COUNTRY'].map(d.iloc[0])
print(df)
    COUNTRY        LAT        LON
0  Zimbabwe -17.831773  31.045686
1   Hungary  47.498382  19.040471
2  Slovakia        NaN        NaN
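The question mentions merging, so a left merge against a coordinates table is another option. This sketch builds that table with DataFrame.from_dict(orient='index'). The column names and the (longitude, latitude) reading of the tuples are assumptions.

```python
import pandas as pd

d = {'Czech Republic': (14.4212535, 50.0874654), 'Zimbabwe': (31.045686, -17.831773),
     'Hungary': (19.0404707, 47.4983815), 'Nigeria': (7.4892974, 9.0643305)}
df = pd.DataFrame({'COUNTRY': ['Zimbabwe', 'Hungary', 'Slovakia']})

# One row per country, indexed by country name; tuples read as (longitude, latitude).
coords = pd.DataFrame.from_dict(d, orient='index', columns=['LON', 'LAT'])
coords = coords[['LAT', 'LON']]  # reorder to the question's LAT, LON layout

# Left merge: match df['COUNTRY'] against the coords index and keep every row of df.
out = df.merge(coords, how='left', left_on='COUNTRY', right_index=True)
print(out)
```

how='left' keeps every row of df in its original order and fills NaN where a country has no coordinates, which matches the behaviour of the map-based answers.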