Pandas/Python: Set value of new column based on row value and other DataFrame
Is it possible to add values to a column when the province name in the second DataFrame matches the province name in the first DataFrame? I searched for answers but wasn't able to find anything useful for my case.
This is the first DataFrame:
date province confirmed released deceased
0 2020-03-30 Daegu 6624 3837 111
1 2020-03-30 Gyeongsangbuk-do 1298 772 38
2 2020-03-30 Gyeonggi-do 463 160 5
3 2020-03-30 Seoul 426 92 0
4 2020-03-30 Chungcheongnam-do 127 83 0
...
And this is the second DataFrame:
code province latitude longitude
0 12000 Daegu 35.872150 128.601783
1 60000 Gyeongsangbuk-do 36.576032 128.505599
2 20000 Gyeonggi-do 37.275119 127.009466
3 10000 Seoul 37.566953 126.977977
4 41000 Chungcheongnam-do 36.658976 126.673318
...
I would like to turn the first DataFrame into this:
date province confirmed released deceased latitude longitude
0 2020-03-30 Daegu 6624 3837 111 35.872150 128.601783
1 2020-03-30 Gyeongsangbuk-do 1298 772 38 36.576032 128.505599
2 2020-03-30 Gyeonggi-do 463 160 5 37.275119 127.009466
3 2020-03-30 Seoul 426 92 0 37.566953 126.977977
4 2020-03-30 Chungcheongnam-do 127 83 0 36.658976 126.673318
...
Thanks!
The pandas.DataFrame.merge method is what you want to use here.
Using your example DataFrames:
import pandas as pd

# First DataFrame: case counts per province
df1 = pd.DataFrame(dict(
    date = [
        '2020-03-30', '2020-03-30', '2020-03-30',
        '2020-03-30', '2020-03-30'],
    province = [
        'Daegu', 'Gyeongsangbuk-do', 'Gyeonggi-do',
        'Seoul', 'Chungcheongnam-do'],
    confirmed = [6624, 1298, 463, 426, 127],
    released = [3837, 772, 160, 92, 83],
    deceased = [111, 38, 5, 0, 0],
))

# Second DataFrame: coordinates per province
df2 = pd.DataFrame(dict(
    code = [12000, 60000, 20000, 10000, 41000],
    province = [
        'Daegu', 'Gyeongsangbuk-do', 'Gyeonggi-do',
        'Seoul', 'Chungcheongnam-do'],
    latitude = [
        35.872150, 36.576032, 37.275119,
        37.566953, 36.658976],
    longitude = [
        128.601783, 128.505599, 127.009466,
        126.977977, 126.673318],
))

# Select only the columns to bring over (this drops 'code'),
# then match rows on 'province'
df3 = df1.merge(
    df2[['province', 'latitude', 'longitude']],
    on = 'province',
)

pd.set_option('display.max_columns', 7)
print(df3)
Output:
date province confirmed released deceased latitude \
0 2020-03-30 Daegu 6624 3837 111 35.872150
1 2020-03-30 Gyeongsangbuk-do 1298 772 38 36.576032
2 2020-03-30 Gyeonggi-do 463 160 5 37.275119
3 2020-03-30 Seoul 426 92 0 37.566953
4 2020-03-30 Chungcheongnam-do 127 83 0 36.658976
longitude
0 128.601783
1 128.505599
2 127.009466
3 126.977977
4 126.673318
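One thing to keep in mind: merge defaults to an inner join, so a row of the first DataFrame whose province is missing from the second would be dropped from the result. A minimal sketch, assuming you want to keep every row of the first DataFrame, passes how="left". The validate argument is optional and raises an error if a province appears more than once in the lookup frame. The names cases, coords and merged are made up for this illustration, and coords deliberately leaves out Seoul to show what happens to an unmatched row.

```python
import pandas as pd

# Subset of the question's data; Seoul is deliberately left out of
# `coords` to show how unmatched rows behave.
cases = pd.DataFrame({
    "date": ["2020-03-30"] * 3,
    "province": ["Daegu", "Gyeonggi-do", "Seoul"],
    "confirmed": [6624, 463, 426],
})
coords = pd.DataFrame({
    "code": [12000, 20000],
    "province": ["Daegu", "Gyeonggi-do"],
    "latitude": [35.872150, 37.275119],
    "longitude": [128.601783, 127.009466],
})

# how="left" keeps every row of `cases`; unmatched provinces get NaN.
# validate="many_to_one" raises MergeError if `coords` repeats a province.
merged = cases.merge(
    coords[["province", "latitude", "longitude"]],
    on="province",
    how="left",
    validate="many_to_one",
)
print(merged)
```

Here the Seoul row is kept, with NaN in latitude and longitude, instead of disappearing as it would with the default inner join.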
What you really want to do is merge both DataFrames based on the province column.
Create a new DataFrame with the columns you want.
First, loop over the first DataFrame and add all of its values to the new one. (Leave the columns that are not present in the first DataFrame empty.)
Then loop over the second DataFrame and add its values to the rows whose province matches the province already stored in the new DataFrame.
Here's an example.
New DataFrame (columns only, no rows yet):
date province confirmed released deceased latitude longitude
After adding the first DataFrame:
date province confirmed released deceased latitude longitude
0 2020-03-30 Daegu 6624 3837 111
1 2020-03-30 Gyeongsangbuk-do 1298 772 38
2 2020-03-30 Gyeonggi-do 463 160 5
3 2020-03-30 Seoul 426 92 0
4 2020-03-30 Chungcheongnam-do 127 83 0
After adding the second DataFrame:
date province confirmed released deceased latitude longitude
0 2020-03-30 Daegu 6624 3837 111 35.872150 128.601783
1 2020-03-30 Gyeongsangbuk-do 1298 772 38 36.576032 128.505599
2 2020-03-30 Gyeonggi-do 463 160 5 37.275119 127.009466
3 2020-03-30 Seoul 426 92 0 37.566953 126.977977
4 2020-03-30 Chungcheongnam-do 127 83 0 36.658976 126.673318
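The two passes described above can be sketched literally as follows, using a subset of the question's data. This is only an illustration of the steps, not a recommended implementation: the names cases, coords and result are made up here, and on real data merge is shorter and much faster than row-by-row loops.

```python
import pandas as pd

# Subset of the question's two DataFrames.
cases = pd.DataFrame({
    "date": ["2020-03-30"] * 3,
    "province": ["Daegu", "Gyeonggi-do", "Seoul"],
    "confirmed": [6624, 463, 426],
})
coords = pd.DataFrame({
    "code": [12000, 20000, 10000],
    "province": ["Daegu", "Gyeonggi-do", "Seoul"],
    "latitude": [35.872150, 37.275119, 37.566953],
    "longitude": [128.601783, 127.009466, 126.977977],
})

# Pass 1: copy the first DataFrame and leave the new columns empty (NaN).
result = cases.copy()
result["latitude"] = float("nan")
result["longitude"] = float("nan")

# Pass 2: for each row of the second DataFrame, fill in the rows of
# `result` whose province matches.
for row in coords.itertuples(index=False):
    match = result["province"] == row.province
    result.loc[match, "latitude"] = row.latitude
    result.loc[match, "longitude"] = row.longitude

print(result)
```

Any province that never appears in coords simply keeps NaN in the two new columns, which is the same outcome a left merge on province would give.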