Pandas/Python: Set the value of a new column based on row values and another DataFrame

Question

Is it possible to add a value to a column when the province name in the second DataFrame matches the province name in the first DataFrame? I searched for answers but wasn't able to find anything useful for my case.

This is the first DataFrame:

    date        province            confirmed   released    deceased
0   2020-03-30  Daegu               6624        3837        111
1   2020-03-30  Gyeongsangbuk-do    1298        772         38
2   2020-03-30  Gyeonggi-do         463         160         5
3   2020-03-30  Seoul               426         92          0
4   2020-03-30  Chungcheongnam-do   127         83          0
...

and this is the second DataFrame:

    code    province            latitude    longitude
0   12000   Daegu               35.872150   128.601783   
1   60000   Gyeongsangbuk-do    36.576032   128.505599  
2   20000   Gyeonggi-do         37.275119   127.009466
3   10000   Seoul               37.566953   126.977977  
4   41000   Chungcheongnam-do   36.658976   126.673318
...

I would like to turn the first DataFrame into this:

    date        province            confirmed   released    deceased   latitude     longitude
0   2020-03-30  Daegu               6624        3837        111        35.872150    128.601783
1   2020-03-30  Gyeongsangbuk-do    1298        772         38         36.576032    128.505599
2   2020-03-30  Gyeonggi-do         463         160         5          37.275119    127.009466
3   2020-03-30  Seoul               426         92          0          37.566953    126.977977
4   2020-03-30  Chungcheongnam-do   127         83          0          36.658976    126.673318
...

Thanks!

Answer 1

The pandas.DataFrame.merge method is what you want to use here.

Using your example DataFrames:

import pandas as pd

df1 = pd.DataFrame(dict(
    date = [
        '2020-03-30','2020-03-30','2020-03-30',
        '2020-03-30','2020-03-30',],
    province = [
        'Daegu', 'Gyeongsangbuk-do', 'Gyeonggi-do', 
        'Seoul', 'Chungcheongnam-do'],
    confirmed = [6624, 1298, 463, 426, 127],
    released = [3837, 772, 160, 92, 83],
    deceased = [111, 38, 5, 0, 0],
))

df2 = pd.DataFrame(dict(
    code = [12000, 60000, 20000, 10000, 41000],
    province = [
        'Daegu', 'Gyeongsangbuk-do', 'Gyeonggi-do', 
        'Seoul', 'Chungcheongnam-do'],
    latitude = [
        35.872150, 36.576032, 37.275119, 
        37.566953, 36.658976],
    longitude = [
        128.601783, 128.505599, 127.009466, 
        126.977977, 126.673318],
))

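# Match rows on 'province'; how='left' keeps every row of df1,
# even a province that has no entry in df2 (its coordinates become NaN)
df3 = df1.merge(
    df2[['province', 'latitude', 'longitude']],
    on='province',
    how='left',
)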

pd.set_option('display.max_columns', 7)

print(df3)

Output:

         date           province  confirmed  released  deceased   latitude  \
0  2020-03-30              Daegu       6624      3837       111  35.872150   
1  2020-03-30   Gyeongsangbuk-do       1298       772        38  36.576032   
2  2020-03-30        Gyeonggi-do        463       160         5  37.275119   
3  2020-03-30              Seoul        426        92         0  37.566953   
4  2020-03-30  Chungcheongnam-do        127        83         0  36.658976   

    longitude  
0  128.601783  
1  128.505599  
2  127.009466  
3  126.977977  
4  126.673318
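If some provinces in the first DataFrame might be missing from the second, a left merge keeps those rows (with NaN coordinates), and indicator=True adds a _merge column showing which rows found no match. A minimal sketch, assuming a trimmed-down copy of the data above in which df2 deliberately leaves out Seoul; validate='many_to_one' is an optional extra check that each province appears only once in df2.

```python
import pandas as pd

# Trimmed copy of the data above; df2 deliberately omits Seoul
df1 = pd.DataFrame({
    'date': ['2020-03-30'] * 3,
    'province': ['Daegu', 'Seoul', 'Gyeonggi-do'],
    'confirmed': [6624, 426, 463],
})
df2 = pd.DataFrame({
    'province': ['Daegu', 'Gyeonggi-do'],
    'latitude': [35.872150, 37.275119],
    'longitude': [128.601783, 127.009466],
})

df3 = df1.merge(
    df2,
    on='province',
    how='left',              # keep every row of df1, in df1's order
    validate='many_to_one',  # raise MergeError if a province repeats in df2
    indicator=True,          # adds '_merge': 'both' or 'left_only' here
)

# Provinces from df1 that found no coordinates in df2
unmatched = df3.loc[df3['_merge'] == 'left_only', 'province'].tolist()
print(unmatched)
```

Drop the _merge column afterwards (df3.drop(columns='_merge')) once you have checked the unmatched rows.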


Answer 2

What you really want to do is merge the two DataFrames on the province column.

Create a new DataFrame with all the columns you want.

First, loop over the first DataFrame and add all of its values to the new one, leaving the columns it does not have (latitude and longitude) empty.

Then loop over the second DataFrame and fill in its values by comparing each row's province with the provinces already added to the new DataFrame.
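The two steps above can be sketched in pandas roughly as follows. This is an illustration under assumptions, not the answerer's own code: it uses a two-row subset of the question's data, the name new_df is hypothetical, and step 1 is done with df1.copy() instead of an explicit row loop, which gives the same result.

```python
import pandas as pd

# Two-row subset of the question's data
df1 = pd.DataFrame({
    'date': ['2020-03-30', '2020-03-30'],
    'province': ['Daegu', 'Seoul'],
    'confirmed': [6624, 426],
    'released': [3837, 92],
    'deceased': [111, 0],
})
df2 = pd.DataFrame({
    'code': [12000, 10000],
    'province': ['Daegu', 'Seoul'],
    'latitude': [35.872150, 37.566953],
    'longitude': [128.601783, 126.977977],
})

# Step 1: copy every row of df1; the new coordinate columns start empty (NaN)
new_df = df1.copy()
new_df['latitude'] = float('nan')
new_df['longitude'] = float('nan')

# Step 2: for each row of df2, fill in the rows whose province matches
for _, row in df2.iterrows():
    match = new_df['province'] == row['province']
    new_df.loc[match, 'latitude'] = row['latitude']
    new_df.loc[match, 'longitude'] = row['longitude']

print(new_df)
```

Because every df2 row is compared against the whole new DataFrame, this does O(n·m) comparisons; DataFrame.merge produces the same columns in a single vectorized call and is usually the better choice for larger tables.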

Here's an example:

New DataFrame:

date        province            confirmed   released    deceased   latitude     longitude

After adding the first DataFrame:

    date        province            confirmed   released    deceased    latitude     longitude
0   2020-03-30  Daegu               6624        3837        111
1   2020-03-30  Gyeongsangbuk-do    1298        772         38
2   2020-03-30  Gyeonggi-do         463         160         5
3   2020-03-30  Seoul               426         92          0
4   2020-03-30  Chungcheongnam-do   127         83          0

After adding the second DataFrame:

    date        province            confirmed   released    deceased   latitude     longitude
0   2020-03-30  Daegu               6624        3837        111        35.872150    128.601783
1   2020-03-30  Gyeongsangbuk-do    1298        772         38         36.576032    128.505599
2   2020-03-30  Gyeonggi-do         463         160         5          37.275119    127.009466
3   2020-03-30  Seoul               426         92          0          37.566953    126.977977
4   2020-03-30  Chungcheongnam-do   127         83          0          36.658976    126.673318
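The same final table can also be built one column at a time with Series.map, which is close to the question's wording of "setting a new column's value" from another DataFrame. A minimal sketch, assuming a two-row subset of the question's data; the name coords is hypothetical. Note that map needs each province to appear only once in df2, since a duplicated index cannot be used as a lookup.

```python
import pandas as pd

# Two-row subset of the question's data
df1 = pd.DataFrame({
    'date': ['2020-03-30', '2020-03-30'],
    'province': ['Daegu', 'Seoul'],
    'confirmed': [6624, 426],
})
df2 = pd.DataFrame({
    'code': [12000, 10000],
    'province': ['Daegu', 'Seoul'],
    'latitude': [35.872150, 37.566953],
    'longitude': [128.601783, 126.977977],
})

# Index df2 by province so each column acts as a province -> value lookup
coords = df2.set_index('province')

# map() looks up each df1 province; provinces missing from df2 become NaN
for col in ['latitude', 'longitude']:
    df1[col] = df1['province'].map(coords[col])

print(df1)
```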


2 answers

Answer 1
4 votes, accepted, 2020-04-10 14:08:00

Answer 2
0 votes, 2020-04-10 14:09:20
