
How do I create a new column in my main data frame, filling in values from a smaller dataset based on two columns they have in common?

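I have a time series dataset with over 10 000 rows. I need to create a new column, and the values for that column are available in a different dataset of 200 rows. What the two datasets have in common are the year and country columns.

I want to assign the value when it exists, and return NA when it doesn't.

So far, I have tried: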

put = []
for column , values in zip(trial['country'],trial['year']):
  for col , val in zip(help_df['countries'],help_df['values%']):
    if col == column:
      put.append(val)
 

But it returns the wrong values, because I don't know how to use an if statement for "year" inside the nested loop.

You can use pd.merge:

import pandas as pd

# data1: small lookup dataset with values; data2: main dataset without them
data1 = [ { "country": "argentina", "year": 2019, "value": "0.6%" }, { "country": "chile", "year": 2018, "value": "0.5%" }, { "country": "argentina", "year": 2020, "value": "0.7%" }, { "country": "chile", "year": 2017, "value": "0.5%" }, { "country": "argentina", "year": 2020, "value": "0.7%" }, { "country": "bolivia", "year": 2019, "value": "0.80%" } ]
data2 = [ { "country": "argentina", "year": 2020 }, { "country": "bolivia", "year": 2019 }, { "country": "chile", "year": 2016 }, { "country": "argentina", "year": 2019 }, { "country": "uruguay", "year": 2020 }, { "country": "bolivia", "year": 2018 } ]
values_df = pd.DataFrame(data1)
main_df = pd.DataFrame(data2)
# Join on the two shared columns; rows without a match get NaN
pd.merge(main_df, values_df, how="outer", on=["year", "country"])

Output:

|    | country   |   year | value   |
|---:|:----------|-------:|:--------|
|  0 | argentina |   2020 | 0.7%    |
|  1 | argentina |   2020 | 0.7%    |
|  2 | bolivia   |   2019 | 0.80%   |
|  3 | chile     |   2016 | nan     |
|  4 | argentina |   2019 | 0.6%    |
|  5 | uruguay   |   2020 | nan     |
|  6 | bolivia   |   2018 | nan     |
|  7 | chile     |   2018 | 0.5%    |
|  8 | chile     |   2017 | 0.5%    |
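
Note that the outer merge above also adds rows that exist only in the values dataset (chile 2018 and 2017), and the repeated argentina 2020 entry produces a duplicate row. If the goal is to keep exactly the rows of the main data frame and get NaN where no value exists, a left merge may be a better fit. The following is a minimal sketch under that assumption. It reuses the sample data from the answer, and the names `lookup` and `result` are illustrative only.

```python
import pandas as pd

# Same sample data as above, written column-wise for brevity.
main_df = pd.DataFrame({
    "country": ["argentina", "bolivia", "chile", "argentina", "uruguay", "bolivia"],
    "year": [2020, 2019, 2016, 2019, 2020, 2018],
})
values_df = pd.DataFrame({
    "country": ["argentina", "chile", "argentina", "chile", "argentina", "bolivia"],
    "year": [2019, 2018, 2020, 2017, 2020, 2019],
    "value": ["0.6%", "0.5%", "0.7%", "0.5%", "0.7%", "0.80%"],
})

# Drop repeated (country, year) keys so each main row matches at most once.
lookup = values_df.drop_duplicates(subset=["country", "year"])

# how="left" keeps every row of main_df in its original order;
# rows with no match in lookup get NaN in the "value" column.
# validate="many_to_one" raises an error if lookup still has duplicate keys.
result = main_df.merge(
    lookup, how="left", on=["country", "year"], validate="many_to_one"
)
print(result)
```

With this approach the result has the same number of rows as `main_df`, so the new column can also be assigned back directly, for example `main_df["value"] = result["value"].values`.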

