Pandas: Filling column in dataset with data from another dataset based on matching columns in the two datasets
How do I create a new column in my main data frame, filling in the values from a smaller dataset based on two columns they have in common?
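I have a time-series dataset with more than 10,000 rows. I need to create a new column whose values are available in a different dataset of 200 rows. What the two datasets have in common are the year and country columns.
So far I have tried: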

    put = []
    for column, values in zip(trial['country'], trial['year']):
        for col, val in zip(help_df['countries'], help_df['values%']):
            if col == column:
                put.append(val)

But it returns the wrong values, because I don't know how to use an if statement inside the nested loop to account for the year.
You can use `pd.merge`:

    import pandas as pd

    # Smaller dataset: one value per (country, year); argentina 2020 appears twice
    data1 = [
        {"country": "argentina", "year": 2019, "value": "0.6%"},
        {"country": "chile", "year": 2018, "value": "0.5%"},
        {"country": "argentina", "year": 2020, "value": "0.7%"},
        {"country": "chile", "year": 2017, "value": "0.5%"},
        {"country": "argentina", "year": 2020, "value": "0.7%"},
        {"country": "bolivia", "year": 2019, "value": "0.80%"},
    ]
    # Main dataset: the rows that need the new column
    data2 = [
        {"country": "argentina", "year": 2020},
        {"country": "bolivia", "year": 2019},
        {"country": "chile", "year": 2016},
        {"country": "argentina", "year": 2019},
        {"country": "uruguay", "year": 2020},
        {"country": "bolivia", "year": 2018},
    ]
    values_df = pd.DataFrame(data1)
    main_df = pd.DataFrame(data2)

    # Match rows on both key columns at once
    pd.merge(main_df, values_df, how="outer", on=["year", "country"])

Output:
| | country | year | value |
|---:|:----------|-------:|:--------|
| 0 | argentina | 2020 | 0.7% |
| 1 | argentina | 2020 | 0.7% |
| 2 | bolivia | 2019 | 0.80% |
| 3 | chile | 2016 | nan |
| 4 | argentina | 2019 | 0.6% |
| 5 | uruguay | 2020 | nan |
| 6 | bolivia | 2018 | nan |
| 7 | chile | 2018 | 0.5% |
| 8 | chile | 2017 | 0.5% |
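
If the goal is only to add the column to the main frame, a left merge may be closer to what is wanted than an outer one: the outer merge above also brings in rows that exist only in the smaller frame (chile 2017 and 2018) and repeats argentina 2020 because that key appears twice in `values_df`. Below is a minimal sketch using the frame names from the question. It assumes the smaller frame has a `year` column alongside `countries` and `values%`, and the frame contents are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's frames; the real columns may differ.
trial = pd.DataFrame({
    "country": ["argentina", "bolivia", "chile", "argentina"],
    "year": [2020, 2019, 2016, 2019],
})
help_df = pd.DataFrame({
    "countries": ["argentina", "argentina", "argentina", "bolivia"],
    "year": [2019, 2020, 2020, 2019],  # assumed column name
    "values%": ["0.6%", "0.7%", "0.7%", "0.80%"],
})

# Align the key names and keep one row per (country, year),
# so the merge cannot duplicate rows of the main frame.
lookup = (
    help_df.rename(columns={"countries": "country"})
    .drop_duplicates(subset=["country", "year"])
)

# how="left" keeps exactly the rows of trial, in their original order;
# validate raises an error if lookup still has duplicate keys.
result = trial.merge(
    lookup, how="left", on=["country", "year"], validate="many_to_one"
)
print(result)
```

Rows of `trial` with no match in the smaller frame (chile 2016 here) get `NaN` in the new column, and the result has the same number of rows as `trial`.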