Add column with different length as new column in PySpark dataframe
I have a dataframe I will call DF1:

I have a second dataframe, DF2 (with only 3 rows):

I want to create a new column in DF1, which I will call total_population_by_year1, where:

total_population_by_year1 = (the content of DF2 if Year in DF1 == Year in DF2)

In other words, the new column's rows will be filled with the total population for each year.

What I have done so far:
df_tg = DF2.join(DF1[DF1.total_population_by_year == DF1.Year],
                 ["Year", "Level_One_ICD", "total_patient_Level1_by_year"])
This returns an error. Any ideas on how to make this work?
You can try this:
# Rename DF2's columns so its join key does not clash with DF1.Year
# (note: toDF takes the column names as separate arguments, not a list)
DF2 = DF2.toDF('Year_2', 'total_population_by_year')
# Join on the year, then drop the duplicated key column
DF1 = DF1.join(DF2, DF1.Year == DF2.Year_2).drop('Year_2')