In R, how can I add some specific columns from a dataframe to another dataframe when some values are equal in both dataframes?

I have two datasets that both contain the same Country and Year row combinations, and I would like to add some columns from one dataset to the other so that the row combinations match.

Dataset 1:

+----------+------+---------+---------+-----+
| Country  | Year | exports | imports | ... |
+----------+------+---------+---------+-----+
| Germany  | 2000 | 0.70    | 0.40    | ... |
| Germany  | 2001 | 0.68    | 0.41    | ... |
| Germany  | 2002 | 0.71    | 0.48    | ... |
| Germany  | 2003 | ...     | ...     | ... |
| Spain    | 2000 | 0.51    | 0.56    | ... |
| Spain    | 2001 | 0.48    | 0.50    | ... |
| Spain    | 2002 | 0.50    | 0.53    | ... |
| Spain    | 2003 | ...     | ...     | ... |
| ...      | ...  | ...     | ...     | ... |
+----------+------+---------+---------+-----+

Dataset 2:

+----------+-----+------+--------------+-------+-----+
| Country  | CC  | Year | unemployment | Pop   | ... |
+----------+-----+------+--------------+-------+-----+
| Germany  | GER | 2000 | 0.03         | 79.50 | ... |
| Germany  | GER | 2001 | 0.05         | 79.53 | ... |
| Germany  | GER | 2002 | 0.04         | 79.80 | ... |
| Germany  | GER | 2003 | ...          | ...   | ... |
| Hungary  | HUN | 2000 | ...          | ...   | ... |
| Hungary  | HUN | 2001 | ...          | ...   | ... |
| Hungary  | HUN | 2002 | ...          | ...   | ... |
| Hungary  | HUN | 2003 | ...          | ...   | ... |
| Spain    | ESP | 2000 | 0.08         | 40.2  | ... |
| Spain    | ESP | 2001 | 0.11         | 40.5  | ... |
| Spain    | ESP | 2002 | 0.10         | 40.55 | ... |
| Spain    | ESP | 2003 | ...          | ...   | ... |
| ...      | ... | ...  | ...          | ...   | ... |
+----------+-----+------+--------------+-------+-----+

I want the merged data to look like this:


+----------+-----+------+---------+---------+--------------+-------+-----+
| Country  | CC  | Year | exports | imports | unemployment | Pop   | ... |
+----------+-----+------+---------+---------+--------------+-------+-----+
| Germany  | GER | 2000 | 0.70    | 0.40    | 0.03         | 79.50 | ... |
| Germany  | GER | 2001 | 0.68    | 0.41    | 0.05         | 79.53 | ... |
| Germany  | GER | 2002 | 0.71    | 0.48    | 0.04         | 79.80 | ... |
| Germany  | GER | 2003 | ...     | ...     | ...          | ...   | ... |
| Spain    | ESP | 2000 | 0.51    | 0.56    | 0.08         | 40.2  | ... |
| Spain    | ESP | 2001 | 0.48    | 0.50    | 0.11         | 40.5  | ... |
| Spain    | ESP | 2002 | 0.50    | 0.53    | 0.10         | 40.55 | ... |
| Spain    | ESP | 2003 | ...     | ...     | ...          | ...   | ... |
| ...      | ... | ...  | ...     | ...     | ...          | ...   | ... |
+----------+-----+------+---------+---------+--------------+-------+-----+

So the countries that are not in dataset 1 (like Hungary in this case) are not in the merged dataset, and the country code is also in the new dataset. Could someone tell me how I can achieve this? I have 28 years of data for each of about 100 countries, so a function in which I have to specify every combination would not be practical.

I tried to merge the datasets with merge() but did not succeed, since it just created hundreds of rows with the same country and year combination.

merge() absolutely should work for this. You should specify that you are merging on two columns:

merge( df1 , df2 , by=c( "Country", "Year") )
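A minimal sketch of this call on toy data, using a few of the values from the tables in the question. The data frames `df1` and `df2` below are reconstructed for illustration only and are an assumption, not the asker's real data:

```r
# Toy versions of the two datasets (illustrative values only)
df1 <- data.frame(
  Country = c("Germany", "Germany", "Spain", "Spain"),
  Year    = c(2000, 2001, 2000, 2001),
  exports = c(0.70, 0.68, 0.51, 0.48),
  imports = c(0.40, 0.41, 0.56, 0.50)
)
df2 <- data.frame(
  Country      = c("Germany", "Germany", "Hungary", "Spain", "Spain"),
  CC           = c("GER", "GER", "HUN", "ESP", "ESP"),
  Year         = c(2000, 2001, 2000, 2000, 2001),
  unemployment = c(0.03, 0.05, NA, 0.08, 0.11)
)

# Inner join on both key columns: only Country/Year pairs present in
# both data frames are kept, so Hungary is dropped
merged <- merge(df1, df2, by = c("Country", "Year"))
print(merged)
```

merge() places the key columns first, followed by the remaining columns of the first and then the second data frame, so CC ends up after imports here. Reorder the columns afterwards if the exact layout matters.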

Also confirm that the class of the merging variables is the same in both data frames:

sapply( df1[, c( "Country", "Year")] , class )
sapply( df2[, c( "Country", "Year")] , class )
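If the classes turn out to differ, one option is to convert the key columns to a common type before merging. The sketch below assumes a hypothetical situation in which Year was read as text in one data frame and Country as a factor in the other; the actual mismatch, if any, may look different:

```r
# Hypothetical mismatch: Year is character in df1, Country is a factor in df2
df1 <- data.frame(Country = c("Germany", "Spain"),
                  Year    = c("2000", "2000"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Country = factor(c("Germany", "Spain")),
                  Year    = c(2000L, 2000L))

# Bring both key columns to the same class in both data frames
df1$Year    <- as.integer(df1$Year)
df2$Country <- as.character(df2$Country)

# Both calls should now report the same classes
classes1 <- sapply(df1[, c("Country", "Year")], class)
classes2 <- sapply(df2[, c("Country", "Year")], class)
print(classes1)
print(classes2)
```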

Confirm that the variables are spelled the same way in both data frames:

intersect( names( df1 ) , names( df2 ))

Finally, confirm that the Country and Year combinations are unique in both data frames:

sum( duplicated( df1[ ,c( "Country", "Year") ] ))
sum( duplicated( df2[ ,c( "Country", "Year") ] ))
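Duplicated keys matter because a Country/Year pair that occurs m times in one data frame and n times in the other produces m × n rows in the merged result. The sketch below shows that effect on made-up data and one possible cleanup; dropping rows with unique() is only appropriate if the duplicates are exact copies, which is an assumption here:

```r
keys <- c("Country", "Year")

# Made-up data in which Spain/2000 appears twice in each data frame
df1 <- data.frame(Country = c("Spain", "Spain", "Spain"),
                  Year    = c(2000, 2000, 2001),
                  exports = c(0.51, 0.51, 0.48))
df2 <- data.frame(Country = c("Spain", "Spain", "Spain"),
                  Year    = c(2000, 2000, 2001),
                  unemployment = c(0.08, 0.08, 0.11))

# 2 x 2 rows for Spain/2000 plus 1 row for Spain/2001
n_raw <- nrow(merge(df1, df2, by = keys))

# List every row whose key combination occurs more than once
dup_rows <- df1[duplicated(df1[, keys]) |
                duplicated(df1[, keys], fromLast = TRUE), ]
print(dup_rows)

# Drop exact duplicate rows, then merge again
n_clean <- nrow(merge(unique(df1), unique(df2), by = keys))
```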

You can do this with inner_join() from the dplyr package:

dplyr::inner_join(df1, df2, by=c("Country", "Year"))

The answer with merge() worked! Now I am facing the problem that, for example, Spain does not have any unemployment data for the year 2000. However, I still want to include all years for Spain and would like an NA in the unemployment column for Spain in 2000 in the merged dataset. How can I achieve this?

I tried to use merge(df1, df2, all.x = TRUE), but sometimes it just creates NAs for some reason...
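One thing worth checking: when `by` is omitted, merge() joins on every column name the two data frames share, so any additional shared column with differing values can prevent rows from matching. Whether that is the cause here is an assumption. A minimal sketch of a left join with the keys stated explicitly, on made-up data in which Spain 2000 is missing from the second data frame:

```r
df1 <- data.frame(Country = c("Spain", "Spain"),
                  Year    = c(2000, 2001),
                  exports = c(0.51, 0.48))

# Spain/2000 has no row in the second data frame
df2 <- data.frame(Country = "Spain", CC = "ESP",
                  Year = 2001, unemployment = 0.11)

# all.x = TRUE keeps every row of df1; rows without a match in df2
# get NA in all columns that come from df2
merged <- merge(df1, df2, by = c("Country", "Year"), all.x = TRUE)
print(merged)
```

Note that CC is also NA for the unmatched row, because it comes from the second data frame as well.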
