In R, how can I add some specific columns from a dataframe to another dataframe when some values are equal in both dataframes?

Question

I have two datasets which have both the same row combinations Country & Year and I would like to add some columns from one dataset to the other one in a way that the row combinations match.

Dataset 1:

+----------+------+---------+---------+-----+
| Country  | Year | exports | imports | ... |
+----------+------+---------+---------+-----+
| Germany  | 2000 | 0.70    | 0.40    | ... |
| Germany  | 2001 | 0.68    | 0.41    | ... |
| Germany  | 2002 | 0.71    | 0.48    | ... |
| Germany  | 2003 | ...     | ...     | ... |
| Spain    | 2000 | 0.51    | 0.56    | ... |
| Spain    | 2001 | 0.48    | 0.50    | ... |
| Spain    | 2002 | 0.50    | 0.53    | ... |
| Spain    | 2003 | ...     | ...     | ... |
| ...      | ...  | ...     | ...     | ... |
+----------+------+---------+---------+-----+

Dataset 2:

+----------+-----+------+--------------+-------+-----+
| Country  | CC  | Year | unemployment | Pop   | ... |
+----------+-----+------+--------------+-------+-----+
| Germany  | GER | 2000 | 0.03         | 79.50 | ... |
| Germany  | GER | 2001 | 0.05         | 79.53 | ... |
| Germany  | GER | 2002 | 0.04         | 79.80 | ... |
| Germany  | GER | 2003 | ...          | ...   | ... |
| Hungary  | HUN | 2000 | ...          | ...   | ... |
| Hungary  | HUN | 2001 | ...          | ...   | ... |
| Hungary  | HUN | 2002 | ...          | ...   | ... |
| Hungary  | HUN | 2003 | ...          | ...   | ... |
| Spain    | ESP | 2000 | 0.08         | 40.2  | ... |
| Spain    | ESP | 2001 | 0.11         | 40.5  | ... |
| Spain    | ESP | 2002 | 0.10         | 40.55 | ... |
| Spain    | ESP | 2003 | ...          | ...   | ... |
| ...      | ... | ...  | ...          | ...   | ... |
+----------+-----+------+--------------+-------+-----+

I want the merged data to look like this:


+----------+-----+------+---------+---------+--------------+-------+-----+
| Country  | CC  | Year | exports | imports | unemployment | Pop   | ... |
+----------+-----+------+---------+---------+--------------+-------+-----+
| Germany  | GER | 2000 | 0.70    | 0.40    | 0.03         | 79.50 | ... |
| Germany  | GER | 2001 | 0.68    | 0.41    | 0.05         | 79.53 | ... |
| Germany  | GER | 2002 | 0.71    | 0.48    | 0.04         | 79.80 | ... |
| Germany  | GER | 2003 | ...     | ...     | ...          | ...   | ... |
| Spain    | ESP | 2000 | 0.51    | 0.56    | 0.08         | 40.2  | ... |
| Spain    | ESP | 2001 | 0.48    | 0.50    | 0.11         | 40.5  | ... |
| Spain    | ESP | 2002 | 0.50    | 0.53    | 0.10         | 40.55 | ... |
| Spain    | ESP | 2003 | ...     | ...     | ...          | ...   | ... |
| ...      | ... | ...  | ...     | ...     | ...          | ...   | ... |
+----------+-----+------+---------+---------+--------------+-------+-----+

So, the countries which are not in dataset 1 (like Hungary in this case) are not in the merged dataset and the country code is also in the new dataset. Could someone tell me how I can achieve this? I have 28 years for about 100 countries each. So using a function in which I have to specify every combination would not be handy...

I tried to merge it with merge() but did not succeed since it just created hundreds of rows with the same country and year combination.

Answer 1

merge absolutely should work for this. You should specify that you are merging on two columns.

merge( df1 , df2 , by=c( "Country", "Year") )

Also confirm that the class of the merging vars is the same

sapply( df1[, c( "Country", "Year")] , class )
sapply( df2[, c( "Country", "Year")] , class )

confirm that the variables are spelled the same way in both data frames

intersect( names( df1 ) , names( df2 ))

Finally confirm that year and country are unique in both data.frames

sum( duplicated( df1[ ,c( "Country", "Year") ] ))
sum( duplicated( df2[ ,c( "Country", "Year") ] ))

Answer 2

您可以使用dplyr包中的inner_join()执行此dplyr

dplyr::inner_join(df1, df2, by=c("Country", "Year"))

Answer 3

The answer with merge() worked! Now I am facing the problem that eg Spain does not have any unemployment data for the year 2000. However, I still want to add all years of Spain and would like to have a NA in the unemployment column for Spain in 2000 in the merged dataset. How can I achieve this?

I tried to use merge(df1, df2, all.x = TRUE) but sometimes it just creates NA's for some reason...

In R, how can I add some specific columns from a dataframe to another dataframe when some values are equal in both dataframes?

Question

3 answers

solution1
2 2019-05-06 14:07:44

solution2
1 2019-05-06 13:35:12

solution3
0 2019-05-07 23:33:16

In R, how can I add some specific columns from a dataframe to another dataframe when some values are equal in both dataframes?

Question

3 answers

solution1 2 2019-05-06 14:07:44

solution2 1 2019-05-06 13:35:12

solution3 0 2019-05-07 23:33:16

solution1
2 2019-05-06 14:07:44

solution2
1 2019-05-06 13:35:12

solution3
0 2019-05-07 23:33:16