
Merging R DataFrames with different numbers of columns and rows

I am trying to combine 2 data frames on a column called username. One data frame contains 12 variables with 1619 rows of observations. The other contains 37 columns with 1603 observations. I'd like to match the usernames from each data set but keep all data. I have tried a merge, but I always get NA for the Y set of data (unless the column name is in both sets of data). Is there a way to append one set of data to another via a column name such as "username"?

Example below:

```
DataFrame 1
Username     HighschoolGPA  Age  Applydate
Smith, John  3.1            18   03-12-2012

DataFrame 2
Username     LiveOnCampus  Major      StudentGroup_Academic
Smith, John  Yes           Chemistry  No

Final DataFrame
Username     HighschoolGPA  Age  Applydate   LiveOnCampus  Major      StudentGroup_Academic
Smith, John  3.1            18   03-12-2012  Yes           Chemistry  No
```

```r
df1 <- data.frame(Username = 'Smith, John', HighschoolGPA = 3.1, Age = 18,
                  Applydate = '03-12-2012', stringsAsFactors = FALSE)
df2 <- data.frame(Username = 'Smith, John', LiveOnCampus = 'Yes', Major = 'Chemistry',
                  StudentGroup_Academic = 'No', stringsAsFactors = FALSE)
merge(df1, df2, by = 'Username')  # join on the shared key column
##      Username HighschoolGPA Age  Applydate LiveOnCampus     Major StudentGroup_Academic
## 1 Smith, John           3.1  18 03-12-2012          Yes Chemistry                    No
```

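You usually get NA for the Y columns when merge is matching on multiple columns (by default it joins on every column name the two data frames share) and generating too many unique key combinations for the rows to line up. The NAs only show up when unmatched rows are kept with all.x = TRUE or all = TRUE; without those arguments the unmatched rows are dropped instead.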
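A minimal sketch of that failure mode, using hypothetical data in which both frames also carry an `Age` column whose values disagree. Without `by`, `merge()` joins on every shared column name, so the single row never matches; naming the key explicitly fixes it:

```r
# Hypothetical data: both frames share Username AND Age, but the Age values differ
df1 <- data.frame(Username = "Smith, John", HighschoolGPA = 3.1, Age = 18,
                  stringsAsFactors = FALSE)
df2 <- data.frame(Username = "Smith, John", Major = "Chemistry", Age = 19,
                  stringsAsFactors = FALSE)

# Default by = intersect(names(df1), names(df2)), i.e. Username and Age.
# No row agrees on both keys, so the left join fills the df2 columns with NA.
implicit <- merge(df1, df2, all.x = TRUE)
print(implicit)

# Joining on Username only: the row matches, and the clashing Age
# columns are kept side by side with the default suffixes .x and .y
explicit <- merge(df1, df2, by = "Username", all.x = TRUE)
print(explicit)
```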

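Make sure the username columns have the same name and type in both data frames (R is case-sensitive, so "username" and "Username" are different columns), make sure they aren't factors, and pass merge explicit arguments such as by, all.x and all.y instead of relying on the defaults.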
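A sketch of checking and cleaning the key before merging. The data is hypothetical (`df1` stores `Username` as a factor, `df2` has a trailing space), and `normalize_key()` is an illustrative helper, not a base R function. `setdiff()` lists usernames with no exact partner, which is a quick way to see why data frames with 1619 and 1603 rows do not line up:

```r
# Hypothetical data: df1 keeps Username as a factor, df2 has a trailing space
df1 <- data.frame(Username = factor("Smith, John"), Age = 18)
df2 <- data.frame(Username = "Smith, John ", Major = "Chemistry",
                  stringsAsFactors = FALSE)

str(df1$Username)  # a factor
str(df2$Username)  # a character vector

# Usernames in df1 that have no exact partner in df2 (before cleaning)
print(setdiff(as.character(df1$Username), df2$Username))
before <- merge(df1, df2, by = "Username")  # no rows: the keys differ by a space

# Hypothetical helper: plain character keys without surrounding whitespace
normalize_key <- function(x) trimws(as.character(x))
df1$Username <- normalize_key(df1$Username)
df2$Username <- normalize_key(df2$Username)

print(setdiff(df1$Username, df2$Username))  # now empty
after <- merge(df1, df2, by = "Username")
print(after)
```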

Try merge(df1, df2, by = "Username", all.x = TRUE, all.y = TRUE) if you want to keep all rows, matched and unmatched.
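A small sketch with hypothetical, partially overlapping usernames shows what the full outer join keeps: rows found in only one data frame are retained, with NA in the other frame's columns. In base R, `all = TRUE` is shorthand for setting `all.x` and `all.y` together:

```r
# Hypothetical data: only "B" and "C" appear in both data frames
df1 <- data.frame(Username = c("A", "B", "C"), Age = c(18, 19, 20),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Username = c("B", "C", "D"),
                  Major = c("Math", "Chemistry", "History"),
                  stringsAsFactors = FALSE)

# Full outer join: every username from either side is kept
full <- merge(df1, df2, by = "Username", all.x = TRUE, all.y = TRUE)
print(full)  # "A" has NA for Major, "D" has NA for Age

# The shorthand form produces the same data frame
stopifnot(identical(full, merge(df1, df2, by = "Username", all = TRUE)))
```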

Try merge(df1, df2, by = "Username", all.x = FALSE, all.y = FALSE) if you want to keep only the rows whose username appears in both data frames.
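The same idea for the inner join, again with hypothetical data. Since `all.x` and `all.y` both default to `FALSE` in base R's `merge()`, the explicit call matches the short form; a left join (`all.x = TRUE` only) is included for comparison:

```r
# Hypothetical data: only "B" and "C" appear in both data frames
df1 <- data.frame(Username = c("A", "B", "C"), Age = c(18, 19, 20),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Username = c("B", "C", "D"),
                  Major = c("Math", "Chemistry", "History"),
                  stringsAsFactors = FALSE)

# Inner join: only usernames present in both frames survive
inner <- merge(df1, df2, by = "Username", all.x = FALSE, all.y = FALSE)
stopifnot(identical(inner, merge(df1, df2, by = "Username")))  # defaults agree
print(inner)

# Left join: every df1 row is kept; "A" gets NA for the df2 columns
left <- merge(df1, df2, by = "Username", all.x = TRUE)
print(left)
```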

Hope this helps!
