Partially Merging Data Sets in R

I have two data files that look like this:

bin chrom   chromStart  chromEnd    name    score   strand
23  chr1    119537649   119537708   A_14_P109202    1000    +
109 chr1    37879762    37879821    A_16_P15088121  1000    +
129 chr1    59113425    59113484    A_16_P00074945  1000    +
138 chr1    68288459    68288517    A_16_P00088142  1000    +

and

Hybridization REF   TCGA-02-0001-01C-01D-0185-02       
Composite Element REF   normalizedLog2Ratio    
A_14_P112718    0.034472223    
A_16_P15000916  -0.038733669       
A_16_P15001074  -0.498562753       
A_16_P00000012  -0.269915751     


Using the names from the first column of the second file, I need to extract additional data from the data table in the first file. However, not every name in the second file appears in the first. I am having problems getting the files to merge properly. Any help is much appreciated.

If you set all.x = TRUE in the merge() call, all of the records from the first data frame will be in the merged data frame, even if they don't have a match in the second; for those rows, the columns that come from the second data frame are filled with NA. Is that the problem you were encountering? In the example that you gave, none of the row names in the second file matched any of the values in the name variable of the first.

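# build the first table; passing the vectors straight to data.frame() keeps
# bin and chromStart numeric (cbind() would coerce every column to character)
bin <- c(23, 109, 129, 138)
chrom <- c("chr1", "chr1", "chr1", "chr1")
chromStart <- c(119537649, 37879762, 59113425, 68288459)
name <- c("A_14_P109202", "A_16_P15088121", "A_16_P00074945", "A_16_P00088142")
b <- data.frame(bin, chrom, chromStart, name)

# build the second table, with the probe names stored as row names
y <- data.frame(normalizedLog2Ratio = c(0.034472223, -0.038733669, -0.498562753, -0.269915751))
rownames(y) <- c("A_14_P112718", "A_16_P15000916", "A_16_P15001074", "A_16_P00000012")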


print(b)
print(y)

#check the rows
nrow(b)
nrow(y)

#write rownames to new variable
y$name <- rownames(y)

#conduct merge
newdataframe <- merge(b, y, by = "name", all.x = TRUE)

#check number of rows
nrow(newdataframe)
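
Since the question starts from the names in the second file, it may also help to merge in the other direction, so that every name from the second file is kept. The following is only a minimal sketch under stated assumptions: the data frame names `probes` and `ratios` are hypothetical, and one extra row (probe A_14_P109202 with a made-up ratio of 0.5) has been added to the second table purely so that at least one name matches, which is not the case in the posted sample.

```r
# Toy copies of the two tables from the question.
probes <- data.frame(
  bin        = c(23, 109, 129, 138),
  chrom      = "chr1",
  chromStart = c(119537649, 37879762, 59113425, 68288459),
  name       = c("A_14_P109202", "A_16_P15088121",
                 "A_16_P00074945", "A_16_P00088142"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the last row is invented for illustration (0.5 is not real data);
# it exists only so that one name is shared between the two tables.
ratios <- data.frame(
  name                = c("A_14_P112718", "A_16_P15000916", "A_14_P109202"),
  normalizedLog2Ratio = c(0.034472223, -0.038733669, 0.5),
  stringsAsFactors = FALSE
)

# Keep every name from the second file; rows without a match in `probes`
# get NA in the bin, chrom and chromStart columns.
left_on_ratios <- merge(ratios, probes, by = "name", all.x = TRUE)

# Keep only the names that appear in both files (merge()'s default behaviour).
matched_only <- merge(ratios, probes, by = "name")

# List the names from the second file that have no entry in the first.
unmatched <- setdiff(ratios$name, probes$name)

print(left_on_ratios)
print(matched_only)
print(unmatched)
```

Which variant fits depends on what should happen to the unmatched names: `all.x = TRUE` keeps them with NA annotation, while the default drops them, and `setdiff()` gives a quick way to see how many names failed to match before deciding.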
