I have two data frames. The first, named "discovery", looks like this:
probeID symbol is.TF entrezID
ILMN_1814092 AACSP1 FALSE 729522
ILMN_1668851 AADACL4 FALSE 343066
ILMN_1805104 ABAT FALSE 18
ILMN_2070570 ABCA10 FALSE 10349
ILMN_2232084 ABCA11P FALSE 79963
ILMN_1704579 ABCA13 FALSE 154664
ILMN_1722286 ABCA5 FALSE 23461
ILMN_1701551 ABCA6 FALSE 23460
ILMN_1743205 ABCA7 FALSE 10347
The second, named "values", looks like this:
probeID value
ILMN_1814092 1.0
ILMN_1668851 1.9
ILMN_1805104 1.8
ILMN_2070570 1.8
ILMN_2232084 1.5
ILMN_1704579 2.3
ILMN_1722286 2.6
ILMN_1701551 0.1
ILMN_1743205 5.5
The two data frames overlap in the "probeID" column.
How can I select the rows of "discovery" whose "probeID" appears in "values"? I tried:
overlap <- discovery[values$probeID,]
It gives me a data frame in which all values are NA.
If you need a data set that also contains the columns of values, merge is the better way. But if you just need the subset of discovery whose probeID appears in values, the following works:
overlap <- discovery[discovery$probeID %in% values$probeID,]
The %in% operator is based on match, so here I am selecting only the rows where discovery$probeID matches any of values$probeID.
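As a minimal sketch of the above, the snippet below builds small stand-ins for the two data frames (only three rows each, and the ID ILMN_9999999 is made up for illustration) and shows the %in% subset next to its match-based equivalent. It also reproduces the all-NA result from the question, on the assumption that probeID is stored as a character vector: a character row index is looked up in the row names, which are "1", "2", "3" here, so nothing is found.

```r
# Small stand-ins for the data frames in the question (illustrative rows only).
discovery <- data.frame(
  probeID  = c("ILMN_1814092", "ILMN_1668851", "ILMN_1805104"),
  symbol   = c("AACSP1", "AADACL4", "ABAT"),
  is.TF    = c(FALSE, FALSE, FALSE),
  entrezID = c(729522, 343066, 18),
  stringsAsFactors = FALSE
)
values <- data.frame(
  probeID = c("ILMN_1814092", "ILMN_1805104", "ILMN_9999999"),  # last ID is made up
  value   = c(1.0, 1.8, 0.4),
  stringsAsFactors = FALSE
)

# Indexing rows with a character vector looks the strings up in the row names.
# The row names are "1", "2", "3", so no probe ID is found and every row is NA.
failed <- discovery[values$probeID, ]

# Logical mask: TRUE where a probeID of discovery occurs in values$probeID.
keep <- discovery$probeID %in% values$probeID

# The same mask written with match(): NA means "not found".
keep_via_match <- !is.na(match(discovery$probeID, values$probeID))

# Subset of discovery restricted to the shared probe IDs.
overlap <- discovery[keep, ]
print(overlap)
```

The logical mask has one entry per row of discovery, so the rows keep their original order and no columns from values are added.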
Just do merge(discovery, values, by.x = "probeID", by.y = "probeID")
For a brief explanation, see http://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html .
merge(x, y, by = intersect(names(x), names(y)),
by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
sort = TRUE, suffixes = c(".x",".y"),
incomparables = NULL, ...)
merge joins two data frames, x and y. If you want, you can specify the columns by which the two data frames should be merged, using by.x and by.y respectively. If the column name is shared between x and y, then by= alone is enough. If by= is not specified, it defaults to the intersection of names(x) and names(y), that is, the columns that are shared between the two data frames.
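The different ways of naming the key column can be sketched as follows. The data are a cut-down version of the frames in the question, and values2 (with its key column renamed to "probe") is a hypothetical variant introduced only to show by.x and by.y.

```r
# Cut-down versions of the two data frames (illustrative rows only).
discovery <- data.frame(
  probeID = c("ILMN_1814092", "ILMN_1668851", "ILMN_1805104"),
  symbol  = c("AACSP1", "AADACL4", "ABAT"),
  stringsAsFactors = FALSE
)
values <- data.frame(
  probeID = c("ILMN_1805104", "ILMN_1814092"),
  value   = c(1.8, 1.0),
  stringsAsFactors = FALSE
)

# No 'by' given: merge() uses intersect(names(x), names(y)), here "probeID".
merged_default <- merge(discovery, values)

# The shared column named explicitly.
merged_by <- merge(discovery, values, by = "probeID")

# Hypothetical variant in which the key column has a different name in y.
values2 <- setNames(values, c("probe", "value"))
merged_xy <- merge(discovery, values2, by.x = "probeID", by.y = "probe")

# all.x = TRUE keeps every row of x; unmatched rows get NA in the y columns.
merged_left <- merge(discovery, values, by = "probeID", all.x = TRUE)

print(merged_by)
print(merged_left)
```

With the default all = FALSE only probe IDs present in both data frames are kept, and because sort = TRUE the result is ordered by the key column rather than by the row order of x.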