
Transposing and Merging datasets in R

I am sure there are answers out there to my question, but I can't seem to find one that works, and I'm absolutely new to R, so apologies for any redundancy!

So I have a huge dataset: about 17K observations of 35 variables. It was a txt file, which I imported and coerced with as.matrix. The first column has character values and the remaining 34 columns have numeric values.

Structure -

>str(data_m)
 chr [1:17933, 1:35] "RAB12" "TRIM52" "C1orf86" "PLAC9" "MORN3" "LOC643783" "LOC389541" "OAZ2" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:35] "Name" "X118" "X12" "X21" ...

Now there is another small, long-form dataset with 2 columns: id and gender.

> str(data_maleids)
'data.frame':   24 obs. of  2 variables:
 $ id    : Factor w/ 34 levels "X118","X12","X21",..: 8 23 9 19 10 7 5 4 2 30 ...
 $ gender: Factor w/ 2 levels "female","male": 2 2 2 2 2 2 2 2 2 2 ...

Eg. -

    row.names   id  gender
1   1           X37 male
2   2           X64 male

All I want to do is subset the first dataset to only those ids (X37, X64, etc.) that are present in the second dataset.

I tried transposing the bigger dataset but that gives me issues in terms of column names and I can't seem to get my way around this.

The first comment is about your statement "The 1st column has character values and the rest 34 columns has numeric values". data_m is a matrix, so all of its elements have the same type, in this case character. You can see this in the output of str(). Think of a matrix in R as a single vector arranged into rows and columns.
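A small sketch of that coercion, using a made-up two-row data frame (the names toy and toy_m and all the values are illustrative, not taken from the question's file):

```r
# Hypothetical toy data standing in for the imported file
toy <- data.frame(Name = c("RAB12", "TRIM52"),
                  X118 = c(1.5, 2.0),
                  X12  = c(0.3, 4.1),
                  stringsAsFactors = FALSE)

# Mixed column types force as.matrix() to produce a character matrix
toy_m <- as.matrix(toy)

typeof(toy_m)                 # "character"
is.numeric(toy_m[, "X118"])   # FALSE: the numbers are now strings

# Keeping the data as a data frame preserves per-column types
sapply(toy, class)
```

If the numeric columns are needed for computation, it is probably simpler to skip as.matrix and keep the data frame.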

Secondly, I advise you to use the data.table package (install it first if you do not have it yet) for this task. A sketch of the syntax would be something like this:

  1. Read the data in. There is a nice function fread() in the data.table package to read data from text files as a data.table object: data_m <- fread("file.name.txt")
  2. Make a character vector of ids from data_maleids (id is a factor, so convert it): ids <- as.character(unique(data_maleids$id)) .
  3. Select the columns you need. The ids are column names of data_m, not values in an id column, so the selection is by column: data_m[, c("Name", ids), with = FALSE] .
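Note that in the question's data the ids (X37, X64, ...) appear as column names of data_m rather than as values in a column, so the selection ends up being by column. A minimal sketch of that idea in base R, with made-up toy data standing in for the real objects (the values and the names cols and data_male are assumptions for illustration):

```r
# Toy stand-ins for the question's objects (values are made up)
data_m <- data.frame(Name = c("RAB12", "TRIM52", "C1orf86"),
                     X118 = c(1.2, 0.4, 3.3),
                     X37  = c(2.1, 5.0, 0.7),
                     X64  = c(0.9, 1.8, 2.6),
                     stringsAsFactors = FALSE)
data_maleids <- data.frame(id = factor(c("X37", "X64")),
                           gender = factor(c("male", "male")))

# id is a factor, so convert it to character before using it as column names
ids <- as.character(unique(data_maleids$id))

# Guard against ids that have no matching column in data_m
cols <- c("Name", intersect(ids, names(data_m)))

# Keep the gene names plus only the columns for the listed ids
data_male <- data_m[, cols]
names(data_male)   # "Name" "X37" "X64"

# With a data.table (e.g. one read by fread()), the equivalent selection
# would be: data_m[, cols, with = FALSE]
```

No transposing is needed for this; the rows (genes) stay as they are and only the sample columns are filtered.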
