
Appending and merging two datasets of unequal length in R

I am trying to add two variables to my dataset from another dataset of a different length. I have a coral reef survey dataset that is missing the start and end times of each dive per site and survey zone.

Additionally, I have a table containing the start and end times of each dive per site and zone:

This table repeats wpt (the site) because 2 zones are surveyed per site, so in this table each combination of wpt and zone should be unique. In my own dataset wpt repeats many more times, because I have several observations at the same site and zone. I need to match the unique rows of mergingdata to my fishdata so that each observation gets the start and end times from mergingdata. In other words, I want to match and merge by "wpt" and "zone".
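A quick way to confirm that wpt and zone together identify each row of the lookup table is to check for duplicated key pairs before merging; if a pair occurred twice, the merge would duplicate observations. This sketch uses the mergingdata values shown later in the thread, with zone stored as plain character rather than a factor (an assumption that does not affect the check).

```r
# Lookup table with the values shown later in the thread:
# one row per (wpt, zone) combination.
mergingdata <- data.frame(
  start.time = c(10.34, 10.57, 10.00, 10.24, 9.15, 9.39),
  end.time   = c(10.5, 11.1, 10.2, 10.4, 9.3, 9.5),
  wpt        = c(2L, 2L, 3L, 3L, 4L, 4L),
  zone       = c("DO", "LT", "DO", "LT", "DO", "LT")
)

# wpt on its own repeats, so it cannot identify a row by itself.
wpt_repeats <- anyDuplicated(mergingdata$wpt) > 0

# anyDuplicated() on a data frame compares whole rows, so restrict it to the
# two key columns; 0 means no (wpt, zone) pair occurs twice.
key_is_unique <- anyDuplicated(mergingdata[c("wpt", "zone")]) == 0

print(wpt_repeats)    # TRUE
print(key_is_unique)  # TRUE
```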

This is what I have tried:

merge<- merge(fishdata, mergingdata, by="wpt", all=TRUE, sort=FALSE)

but this only merges by wpt, and my output gets extra columns called zone.x and zone.y. Is there a way to merge by the unique combination of the 2 variables "wpt" and "zone"?
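To see where the suffixed zone columns come from, here is a toy version of the two tables (a few values copied from the data shown later in the thread). With by = "wpt", each observation at site 2 matches both zones of that site, and because zone exists in both tables but is not a join key, merge() keeps both copies and renames them with the default suffixes .x and .y.

```r
# Toy tables: two observations in zone DO of site 2, and the two dive
# times recorded for site 2 (zones DO and LT).
fish  <- data.frame(wpt = c(2L, 2L), zone = c("DO", "DO"), species = c("TP", "IP"))
times <- data.frame(start.time = c(10.34, 10.57), end.time = c(10.5, 11.1),
                    wpt = c(2L, 2L), zone = c("DO", "LT"))

# Joining on wpt only: every fish row matches BOTH rows of site 2,
# and the shared non-key column zone appears twice (zone.x / zone.y).
by_wpt <- merge(fish, times, by = "wpt")
print(nrow(by_wpt))   # 4 rows instead of 2
print(names(by_wpt))

# Joining on the pair of columns gives exactly one match per fish row.
by_both <- merge(fish, times, by = c("wpt", "zone"))
print(nrow(by_both))  # 2
print(names(by_both))
```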

Thank you!

The documentation of merge (help(merge)) says:

By default the data frames are merged on the columns with names they both have, but separate specifications of the columns can be given by by.x and by.y.

Since both data frames contain the id columns wpt and zone, merge() will join the data on those common columns by default. So simply omitting the by argument from your call should work:
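One way to check which columns merge() will use by default is to compute them yourself: the default for by is intersect(names(x), names(y)). The small tables below are trimmed-down, assumed versions of fishdata and mergingdata. If the two tables happened to share any other column name (for example, if both had a date column), the default would join on that column too, which is a good reason to pass by explicitly.

```r
# Trimmed-down, assumed versions of the two tables.
fishdata <- data.frame(date = "23.11.2014", wpt = 2L, zone = "DO",
                       species = c("TP", "IP"))
mergingdata <- data.frame(start.time = c(10.34, 10.57), end.time = c(10.5, 11.1),
                          wpt = 2L, zone = c("DO", "LT"))

# merge()'s default is by = intersect(names(x), names(y)),
# so these are the columns it will join on:
common <- intersect(names(fishdata), names(mergingdata))
print(common)  # "wpt" "zone"

# Spelling the default out gives the same result as omitting `by`.
implicit <- merge(fishdata, mergingdata)
explicit <- merge(fishdata, mergingdata, by = common)
print(identical(implicit, explicit))  # TRUE
```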

merged <- merge(fishdata, mergingdata, all = TRUE, sort = FALSE)

However, you can also name the identifier columns explicitly with the by parameter (or with by.x and by.y when the key columns have different names in the two tables), as follows:

merged <- merge(fishdata, mergingdata, by = c("wpt", "zone"), all = TRUE, sort = FALSE)
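by.x and by.y are only needed when the key columns are named differently in the two tables. As a hypothetical example, suppose the lookup table called the site column site instead of wpt (the table name dive_times and its values are made up for illustration):

```r
fishdata <- data.frame(wpt = c(2L, 2L, 3L), zone = c("DO", "DO", "LT"),
                       species = c("TP", "IP", "TP"))

# Hypothetical lookup table whose site column is called "site", not "wpt".
dive_times <- data.frame(site = c(2L, 2L, 3L, 3L),
                         zone = c("DO", "LT", "DO", "LT"),
                         start.time = c(10.34, 10.57, 10.00, 10.24))

# Pair the key columns positionally: wpt <-> site, zone <-> zone.
merged <- merge(fishdata, dive_times,
                by.x = c("wpt", "zone"), by.y = c("site", "zone"))
print(merged)
```

The key columns in the result take their names from x, so the output still has a wpt column rather than site.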

EDIT

Looking at the edits to your post, I gather that your data have the following structure:

fishdata <- structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "23.11.2014", class = "factor"), 
    entry = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "shore", class = "factor"), 
    wpt = c(2L, 2L, 2L, 2L, 2L, 2L), zone = structure(c(1L, 1L, 
    1L, 1L, 1L, 1L), .Label = "DO", class = "factor"), transect = c(1L, 
    1L, 1L, 1L, 1L, 1L), gps = c(NA, NA, NA, NA, NA, NA), surveyor = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L), .Label = "ev", class = "factor"), depth_code = c(NA, 
    NA, NA, NA, NA, NA), phase = structure(c(2L, 2L, 1L, 1L, 
    1L, 1L), .Label = c("S_PRIN", "S_STOP"), class = "factor"), 
    species = structure(c(2L, 1L, 2L, 2L, 1L, 1L), .Label = c("IP", 
    "TP"), class = "factor"), family = c(NA, NA, NA, NA, NA, 
    NA)), .Names = c("date", "entry", "wpt", "zone", "transect", 
"gps", "surveyor", "depth_code", "phase", "species", "family"
), class = "data.frame", row.names = c(NA, -6L))

mergingdata <- structure(list(start.time = c(10.34, 10.57, 10, 10.24, 9.15, 
9.39), end.time = c(10.5, 11.1, 10.2, 10.4, 9.3, 9.5), wpt = c(2L, 
2L, 3L, 3L, 4L, 4L), zone = structure(c(1L, 2L, 1L, 2L, 1L, 2L
), .Label = c("DO", "LT"), class = "factor")), .Names = c("start.time", 
"end.time", "wpt", "zone"), class = "data.frame", row.names = c(NA, 
-6L))

Assuming that these dataset structures are correct, they print as follows:

> fishdata
        date entry wpt zone transect gps surveyor depth_code  phase species family
1 23.11.2014 shore   2   DO        1  NA       ev         NA S_STOP      TP     NA
2 23.11.2014 shore   2   DO        1  NA       ev         NA S_STOP      IP     NA
3 23.11.2014 shore   2   DO        1  NA       ev         NA S_PRIN      TP     NA
4 23.11.2014 shore   2   DO        1  NA       ev         NA S_PRIN      TP     NA
5 23.11.2014 shore   2   DO        1  NA       ev         NA S_PRIN      IP     NA
6 23.11.2014 shore   2   DO        1  NA       ev         NA S_PRIN      IP     NA
> mergingdata
  start.time end.time wpt zone
1      10.34     10.5   2   DO
2      10.57     11.1   2   LT
3      10.00     10.2   3   DO
4      10.24     10.4   3   LT
5       9.15      9.3   4   DO
6       9.39      9.5   4   LT

I do the merge as follows:

> merge(x = fishdata, y = mergingdata, all.x = TRUE)
  wpt zone       date entry transect gps surveyor depth_code  phase species family start.time end.time
1   2   DO 23.11.2014 shore        1  NA       ev         NA S_STOP      TP     NA      10.34     10.5
2   2   DO 23.11.2014 shore        1  NA       ev         NA S_STOP      IP     NA      10.34     10.5
3   2   DO 23.11.2014 shore        1  NA       ev         NA S_PRIN      TP     NA      10.34     10.5
4   2   DO 23.11.2014 shore        1  NA       ev         NA S_PRIN      TP     NA      10.34     10.5
5   2   DO 23.11.2014 shore        1  NA       ev         NA S_PRIN      IP     NA      10.34     10.5
6   2   DO 23.11.2014 shore        1  NA       ev         NA S_PRIN      IP     NA      10.34     10.5

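Note that I use all.x = TRUE because what we want is to keep all the rows of the x object (fishdata) and add the extra columns of the y object (mergingdata), using the columns the two objects have in common (wpt and zone) as the key. With all = TRUE instead, the rows of mergingdata that have no matching observations would also be kept, with NA in the fishdata columns.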
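To make the difference between all.x = TRUE and all = TRUE concrete, the sketch below uses an assumed three-row subset of fishdata together with the full mergingdata from above. It also shows a swapped-in alternative that is not part of the original answer: a match() lookup on a pasted wpt/zone key, which keeps fishdata in its original row order (merge() sorts by the key columns by default, and with sort = FALSE the row order is unspecified). The lookup assumes each wpt/zone pair occurs only once in mergingdata, since match() returns only the first hit.

```r
# Assumed three-row subset of fishdata, plus the full mergingdata from above.
fishdata <- data.frame(wpt = 2L, zone = "DO",
                       phase   = c("S_STOP", "S_STOP", "S_PRIN"),
                       species = c("TP", "IP", "TP"))
mergingdata <- data.frame(start.time = c(10.34, 10.57, 10, 10.24, 9.15, 9.39),
                          end.time   = c(10.5, 11.1, 10.2, 10.4, 9.3, 9.5),
                          wpt        = c(2L, 2L, 3L, 3L, 4L, 4L),
                          zone       = rep(c("DO", "LT"), 3))

# Left join: every fishdata row, with the matching dive times attached.
left <- merge(fishdata, mergingdata, all.x = TRUE)
# Full outer join: additionally keeps dive-time rows with no observations,
# filling the fishdata columns with NA.
outer <- merge(fishdata, mergingdata, all = TRUE)

print(nrow(left))   # 3
print(nrow(outer))  # 8 (3 matched rows + 5 unmatched wpt/zone rows)

# Alternative lookup that preserves fishdata's original row order.
key_fish  <- paste(fishdata$wpt, fishdata$zone, sep = "|")
key_times <- paste(mergingdata$wpt, mergingdata$zone, sep = "|")
idx <- match(key_fish, key_times)  # NA where a pair has no dive times
fishdata$start.time <- mergingdata$start.time[idx]
fishdata$end.time   <- mergingdata$end.time[idx]
print(fishdata)
```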
