
Dividing values in a column of a data frame by values from a different data frame when row values match

I have a data.frame x with the following format:

     species      site  count
1:         A       1.1     25
2:         A       1.2    152
3:         A       2.1     26
4:         A       3.5      1
5:         A       3.7     98
---                         
101:       B       1.2      6
102:       B       1.3     10
103:       B       2.1      8
104:       B       2.2      8
105:       B       2.3      5

I also have another data.frame area with the following format:

      species    area
1:          A    59.7
2:          B    34.4
3:          C    37.7
4:          D    22.8

I would like to divide the count column of data.frame x by the values in the area column of data.frame area wherever the values in the species column of the two data.frames match.

I have been trying to make it work with a ddply function:

density = ddply(x, "species", mutate, density = x$count/area[,2])

But I can't figure out the proper index syntax for the area[] call so that it selects only the row matching the value found in x$species. I am very new to the plyr package (and to the apply* functions as a whole), so this may be completely the wrong approach.

I'm hoping to return a data.frame of the following format:

     species      site  count   density
1:         A       1.1     25     0.419
2:         A       1.2    152     2.546
3:         A       2.1     26     0.436
4:         A       3.5      1     0.017
5:         A       3.7     98     1.641
---                         
101:       B       1.2      6     0.174
102:       B       1.3     10     0.291
103:       B       2.1      8     0.233
104:       B       2.2      8     0.233
105:       B       2.3      5     0.145

This is easy with data.table:

library(data.table)
# convert both data.frames to data.tables in place (by reference)
setDT(x); setDT(area)
# update join: add density to x, dividing count by the matching area
x[area, density := count / i.area, on = "species"]

:= is the natural way to add columns in data.table (by reference; see this vignette, particularly point b), for more about this and why it's important), so newcol := value adds a column named newcol to your data.table and assigns it value.

When merging in the form X[Y, ], we can think of Y as selecting the rows of X to operate on. Further, when Y is a data.table, all columns of both X and Y are available in j (i.e., what comes after the comma), so we could have said density := count/area. When we want to be sure that we're referring to one of Y's columns, we prefix its name with i. to make clear that it is one of the columns in i, i.e., what precedes the comma. There should be a vignette on merges forthcoming.
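To make the update join concrete, here is a minimal self-contained sketch. The four rows of x are a made-up subset in the shape of the question's data (with count 152 for site 1.2 of species A, as in the desired output), and the final rounding step is only there to mirror that output; neither is part of the original answer.

```r
library(data.table)

# Small illustrative sample in the same shape as the question's tables
x <- data.table(
  species = c("A", "A", "B", "B"),
  site    = c(1.1, 1.2, 1.2, 1.3),
  count   = c(25, 152, 6, 10)
)
area <- data.table(
  species = c("A", "B", "C", "D"),
  area    = c(59.7, 34.4, 37.7, 22.8)
)

# Update join: rows of x whose species matches a row of `area` get a new
# column, assigned by reference; i.area is the area column of the i table
x[area, density := count / i.area, on = "species"]

# Rounding is for display only (an addition for this sketch)
x[, density := round(density, 3)]
print(x)
```

Species C and D have no rows in x, so the join simply leaves x untouched for them; no extra rows are created.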

In general, as soon as you think "match across different data sets", your instinct should be to merge. For more on data.table, see here.
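The same "merge on match" idea can also be sketched in base R, without any package. This is not from the original answer; the sample rows are made up, and the idx and merged names are only illustrative. match() gives the kind of row index the question was looking for, while merge() performs the join explicitly.

```r
# Small illustrative sample in the same shape as the question's tables
x <- data.frame(
  species = c("A", "A", "B", "B"),
  site    = c(1.1, 1.2, 1.2, 1.3),
  count   = c(25, 152, 6, 10)
)
area <- data.frame(
  species = c("A", "B", "C", "D"),
  area    = c(59.7, 34.4, 37.7, 22.8)
)

# Option 1: match() returns, for each x$species, the row position of
# that species in area (NA if the species is absent from area)
idx <- match(x$species, area$species)
x$density <- x$count / area$area[idx]

# Option 2: an explicit merge; all.x = TRUE keeps every row of x
# (note that merge() sorts the result by the key column by default)
merged <- merge(x[, c("species", "site", "count")], area,
                by = "species", all.x = TRUE)
merged$density <- merged$count / merged$area
```

The match() version keeps the row order of x and adds no helper column, which makes it the closer fit when only one value is being looked up.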

I'd use a merge (left_join) and then add the new column using mutate:

library(dplyr)

x %>% left_join(area, by="species") %>%
      mutate(density = count/area)
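As a runnable sketch of the pipeline above, the following uses a made-up sample that includes a hypothetical species "E" with no entry in area, to show what left_join does with unmatched rows. The round() and select(-area) steps are additions to match the desired output format and are not part of the original answer.

```r
library(dplyr)

# Small illustrative sample; species "E" is hypothetical and has no area
x <- data.frame(
  species = c("A", "B", "E"),
  site    = c(1.1, 1.2, 4.1),
  count   = c(25, 6, 3)
)
area <- data.frame(
  species = c("A", "B", "C", "D"),
  area    = c(59.7, 34.4, 37.7, 22.8)
)

result <- x %>%
  left_join(area, by = "species") %>%          # adds the area column to x
  mutate(density = round(count / area, 3)) %>% # NA where no area matched
  select(-area)                                # drop the helper column

print(result)
```

Because left_join keeps every row of x, a species missing from area ends up with an NA density rather than being dropped silently.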
