
Using indexing to perform mathematical operations on a data frame in R

I'm struggling with basic indexing on a data frame to perform mathematical operations. I have a data frame containing all 50 US states with an entry for each month of the year, so there are 600 observations. For each state, I want to compute the December value minus the January value. My data looks like this:

> head(df)
  state year month             value
1    AL 2020    01               2.7
2    AK 2020    01                 5
3    AZ 2020    01               4.8
4    AR 2020    01               3.7
5    CA 2020    01               4.2
7    CO 2020    01               2.7

For instance, AL has a Dec value of 4.7 and a Jan value of 2.7, so I'd like to return 2 for that state.

I have been trying to do this with the group_by and summarize functions, but can't figure out the indexing piece of it to grab values that correspond to a condition. I couldn't find a resource for performing these mathematical operations using indexing on a data frame, and would appreciate assistance as I have other transformations I'll be using.

With dplyr:

library(dplyr)
df %>%
  group_by(state) %>%
  summarize(year_change = value[month == "12"] - value[month == "01"])

This assumes that your data is as you describe: every state has a single value for every month. If you have missing rows, or multiple observations for a state in a given month, I would not expect this code to work.
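If you are not sure the data meets that assumption, one option is to count the January and December rows per state and only compute the difference when each appears exactly once. The following is a minimal sketch, not part of the original answer: the toy data frame and the column names n_jan and n_dec are made up for illustration (only the AL values come from the question), and it assumes month is stored as a zero-padded character string and value is numeric.

```r
library(dplyr)

# Toy data: AZ deliberately has no December row.
# All values except AL's are hypothetical.
df <- data.frame(
  state = c("AL", "AL", "AK", "AK", "AZ"),
  year  = 2020,
  month = c("01", "12", "01", "12", "01"),
  value = c(2.7, 4.7, 5.0, 5.5, 4.8)
)

checked <- df %>%
  group_by(state) %>%
  summarize(
    # how many January and December rows this state has
    n_jan = sum(month == "01"),
    n_dec = sum(month == "12"),
    # compute the difference only when both months appear exactly once;
    # otherwise return NA instead of failing or returning several rows
    year_change = if (n_jan == 1 && n_dec == 1) {
      value[month == "12"] - value[month == "01"]
    } else {
      NA_real_
    }
  )

checked
```

States with a missing or duplicated month end up with NA in year_change, and the n_jan and n_dec columns show which case applies.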

Another approach, based on row order rather than month value, might look like this:

library(dplyr)
df %>%
  ## make sure things are in the right order
  arrange(state, month) %>% 
  group_by(state) %>%
  summarize(year_change = last(value) - first(value))
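Since the question asks about indexing specifically, the same calculation can also be sketched in base R with logical indexing and merge(). This is an illustrative alternative rather than part of the original answer: the toy data and the names jan, dec, and change are assumptions, and it again assumes month is a zero-padded character string and value is numeric.

```r
# Toy data: AZ deliberately has no December row.
# All values except AL's are hypothetical.
df <- data.frame(
  state = c("AL", "AL", "AK", "AK", "AZ"),
  year  = 2020,
  month = c("01", "12", "01", "12", "01"),
  value = c(2.7, 4.7, 5.0, 5.5, 4.8)
)

# Logical indexing: keep the rows for one month and the columns we need
jan <- df[df$month == "01", c("state", "value")]
dec <- df[df$month == "12", c("state", "value")]

# Join the two subsets on state; this yields value_jan and value_dec columns
change <- merge(jan, dec, by = "state", suffixes = c("_jan", "_dec"))

# Vectorised subtraction across the matched rows
change$year_change <- change$value_dec - change$value_jan

change
```

Because merge() performs an inner join by default, a state missing either month (AZ here) is dropped from the result rather than causing an error.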

