Recoding data in R that is annotated in intervals

I have a data set that has depth in intervals.

Depth

0-3 

3-6

6-9

9-10

10-11

etc

The first three intervals are in 3-unit increments, as are the last five (60-63, 63-66, 66-69, 69-72, 72-75); the intervals in between are in single-unit increments.

Because of this notation, I cannot plot the depth against my independent variable. I want to recode the column that contains the depth intervals into the upper value, i.e. 0-3 would read as 3.

Is there a shortcut for doing this that handles both the 3-unit increments and the single-unit increments?

I tried

df$depth <- 1:nrow(wor)

but this only gives me sequential numerics.

and when I try

df$depth <- dplyr::recode(df$depth, "1=3; 2=6; 3=9; 4:54 = 9:60; 55=63; 56=66; 57=69; 58=72; 59=75; 60=78")

I get the following warning:

Warning message:
Unreplaced values treated as NA as .x is not compatible. Please specify replacements exhaustively or supply .default

Any help would be greatly appreciated. Thank you very much!

Try using regular expressions to extract the last number from those strings.

sub("^[[:digit:]]{1,}-([[:digit:]]{1,})", "\\1", "0-3")
[1] "3"
sub("^[[:digit:]]{1,}-([[:digit:]]{1,})", "\\1", "10-11")
[1] "11"

df$depth <- as.numeric(sub("^[[:digit:]]{1,}-([[:digit:]]{1,})", "\\1", df$depth))
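
As a quick check of the approach above, here is a minimal sketch on a made-up data frame. The sample rows and the column name `depth_upper` are assumptions for illustration; the interval strings are taken from the question.

```r
# Hypothetical sample data using intervals mentioned in the question
df <- data.frame(depth = c("0-3", "3-6", "6-9", "9-10", "10-11", "60-63"),
                 stringsAsFactors = FALSE)

# Keep only the number after the dash, then convert the text to numeric
df$depth_upper <- as.numeric(
  sub("^[[:digit:]]{1,}-([[:digit:]]{1,})", "\\1", df$depth)
)

df$depth_upper              # values: 3 6 9 10 11 63
is.numeric(df$depth_upper)  # TRUE, so the column can go straight into a plot
```

Writing to a new column rather than overwriting `df$depth` keeps the original labels around, which can be handy for axis annotations.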

You could use regular expressions to try to solve this:

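# Example data with the interval strings from the question
dd <- data.frame(depth=c("0-3", "3-6", "6-9", "9-10", "10-11"), stringsAsFactors=FALSE)
# Keep the second number of each interval and convert it to numeric for plotting
dd$max_depth <- as.numeric(gsub("([0-9]+)-([0-9]+)", "\\2", dd$depth))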
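
One caveat with `sub()` and `gsub()`: when the pattern does not match, the input string is returned unchanged, so a malformed entry only surfaces later as an `NA` (with a warning) from `as.numeric()`. A minimal sketch of checking the format first, using a hypothetical vector `depth_raw` that contains one deliberately bad value:

```r
# Hypothetical input; "unknown" is a deliberately malformed entry
depth_raw <- c("0-3", "3-6", "9-10", "unknown")

# TRUE where the value looks like "<digits>-<digits>"
is_interval <- grepl("^[0-9]+-[0-9]+$", depth_raw)

# Start with NA everywhere, then fill in the upper bound for valid rows only
max_depth <- rep(NA_real_, length(depth_raw))
max_depth[is_interval] <- as.numeric(
  sub("^[0-9]+-([0-9]+)$", "\\1", depth_raw[is_interval])
)

depth_raw[!is_interval]  # entries that need manual attention
```

This way the rows that could not be parsed are listed explicitly instead of being discovered through a coercion warning.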

You can use the function separate from the tidyr package

library(tidyr)
tidyr::separate(data, col_name, into = c("first_num", "second_num"), sep = "-")

This gives you two columns, one for each end of the interval, which you can then use in calculations.

library(dplyr)
df %>%
  tidyr::separate(depth_var, into = c("first_num", "second_num"), sep = "-") %>%
  mutate(first_num = as.double(first_num),   # separate() returns character columns
         second_num = as.double(second_num),
         intervals = abs(first_num - second_num))  # width of each interval
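
As a possible shortcut, `separate()` has a `convert` argument that runs type conversion on the new columns, which avoids the explicit `as.double()` calls. A minimal sketch, assuming a made-up data frame `dd` with a column `depth_var`; the column names `lower` and `upper` are likewise only illustrative:

```r
library(tidyr)

# Hypothetical data frame for illustration
dd <- data.frame(depth_var = c("0-3", "3-6", "6-9", "9-10", "10-11"),
                 stringsAsFactors = FALSE)

# convert = TRUE asks separate() to turn the new text columns into numbers
out <- separate(dd, depth_var, into = c("lower", "upper"),
                sep = "-", convert = TRUE)

out$upper  # upper bound of each interval, ready for plotting
```

With whole-number depths like these, the converted columns come back as integers rather than doubles, which is normally fine for plotting.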

I would use the tidyr package and split the numbers by the dash in the middle

set.seed(1)
df <- data.frame(Depth = c("0-3", "3-6", "6-9", "9-12"),
                val = sample(x=4, replace = FALSE))
library(tidyr)
library(dplyr)  # select() comes from dplyr
df %>% 
  separate(Depth, c("start", "finish_dep"), sep = "-") %>% 
  select(-start)  # keep only the upper bound of each interval
