separate() function, library(tidyverse)

I have been using separate() from the tidyverse to split values into different columns:

45 (10, 89) 
34

and with the code:

dd %>% separate(a, c("x","y","z"), extra="drop") 

I got what I wanted:

45 10 89
34

But now my variable has a different format, and the same code no longer works:

45% (10,89)
34%

Why does it not work when the values contain the '%' symbol?

Edit: OK, I know why it is not working: it is because of the decimal point in my data:

4.5% (10/89)
3.4%

6.7%

7.8% (89/98)

How do you handle decimals with separate()? Thank you very much!

I'm inferring that when you say "is not working", it's because the percent sign is being removed:

separate(data_frame(a=c("45 (10, 89)","34")), a, c('x','y','z'), extra="drop")
# Warning: Too few values at 1 locations: 2
# # A tibble: 2 × 3
#       x     y     z
# * <chr> <chr> <chr>
# 1    45    10    89
# 2    34  <NA>  <NA>
separate(data_frame(a=c("45% (10, 89)","34%")), a, c('x','y','z'), extra="drop")
# Warning: Too few values at 1 locations: 2
# # A tibble: 2 × 3
#       x     y     z
# * <chr> <chr> <chr>
# 1    45    10    89
# 2    34        <NA>

From ?separate:

 separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...) ... 

Since you are not overriding the default of sep, separate() splits on any run of characters that are not letters or digits, and the percent sign is one of those characters. FYI, [^[:alnum:]]+ is analogous to [^A-Za-z0-9]+, which matches "1 or more characters that are not in the character ranges A-Z, a-z, or 0-9".
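To see which characters each pattern consumes, here is a small base R sketch (no tidyr needed) that lists the separator runs matched in the sample strings. The variable names are only illustrative, and the second pattern is the default with % added to the excluded set:

```r
# Base R only: list the substrings that each pattern treats as separators.
x <- c("45% (10, 89)", "34%")

default_sep <- "[^[:alnum:]]+"    # the default quoted from ?separate
keep_pct    <- "[^[:alnum:]%]+"   # same class, but '%' is no longer a separator

seps_default  <- regmatches(x, gregexpr(default_sep, x))
seps_keep_pct <- regmatches(x, gregexpr(keep_pct, x))

seps_default[[1]]   # "% (" ", " ")"
seps_default[[2]]   # "%"
seps_keep_pct[[1]]  # " (" ", " ")"
seps_keep_pct[[2]]  # character(0)
```

With the default pattern the % is swallowed as part of a separator, which is why it disappears from the result.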

Simply provide a more specific sep, one that does not treat % as a separator, and you'll get what you want.

separate(data_frame(a=c("45% (10, 89)","34%")), a, c('x','y','z'), sep="[^[:alnum:]%]+", extra="drop")
# Warning: Too few values at 1 locations: 2
# # A tibble: 2 × 3
#       x     y     z
# * <chr> <chr> <chr>
# 1   45%    10    89
# 2   34%  <NA>  <NA>

Edit: using your most recent sample data:

separate(data_frame(a=c("45% (10/89)","34%","","67%","78% (89/98)")), a, c('x','y','z'), sep="[^[:alnum:]%]+", extra="drop")
# Warning: Too few values at 3 locations: 2, 3, 4
# # A tibble: 5 × 3
#       x     y     z
# * <chr> <chr> <chr>
# 1   45%    10    89
# 2   34%  <NA>  <NA>
# 3        <NA>  <NA>
# 4   67%  <NA>  <NA>
# 5   78%    89    98
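The sample in the question's edit also contains decimal points, which the pattern above would still treat as separators. One possible extension, sketched here under the assumption that a period only ever appears as a decimal point in this column, is to add . to the excluded set as well. The data frame dd below is reconstructed from the question's edit, and fill="right" is used only to silence the "too few values" warning:

```r
library(tidyr)

# Sample values reconstructed from the question's edit (an assumption).
dd <- data.frame(
  a = c("4.5% (10/89)", "3.4%", "", "6.7%", "7.8% (89/98)"),
  stringsAsFactors = FALSE
)

# Inside a character class, '.' is a literal period, so neither '%' nor '.'
# is treated as a separator any more.
res <- separate(dd, a, c("x", "y", "z"),
                sep = "[^[:alnum:]%.]+",
                extra = "drop",   # discard the empty piece after the closing ")"
                fill = "right")   # pad short rows with NA without a warning

res
```

If the percentages are needed as numbers afterwards, something like as.numeric(sub("%", "", res$x)) would strip the sign and convert the rest.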
