
Assign colors to specific values in an R data.frame - spot the error

I have a data.frame called color with sample names. I want to assign a color to each sample according to whether its name ends in .U1 or .U2.

color
samples  
1  30HB.U2 
2  41ML.U2 
3  22WS.U1 
4  29MK.U1 
5  29MK.U2 
6  40WA.U1 
7  30HB.U1 
8  13BS.U1 
9  50DM.U1 
10 53BD.U1 
11 36ER.U1 
12 05AP.U1 
13 06WT.U1 
14 07RW.U1 
15 07RW.U2 
16 17SK.U1 
17 26FB.U1 
18 28HM.U1 
19 31KE.U1 
20 32FG.U1 
21 34WF.U1 
22 37SD.U1 
23 41ML.U1 
24 45GL.U2 
25 47OT.U1 
26 49RJ.U1 
27 54SL.U1 
28 54SL.U2 
29 69HL.U1 
30 69HL.U2 
[...]

color <- color %>%
  mutate(col = case_when(
    samples == color$samples[grepl(color$samples,pattern = '.U1') == TRUE] ~ 'red',
    samples == color$samples[grepl(color$samples,pattern = '.U2') == TRUE] ~ 'blue'))

Not every color assignment worked.

color
samples  col
1  30HB.U2 blue
2  41ML.U2 blue
3  22WS.U1 <NA>
4  29MK.U1 <NA>
14 07RW.U1 <NA>
15 07RW.U2 <NA>
16 17SK.U1 <NA>
24 45GL.U2 <NA>
25 47OT.U1 <NA>
26 49RJ.U1 <NA>
27 54SL.U1 <NA>
28 54SL.U2 <NA>
29 69HL.U1 <NA>
30 69HL.U2 <NA>
31 74SA.U1 <NA>
[...]
50 05AP.U2 <NA>
51 36ER.U2 <NA>
52 40WA.U2 <NA>
53 35AD.U2 <NA>
54 47OT.U2 <NA>
55 28HM.U2 <NA>
56 38AR.U2 <NA>
57 66DG.U2 <NA>
58 35AD.U1 <NA>
59 57MT.U2 blue
60 39DA.U2 blue
61 37SD.U2 blue
62 49RJ.U2 blue

Why does it not work? I find it strange that the first and last assignments work. Thank you for any suggestions.

You could simply use substring and factor labels.

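# substring(samples, 6) keeps "U1"/"U2"; the factor levels sort as U1, U2, so they map to "red", "blue"
color <- transform(color, col = factor(substring(samples, 6), labels = c("red", "blue")))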
color
#    samples  col
# 1  30HB.U2 blue
# 2  41ML.U2 blue
# 3  22WS.U1  red
# 4  29MK.U1  red
# 5  29MK.U2 blue
# 6  40WA.U1  red
# 7  30HB.U1  red
# 8  13BS.U1  red
# 9  50DM.U1  red
# 10 53BD.U1  red
# 11 36ER.U1  red
# 12 05AP.U1  red
# 13 06WT.U1  red
# 14 07RW.U1  red
# 15 07RW.U2 blue
# 16 17SK.U1  red
# 17 26FB.U1  red
# 18 28HM.U1  red
# 19 31KE.U1  red
# 20 32FG.U1  red
# 21 34WF.U1  red
# 22 37SD.U1  red
# 23 41ML.U1  red
# 24 45GL.U2 blue
# 25 47OT.U1  red
# 26 49RJ.U1  red
# 27 54SL.U1  red
# 28 54SL.U2 blue
# 29 69HL.U1  red
# 30 69HL.U2 blue
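The one-liner above relies on two things: every suffix starts at character 6, and the factor levels sort as U1 then U2. A minimal sketch that relaxes both, assuming the suffix is always whatever follows the last "." in the name: it strips the prefix with sub() instead of substring() (a swapped-in technique, not the answerer's) and fixes the level order explicitly, so a data set containing only .U1 samples would still map them to "red".

```r
# A few sample names from the question's data
samples <- c("30HB.U2", "41ML.U2", "22WS.U1", "29MK.U1")

# Strip everything up to and including the last "." so the suffix
# no longer has to start at character 6
suffix <- sub(".*\\.", "", samples)

# Pin the level order so "U1" always maps to "red" and "U2" to "blue",
# even if one of the suffixes is absent from the data
col <- factor(suffix, levels = c("U1", "U2"), labels = c("red", "blue"))
as.character(col)
# [1] "blue" "blue" "red"  "red"
```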

Data:

color <- structure(list(samples = c("30HB.U2", "41ML.U2", "22WS.U1", "29MK.U1", 
"29MK.U2", "40WA.U1", "30HB.U1", "13BS.U1", "50DM.U1", "53BD.U1", 
"36ER.U1", "05AP.U1", "06WT.U1", "07RW.U1", "07RW.U2", "17SK.U1", 
"26FB.U1", "28HM.U1", "31KE.U1", "32FG.U1", "34WF.U1", "37SD.U1", 
"41ML.U1", "45GL.U2", "47OT.U1", "49RJ.U1", "54SL.U1", "54SL.U2", 
"69HL.U1", "69HL.U2")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30"))

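The condition samples == color$samples[grepl(...)] inside case_when() is evaluated element by element. Its right-hand side is not a logical vector but a shorter character vector holding only the matching sample names, and R recycles it to the length of samples. Row i is therefore compared with whatever name sits at position i of the recycled subset, so the condition is TRUE only where the positions happen to line up. That is why a few rows at the start and at the end got a color while every other row fell through to NA, which is what case_when() returns when no condition matches.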

Here is a way to keep your grepl expression: replace == with %in%, since you want to check whether each value of samples is one of the names in the matching set.

color <- color %>%
  mutate(col = case_when(
    samples %in% color$samples[grepl(color$samples,pattern = '.U1') == TRUE] ~ 'red',
    samples %in% color$samples[grepl(color$samples,pattern = '.U2') == TRUE] ~ 'blue'))
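To see the difference between == and %in% on a small scale, here is a base R sketch using the first six sample names from the question's data. It rebuilds the right-hand side of the original condition and compares the two operators directly.

```r
# First six sample names from the question's data
samples <- c("30HB.U2", "41ML.U2", "22WS.U1", "29MK.U1", "29MK.U2", "40WA.U1")

# Right-hand side of the original condition: only the names matching ".U1" (length 3)
u1_names <- samples[grepl(".U1", samples)]

# == recycles u1_names to length 6 and compares position by position
by_position <- samples == u1_names
by_position
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE

# %in% asks whether each name occurs anywhere in u1_names
by_membership <- samples %in% u1_names
by_membership
# [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE
```

Only the sixth name happens to line up by position, while %in% finds all three .U1 names.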

Here is a simpler way to use grepl:

color <- color %>%
  mutate(col = case_when(
    grepl(".U1", samples) ~ 'red',
    grepl(".U2", samples) ~ 'blue'))
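One caveat about the pattern: in a regular expression "." matches any character, so ".U1" means "any character followed by U1", not a literal ".U1" suffix. It appears to work for the names shown in the question, but the sketch below uses hypothetical names to show where it could misfire, and how escaping the dot and anchoring at the end with $ tightens the match.

```r
# Hypothetical names chosen to show where the loose pattern misfires
samples <- c("30HB.U1", "30HBxU1", "AU1B.U2")

# Unescaped: "." matches any character, so all three names match
loose <- grepl(".U1", samples)
loose
# [1] TRUE TRUE TRUE

# Escaped and anchored: only a literal ".U1" at the end of the name matches
strict <- grepl("\\.U1$", samples)
strict
# [1]  TRUE FALSE FALSE
```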

You could also use str_detect() from stringr:

library(dplyr)    # for %>%, mutate() and case_when()
library(stringr)  # for str_detect()
color <- color %>%
  mutate(col = case_when(str_detect(samples, ".U1") ~ 'red',
                         str_detect(samples, ".U2") ~ 'blue'))
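If you do not need regular expressions at all, base R's endsWith() tests a literal suffix, so the dot needs no escaping. A minimal sketch, assuming base R only and using ifelse() in place of case_when(); the last name is hypothetical and matches neither suffix, so it falls through to NA just as it would with case_when().

```r
# Names from the question's data, plus one hypothetical name with neither suffix
samples <- c("30HB.U2", "41ML.U2", "22WS.U1", "29MK.U1", "99ZZ.U3")

# endsWith() compares the literal end of each string; unmatched names become NA
col <- ifelse(endsWith(samples, ".U1"), "red",
              ifelse(endsWith(samples, ".U2"), "blue", NA_character_))
col
```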
