dplyr package - mutate

Question

Good night! I'm brazilian and I don't speak english very well. I work a database with more than 10000 rows, represented by the example below.

df <- data.frame(
    PROCESS = c(180022121, 180022121, 180022105, 180022105, 180022097, 180022097, 180022097, 180022501, 180022501), 
    NAME = c("A_NONIMATO", "B_NONIMATO", "C_NONIMATO", "C_NONIMATO", "D_NONIMATO", "E_NONIMATO", "F_NONIMATO", "G_NONIMATO", "G_NONIMATO"),DATE = c("02/01/2018", "02/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "02/01/2018", "02/01/2018"), 
    CRIME = c("ART.33", "ART.33", "ART.35", "ART.33", "ART.155", "ART.155", "ART.155", "ART.157", "ART.14CP"))

HYPOTHESIS:

a) same PROCESS , different PARTNAME , same CRIME (lines 1 and 2)

b) same PROCESS , same PARTNAME , different CRIME (lines 3 and 4)

c) same PROCESS , same PARTNAME , different CRIME (lines 8 and 9)

My question concerns hypothesis c: I need to create a column CRIME2 to transfer ART.14CP turning the case into just one line… without moving to hypothesis b. Throughout the database will always be written ART.14CP .

To look like this:

df2 <- data.frame(
    PROCESS = c(180022121, 180022121, 180022105, 180022105, 180022097, 180022097, 180022097, 180022501),
    NAME = c("A_NONIMATO", "B_NONIMATO", "C_NONIMATO", "C_NONIMATO", "D_NONIMATO", "E_NONIMATO", "F_NONIMATO", "G_NONIMATO"),
    DATE = c("02/01/2018", "02/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "02/01/2018"),
    CRIME = c("ART.33", "ART.33", "ART.35", "ART.33", "ART.155", "ART.155", "ART.155", "ART.157"),
    CRIME2 = c("", "", "", "", "", "", "", "ART.14CP"))

Thank you very much for your attention.

Answer 1

If you want to only have values in CRIME2 where CRIME == "ART.14CP" one way using dplyr for each group ( PROCESS and NAME ) assign value "ART.14CP" only if it is present or else assign an empty value. We remove the duplicate row entry for "ART.14CP" using filter .

library(dplyr)

df %>%
  group_by(PROCESS, NAME) %>%
  mutate(CRIME2 = if (any(CRIME == "ART.14CP")) "ART.14CP" else "") %>%
  filter(CRIME != "ART.14CP")


#    PROCESS NAME       DATE       CRIME   CRIME1  
#      <dbl> <fct>      <fct>      <fct>   <chr>   
#1 180022121 A_NONIMATO 02/01/2018 ART.33  ""      
#2 180022121 B_NONIMATO 02/01/2018 ART.33  ""      
#3 180022105 C_NONIMATO 01/01/2018 ART.35  ""      
#4 180022105 C_NONIMATO 01/01/2018 ART.33  ""      
#5 180022097 D_NONIMATO 01/01/2018 ART.155 ""      
#6 180022097 E_NONIMATO 01/01/2018 ART.155 ""      
#7 180022097 F_NONIMATO 01/01/2018 ART.155 ""      
#8 180022501 G_NONIMATO 02/01/2018 ART.157 ART.14CP

Answer 2

Here's a slight modification of Ronak's answer using case_when and another way of approaching the logic. I prefer case_when to if or ifelse since it's more intuitive to others reading your code, especially if you need to use multiple conditions or have numerous cases. The TRUE condition will capture any remaining cases; if you don't put it, those cases be set to NA by default.

library(dplyr)
df %>% 
  group_by(PROCESS, NAME) %>%
  mutate(CRIME2 = case_when(
    "ART.14CP" %in% CRIME ~ "ART.14CP",
    TRUE ~ ""
  )) %>% 
  filter(CRIME != "ART.14CP")

dplyr package - mutate

Question

2 answers

solution1
2 2019-09-16 01:24:40

solution2
1 2019-09-16 03:58:50

dplyr package - mutate

Question

2 answers

solution1 2 2019-09-16 01:24:40

solution2 1 2019-09-16 03:58:50

solution1
2 2019-09-16 01:24:40

solution2
1 2019-09-16 03:58:50