简体   繁体   中英

dplyr package - mutate

Good night! I'm brazilian and I don't speak english very well. I work a database with more than 10000 rows, represented by the example below.

df <- data.frame(
    PROCESS = c(180022121, 180022121, 180022105, 180022105, 180022097, 180022097, 180022097, 180022501, 180022501), 
    NAME = c("A_NONIMATO", "B_NONIMATO", "C_NONIMATO", "C_NONIMATO", "D_NONIMATO", "E_NONIMATO", "F_NONIMATO", "G_NONIMATO", "G_NONIMATO"),DATE = c("02/01/2018", "02/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "02/01/2018", "02/01/2018"), 
    CRIME = c("ART.33", "ART.33", "ART.35", "ART.33", "ART.155", "ART.155", "ART.155", "ART.157", "ART.14CP"))

HYPOTHESIS:

a) same PROCESS , different PARTNAME , same CRIME (lines 1 and 2)

b) same PROCESS , same PARTNAME , different CRIME (lines 3 and 4)

c) same PROCESS , same PARTNAME , different CRIME (lines 8 and 9)

My question concerns hypothesis c: I need to create a column CRIME2 to transfer ART.14CP turning the case into just one line… without moving to hypothesis b. Throughout the database will always be written ART.14CP .

To look like this:

df2 <- data.frame(
    PROCESS = c(180022121, 180022121, 180022105, 180022105, 180022097, 180022097, 180022097, 180022501),
    NAME = c("A_NONIMATO", "B_NONIMATO", "C_NONIMATO", "C_NONIMATO", "D_NONIMATO", "E_NONIMATO", "F_NONIMATO", "G_NONIMATO"),
    DATE = c("02/01/2018", "02/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "02/01/2018"),
    CRIME = c("ART.33", "ART.33", "ART.35", "ART.33", "ART.155", "ART.155", "ART.155", "ART.157"),
    CRIME2 = c("", "", "", "", "", "", "", "ART.14CP"))

Thank you very much for your attention.

If you want to only have values in CRIME2 where CRIME == "ART.14CP" one way using dplyr for each group ( PROCESS and NAME ) assign value "ART.14CP" only if it is present or else assign an empty value. We remove the duplicate row entry for "ART.14CP" using filter .

library(dplyr)

df %>%
  group_by(PROCESS, NAME) %>%
  mutate(CRIME2 = if (any(CRIME == "ART.14CP")) "ART.14CP" else "") %>%
  filter(CRIME != "ART.14CP")


#    PROCESS NAME       DATE       CRIME   CRIME1  
#      <dbl> <fct>      <fct>      <fct>   <chr>   
#1 180022121 A_NONIMATO 02/01/2018 ART.33  ""      
#2 180022121 B_NONIMATO 02/01/2018 ART.33  ""      
#3 180022105 C_NONIMATO 01/01/2018 ART.35  ""      
#4 180022105 C_NONIMATO 01/01/2018 ART.33  ""      
#5 180022097 D_NONIMATO 01/01/2018 ART.155 ""      
#6 180022097 E_NONIMATO 01/01/2018 ART.155 ""      
#7 180022097 F_NONIMATO 01/01/2018 ART.155 ""      
#8 180022501 G_NONIMATO 02/01/2018 ART.157 ART.14CP

Here's a slight modification of Ronak's answer using case_when and another way of approaching the logic. I prefer case_when to if or ifelse since it's more intuitive to others reading your code, especially if you need to use multiple conditions or have numerous cases. The TRUE condition will capture any remaining cases; if you don't put it, those cases be set to NA by default.

library(dplyr)
df %>% 
  group_by(PROCESS, NAME) %>%
  mutate(CRIME2 = case_when(
    "ART.14CP" %in% CRIME ~ "ART.14CP",
    TRUE ~ ""
  )) %>% 
  filter(CRIME != "ART.14CP")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM