Removing part of a string based on values from another data column

Question

I have a dataset of schools, and I want to strip the prefix from each school name so that only the name itself (and sometimes a number) is left. The prefix is also listed in another column (tipo.organización), so I want to take the value from tipo.organización and remove it from the name of the school (nombre.establecimiento).

I tried using gsub to remove that part of the string from the name, but I couldn't just pass in the column as the set of patterns to remove. How can I get it to go through each row, compare the name to the tipo.organización value in that row, and delete the part that is not needed?

data <- read.csv("...", header = TRUE)
data$nombre.establecimiento <- as.character(data$nombre.establecimiento)

#Remove Duplicates
new <- data[!duplicated(data$nombre.establecimiento),]

#Tried to take out values from other column
new$nombre.establecimiento <- gsub(new$tipo.organización, '',
                                   new$nombre.establecimiento)

Thank you!!

Link to dataset

Answer 1

This question has a similar problem, and a lot of good answers. The stringr approach would look something like this in your case:

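library(stringr)  # provides str_replace_all()
# Vectorised over string and pattern, so each name is paired with the prefix in its own row (the pattern is read as a regex)
new$nombre.establecimiento = str_replace_all(new$nombre.establecimiento,
                                             new$tipo.organización, '')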
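
If you would rather stay in base R, the reason the gsub attempt fails is that gsub and sub use a single pattern, not one per row. A minimal sketch of a row-by-row version with mapply follows. The data frame and its column names (tipo, nombre, nombre_limpio) are made up for illustration and stand in for your real columns; fixed = TRUE is my assumption that the prefixes should be matched literally rather than as regular expressions.

```r
# Toy data standing in for the real columns (values are made up)
new <- data.frame(
  tipo   = c("Escuela", "Colegio"),
  nombre = c("Escuela Benito 12", "Colegio San Jose"),
  stringsAsFactors = FALSE
)

# sub() takes one pattern at a time, so pair each name with its own prefix
new$nombre_limpio <- mapply(
  function(prefix, name) sub(prefix, "", name, fixed = TRUE),
  new$tipo, new$nombre,
  USE.NAMES = FALSE
)

# Drop the space left behind where the prefix used to be
new$nombre_limpio <- trimws(new$nombre_limpio)
```

Note that sub removes the first occurrence wherever it appears, not only at the start of the name, so this relies on the prefix really being a prefix.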

(I followed the link and got the raw dataset, and you may need to do some additional cleaning to get this to do what you want. I'm seeing a lot of differences between the contents of tipo.organización and the beginning of nombre.establecimiento: accented vs. unaccented characters, extra words, etc. You may already be doing this, of course! A link to a cleaned-up dataset would be helpful for checking this piece of the problem.)
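
One possible way to tolerate the accent and case differences is to compare normalised copies of both columns and then cut the prefix off the original name by length. This is only a sketch in base R: normalise and strip_prefix are hypothetical helper names, the sample values are invented rather than taken from the dataset, and it only folds case and the accented vowels listed in the code. It does not handle the "extra words" case.

```r
# Hypothetical helper: lower-case and replace accented vowels (á é í ó ú ü)
# with plain ones; \u escapes avoid depending on the file's encoding
normalise <- function(x) {
  chartr("\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc", "aeiouu", tolower(x))
}

# Hypothetical helper: drop `prefix` from the start of `name` when the
# normalised name starts with the normalised prefix, else keep the name
strip_prefix <- function(name, prefix) {
  has_prefix <- startsWith(normalise(name), normalise(prefix))
  out <- ifelse(has_prefix, substring(name, nchar(prefix) + 1), name)
  trimws(out)
}

# Made-up sample values, not taken from the real dataset
nombres <- c("Escuela Primaria Benito Ju\u00e1rez", "ESCUELA BASICA 12", "Liceo Central")
tipos   <- c("Escuela Primaria", "Escuela B\u00e1sica", "Colegio")

limpios <- strip_prefix(nombres, tipos)
```

Because the replacement characters are one-for-one, the normalised prefix has the same number of characters as the original, which is what lets substring cut the untouched name at the right position.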

solution1
0 2019-01-21 18:03:21
