Removing part of a string based on values from another data column
I have a dataset of schools, and I want to remove the prefix in front of each school so that only the school name is left (and sometimes a number). The prefix is also listed in another column (tipo.organización), so I want to take the value from tipo.organización and remove it from the name of the school (nombre.establecimiento).
I tried using gsub to remove part of the string from the name, but I couldn't just pass in the column as a set of values to replace. How could I get it to go through each value, compare it to the tipo.organización column, and then delete what is not needed?
data <- read.csv("...", header = TRUE)
data$nombre.establecimiento <-
as.character(data$nombre.establecimiento)
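# Remove duplicate school names
new <- data[!duplicated(data$nombre.establecimiento), ]
# Tried to remove the prefix stored in the other column
# (gsub() only uses the first element of a vector pattern, so this does not work row by row)
new$nombre.establecimiento <- gsub(new$tipo.organización, '',
                                   new$nombre.establecimiento)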
Thank you!!
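For reference, here is a minimal base-R sketch of why the attempt above fails and of a row-by-row alternative. The vectors nombre and tipo are hypothetical toy stand-ins for new$nombre.establecimiento and new$tipo.organización, not values from the real dataset; on the data frame you would pass those two columns to the same mapply() call.

```r
# Hypothetical toy data standing in for the two columns
nombre <- c("Liceo Bicentenario 5", "Colegio Los Andes", "Escuela 12")
tipo   <- c("Liceo", "Colegio", "Escuela")

# What the original attempt does: gsub() warns that 'pattern' has length > 1
# and applies only tipo[1] ("Liceo") to every row
erroneo <- suppressWarnings(gsub(tipo, "", nombre))

# Row-by-row alternative: mapply() calls gsub() once per row, pairing each
# prefix with its own name; fixed = TRUE treats the prefix as literal text
limpio <- mapply(function(prefijo, nombre_i) {
  trimws(gsub(prefijo, "", nombre_i, fixed = TRUE))
}, tipo, nombre, USE.NAMES = FALSE)

erroneo  # " Bicentenario 5" "Colegio Los Andes" "Escuela 12"
limpio   # "Bicentenario 5" "Los Andes" "12"
```

The trimws() call is an addition of this sketch, to drop the space left behind once the prefix is removed.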
This question has a similar problem, and a lot of good answers. The stringr approach would look something like this in your case:
library(stringr)
# str_replace_all() is vectorised over pattern: row i uses tipo.organización[i]
new$nombre.establecimiento <- str_replace_all(new$nombre.establecimiento,
                                              new$tipo.organización, '')
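A minimal sketch with hypothetical toy vectors (not the real dataset) shows that str_replace_all() pairs the i-th name with the i-th prefix. Wrapping the pattern in fixed() and trimming the leftover space with str_trim() are assumptions of this sketch, not part of the answer above: fixed() keeps any regex metacharacters in a prefix from being interpreted as a pattern.

```r
library(stringr)

# Hypothetical toy vectors standing in for the two columns
nombre <- c("Liceo Bicentenario 5", "Colegio Los Andes", "Escuela 12")
tipo   <- c("Liceo", "Colegio", "Escuela")

# Element i of `nombre` is matched against element i of `tipo`;
# fixed() matches the prefix literally, str_trim() drops the leftover space
limpio <- str_trim(str_replace_all(nombre, fixed(tipo), ""))
limpio  # "Bicentenario 5" "Los Andes" "12"
```

Note that str_replace_all() removes every occurrence of the prefix. If a prefix word can also appear later in a name, stringr's str_remove(), which drops only the first match, may be the safer choice.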
(I followed the link and got the raw dataset, and you may need to do some additional cleaning to get this to do what you want. I'm seeing a lot of differences between the contents of tipo.organización and the beginning of nombre.establecimiento: accented/unaccented characters, extra words, etc. You may already be doing this, of course! A link to a cleaned-up dataset would be helpful for checking this piece of the problem.)
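One possible way to handle the accent and case mismatches is to normalise both columns before comparing them. The sketch below is an assumption-laden illustration, not a fix for the real data: the toy vectors are hypothetical, it uses stringi's stri_trans_general() with the "Latin-ASCII" transliterator, and it removes the prefix only when it appears at the start of the name. It does not handle the "extra words" differences.

```r
library(stringi)

# Hypothetical toy vectors: the prefix differs from the start of the name
# in accents ("B\u00e1sica" vs "BASICA") and in letter case
nombre <- c("ESCUELA BASICA 14", "Liceo Polivalente Los Andes")
tipo   <- c("Escuela B\u00e1sica", "Liceo")

# Normalise both columns the same way: strip accents, then upper-case
normalizar <- function(x) toupper(stri_trans_general(x, "Latin-ASCII"))
nombre_n <- normalizar(nombre)
tipo_n   <- normalizar(tipo)

# Remove the prefix only when the normalised name starts with it.
# Slicing the ORIGINAL name assumes accent stripping keeps the length
# unchanged, which holds for Spanish letters such as a-acute or n-tilde.
limpio <- ifelse(startsWith(nombre_n, tipo_n),
                 trimws(substring(nombre, nchar(tipo_n) + 1)),
                 nombre)
limpio  # "14" "Polivalente Los Andes"
```

Names whose normalised form does not start with the prefix are returned unchanged, so they can be inspected and cleaned by hand.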