
Want to remove all characters in a string before a specific multi-word character

I have the following ghastly string:

x <- "Tomas Ceresnak (C)\nC 71 16 Elko Prairie Cowboys\n- UNK\nVratislav Bohácik (F)\nF 71 16 Wabun Huskies\n- UNK\nLuca Mullins (D)\nD 71 16 Groundbirch Rhino Chuckers\n- UNK\nDandre Carlton (F)\nF 71 16 Ebony Gothic Knights\n- UNK\nLynn Marez (F)\nF 71 16 Ebony Gothic Knights\n- UNK\nHynek Hoško (C)\nC 71 16 HC Kometa Železnice U18\n- UNK\nGlynn Shields (F)\nF 71 16 Chanhassen Nova Ocelots\n- UNK\nJeet Beals (C)\nC 71 16 Chanhassen Nova Ocelots\n- UNK\nVeit Olivarez (F)\nF 71 16 Minnesota City Electricity\n- UNK\nGregory Mason (D)\nD 71 16 McMurphy Energy\n- UNK\nElias Storck (C)\nC 71 16 SK Semla U18\n- UNK\nKnut Scheutz (C)\nC 71 16 Bogla AIF U18\n- UNK\nJonny Hendrix (F)\nF 71 16 Minnesota City Electricity\n- UNK\nDmitry Kuvayev (G)\nG 71 16 Rotor Pervomayskiy U18\n- UNK\nKofi Orona (G)\nG 71 16 Cherhill Vikes\n- UNK"

I want to remove everything before and including "Dandre Carlton (F)", which can be found at the end of the second line. I'm a pretty poor coder, but this is a part of a web scraping project I'm trying to implement. Essentially, my information is spread across two pages and breaks at the specific individual Dandre Carlton. I'm then counting how many individuals occur after Dandre Carlton by using str_count(string, "[(]") to get a total count of individuals, as I can identify a new individual from the occurrence of a left parenthesis.

I have "Dandre Carlton (F)" stored in a variable called name and the whole string stored in string. I've tried:

newstring<-gsub(paste0(".*",name),"",string)

but clearly that hasn't worked for me. Again, I need this to be general enough that I can paste in whatever name is the divider between the two pages and count the individuals after it.

The result I'd like to get is:

"\nF 71 16 Ebony Gothic Knights\n- UNK\nLynn Marez (F)\nF 71 16 Ebony Gothic Knights\n- UNK\nHynek Hoško (C)\nC 71 16 HC Kometa Železnice U18\n- UNK\nGlynn Shields (F)\nF 71 16 Chanhassen Nova Ocelots\n- UNK\nJeet Beals (C)\nC 71 16 Chanhassen Nova Ocelots\n- UNK\nVeit Olivarez (F)\nF 71 16 Minnesota City Electricity\n- UNK\nGregory Mason (D)\nD 71 16 McMurphy Energy\n- UNK\nElias Storck (C)\nC 71 16 SK Semla U18\n- UNK\nKnut Scheutz (C)\nC 71 16 Bogla AIF U18\n- UNK\nJonny Hendrix (F)\nF 71 16 Minnesota City Electricity\n- UNK\nDmitry Kuvayev (G)\nG 71 16 Rotor Pervomayskiy U18\n- UNK\nKofi Orona (G)\nG 71 16 Cherhill Vikes\n- UNK"

On which I'll then use:

individuals <- str_count(newstring, "[(]")

which gives me the number I'm after.

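The attempt fails because ( and ) are regex metacharacters: (F) is read as a group matching a bare F, so the pattern never matches the literal parentheses. If you are able to store the name with escaped parentheses,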

name <-  "Dandre Carlton \\(F\\)"

Otherwise, add the escapes with stringi:

name <- stringi::stri_replace_all_regex(name, c('\\(', '\\)'), c('\\\\(', '\\\\)'), vectorize_all = FALSE)
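If the divider name might contain other regex metacharacters as well (a dot after an initial, for instance), it may be easier to escape the common ones in one pass. A minimal base R sketch; the helper name `escape_regex` is hypothetical and not part of the original answer:

```r
# Hypothetical helper: put a backslash in front of common regex
# metacharacters so the string can be used literally inside a pattern.
escape_regex <- function(s) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", s)
}

name <- "Dandre Carlton (F)"
escaped <- escape_regex(name)  # parentheses now carry a leading backslash
```

The escaped value can then be pasted into the pattern in place of the hand-escaped name.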

Then it's just:

gsub(paste0('.*', name), '', x)
[1] "\nF 71 16 Ebony Gothic Knights\n- UNK\nLynn Marez (F)\nF 71 16 Ebony Gothic Knights\n- UNK\nHynek Hoško (C)\nC 71 16 HC Kometa Železnice U18\n- UNK\nGlynn Shields (F)\nF 71 16 Chanhassen Nova Ocelots\n- UNK\nJeet Beals (C)\nC 71 16 Chanhassen Nova Ocelots\n- UNK\nVeit Olivarez (F)\nF 71 16 Minnesota City Electricity\n- UNK\nGregory Mason (D)\nD 71 16 McMurphy Energy\n- UNK\nElias Storck (C)\nC 71 16 SK Semla U18\n- UNK\nKnut Scheutz (C)\nC 71 16 Bogla AIF U18\n- UNK\nJonny Hendrix (F)\nF 71 16 Minnesota City Electricity\n- UNK\nDmitry Kuvayev (G)\nG 71 16 Rotor Pervomayskiy U18\n- UNK\nKofi Orona (G)\nG 71 16 Cherhill Vikes\n- UNK"
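An alternative that avoids escaping altogether is to search for the name as a fixed string and cut the text at the end of the match. The following is only a sketch in base R: `after_divider` is a hypothetical helper name, and `x` here is a shortened version of the string from the question.

```r
# Hypothetical helper: return everything after the first literal occurrence
# of `divider` in `text`; return `text` unchanged if there is no match.
after_divider <- function(text, divider) {
  pos <- regexpr(divider, text, fixed = TRUE)
  if (pos[1] == -1) return(text)
  substring(text, pos[1] + attr(pos, "match.length"))
}

# Shortened sample of the scraped string (assumption: same layout as above).
x <- "Luca Mullins (D)\nD 71 16 Groundbirch Rhino Chuckers\n- UNK\nDandre Carlton (F)\nF 71 16 Ebony Gothic Knights\n- UNK\nLynn Marez (F)\nF 71 16 Ebony Gothic Knights\n- UNK"
rest <- after_divider(x, "Dandre Carlton (F)")

# Count the remaining individuals: one opening parenthesis per name line.
individuals <- lengths(regmatches(rest, gregexpr("(", rest, fixed = TRUE)))
```

Note one difference in behavior: this cuts at the first occurrence of the name, whereas the greedy `.*` in the gsub call cuts at the last one. The two agree when the divider appears only once.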

