Want to remove all characters in a string before (and including) a specific multi-word substring
I have the following ghastly string:
x <- "Tomas Ceresnak (C)\nC 71 16 Elko Prairie Cowboys\n- UNK\nVratislav Bohácik (F)\nF 71 16 Wabun Huskies\n- UNK\nLuca Mullins (D)\nD 71 16 Groundbirch Rhino Chuckers\n- UNK\nDandre Carlton (F)\nF 71 16 Ebony Gothic Knights\n- UNK\nLynn Marez (F)\nF 71 16 Ebony Gothic Knights\n- UNK\nHynek Hoško (C)\nC 71 16 HC Kometa Železnice U18\n- UNK\nGlynn Shields (F)\nF 71 16 Chanhassen Nova Ocelots\n- UNK\nJeet Beals (C)\nC 71 16 Chanhassen Nova Ocelots\n- UNK\nVeit Olivarez (F)\nF 71 16 Minnesota City Electricity\n- UNK\nGregory Mason (D)\nD 71 16 McMurphy Energy\n- UNK\nElias Storck (C)\nC 71 16 SK Semla U18\n- UNK\nKnut Scheutz (C)\nC 71 16 Bogla AIF U18\n- UNK\nJonny Hendrix (F)\nF 71 16 Minnesota City Electricity\n- UNK\nDmitry Kuvayev (G)\nG 71 16 Rotor Pervomayskiy U18\n- UNK\nKofi Orona (G)\nG 71 16 Cherhill Vikes\n- UNK"
I want to remove everything before and including "Dandre Carlton (F)", which appears partway through the string (he is the fourth individual listed).
I'm a pretty poor coder, but this is part of a web-scraping project I'm trying to implement. Essentially, my information is spread across two pages and breaks at one specific individual, Dandre Carlton. I then count how many individuals occur after Dandre Carlton using str_count(string, "[(]"), since each new individual can be identified by the occurrence of a left parenthesis.
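Note that stringr::str_count() takes the string first and the pattern second. A minimal base-R sketch of the same count follows; it assumes every individual's header line contains exactly one "(" and that team names never contain parentheses (true for the sample data). count_individuals is a hypothetical helper name.

```r
# Count individuals by counting literal "(" characters: each individual's
# header line looks like "Name (Pos)", so one "(" corresponds to one person.
# Base-R equivalent of stringr::str_count(s, "[(]").
count_individuals <- function(s) {
  # gregexpr() returns -1 when nothing matches; regmatches() then yields
  # character(0), so lengths() correctly reports 0.
  lengths(regmatches(s, gregexpr("(", s, fixed = TRUE)))
}

sample_text <- "Lynn Marez (F)\nF 71 16 Ebony Gothic Knights\n- UNK\nGlynn Shields (F)\nF 71 16 Chanhassen Nova Ocelots\n- UNK"
count_individuals(sample_text)
#> [1] 2
```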
I have "Dandre Carlton (F)" stored in a variable called name, and the whole string stored in string. I've tried:
newstring<-gsub(paste0(".*",name),"",string)
but clearly that hasn't worked for me. I need this to be general enough that I can paste in whatever name is the divider between the two pages and then count the individuals that come after it.
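A quick check of why the attempt matches nothing: in a regular expression, "(F)" is a capturing group that matches the single letter F, so the pattern looks for the text "Dandre Carlton F", which never occurs. The sketch below compares regex and fixed = TRUE matching on a shortened excerpt of the data (the excerpt variable is illustrative).

```r
name <- "Dandre Carlton (F)"
excerpt <- "Luca Mullins (D)\nDandre Carlton (F)\nF 71 16 Ebony Gothic Knights"

# As a regex, "(F)" is a group matching just "F", so this searches for
# "Dandre Carlton F", which is not in the text.
grepl(name, excerpt)
#> [1] FALSE

# As a literal (fixed) string, the parentheses must match exactly.
grepl(name, excerpt, fixed = TRUE)
#> [1] TRUE

# The unescaped regex does match the text without parentheses.
grepl(name, "Dandre Carlton F")
#> [1] TRUE
```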
The result I'd like to get is:
"\nF 71 16 Ebony Gothic Knights\n- UNK\nLynn Marez (F)\nF 71 16 Ebony Gothic Knights\n- UNK\nHynek Hoško (C)\nC 71 16 HC Kometa Železnice U18\n- UNK\nGlynn Shields (F)\nF 71 16 Chanhassen Nova Ocelots\n- UNK\nJeet Beals (C)\nC 71 16 Chanhassen Nova Ocelots\n- UNK\nVeit Olivarez (F)\nF 71 16 Minnesota City Electricity\n- UNK\nGregory Mason (D)\nD 71 16 McMurphy Energy\n- UNK\nElias Storck (C)\nC 71 16 SK Semla U18\n- UNK\nKnut Scheutz (C)\nC 71 16 Bogla AIF U18\n- UNK\nJonny Hendrix (F)\nF 71 16 Minnesota City Electricity\n- UNK\nDmitry Kuvayev (G)\nG 71 16 Rotor Pervomayskiy U18\n- UNK\nKofi Orona (G)\nG 71 16 Cherhill Vikes\n- UNK"
I'll then use:
individuals <- str_count(newstring, "[(]")
which gives me the number I'm after.
The parentheses in name are regex metacharacters, so they have to be escaped. If you are able to store the name with the parentheses already escaped:
name <- "Dandre Carlton \\(F\\)"
otherwise, use stringi:
name <- stringi::stri_replace_all_regex(name, c('\\(', '\\)'), c('\\\\(', '\\\\)'), vectorize_all = FALSE)
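The stringi call escapes only ( and ). If other divider names might contain further metacharacters (., +, ?, [ and so on), a more general base-R sketch escapes all of them before the pattern is built. escape_regex is a hypothetical helper, and its character class is an assumption covering the common regex metacharacters.

```r
# Prefix every regex metacharacter in `s` with a backslash so that it
# matches literally when pasted into a pattern.
escape_regex <- function(s) {
  gsub("([.|()\\\\^{}+$*?\\[\\]])", "\\\\\\1", s, perl = TRUE)
}

escape_regex("Dandre Carlton (F)")
#> [1] "Dandre Carlton \\(F\\)"

# Shortened, illustrative excerpt of the scraped text.
x_short <- "Luca Mullins (D)\nD 71 16 Groundbirch Rhino Chuckers\n- UNK\nDandre Carlton (F)\nF 71 16 Ebony Gothic Knights\n- UNK"
gsub(paste0(".*", escape_regex("Dandre Carlton (F)")), "", x_short)
#> [1] "\nF 71 16 Ebony Gothic Knights\n- UNK"
```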
Then it's just
gsub(paste0('.*', name), '', x)
[1] "\nF 71 16 Ebony Gothic Knights\n- UNK\nLynn Marez (F)\nF 71 16 Ebony Gothic Knights\n- UNK\nHynek Hoško (C)\nC 71 16 HC Kometa Železnice U18\n- UNK\nGlynn Shields (F)\nF 71 16 Chanhassen Nova Ocelots\n- UNK\nJeet Beals (C)\nC 71 16 Chanhassen Nova Ocelots\n- UNK\nVeit Olivarez (F)\nF 71 16 Minnesota City Electricity\n- UNK\nGregory Mason (D)\nD 71 16 McMurphy Energy\n- UNK\nElias Storck (C)\nC 71 16 SK Semla U18\n- UNK\nKnut Scheutz (C)\nC 71 16 Bogla AIF U18\n- UNK\nJonny Hendrix (F)\nF 71 16 Minnesota City Electricity\n- UNK\nDmitry Kuvayev (G)\nG 71 16 Rotor Pervomayskiy U18\n- UNK\nKofi Orona (G)\nG 71 16 Cherhill Vikes\n- UNK"
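Alternatively, escaping can be avoided entirely by locating the divider as a fixed string and keeping only the text that follows it. This is a sketch with a hypothetical helper, drop_through. Unlike the greedy ".*" pattern, which cuts at the last occurrence of the name, it cuts at the first occurrence; the two agree as long as the divider name appears only once, as in the sample data.

```r
# Return the text after the first literal occurrence of `name`;
# leave `x` unchanged if `name` is not found.
drop_through <- function(x, name) {
  pos <- regexpr(name, x, fixed = TRUE)
  if (pos == -1) return(x)
  substring(x, pos + attr(pos, "match.length"), nchar(x))
}

x_short <- "Luca Mullins (D)\nD 71 16 Groundbirch Rhino Chuckers\n- UNK\nDandre Carlton (F)\nF 71 16 Ebony Gothic Knights\n- UNK\nLynn Marez (F)\nF 71 16 Ebony Gothic Knights\n- UNK"
rest <- drop_through(x_short, "Dandre Carlton (F)")

# Count the individuals remaining after the divider (one "(" per header line).
lengths(regmatches(rest, gregexpr("(", rest, fixed = TRUE)))
#> [1] 1
```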