
Extracting a particular string using strsplit in R

After converting an object of type XMLDocument to character using:

do.call(paste, as.list(capture.output(list_links)))

I would like to use strsplit to extract particular strings from the resulting character object. The output from list_links is the following.

[1] "[[1]] <a href=\"/Archive/CrossNational.asp\">Cross-National Data</a>   [[2]] <a href=\"/Archive/MultiNation.asp\">Multiple Nation Surveys</a>   [[3]] <a href=\"/Archive/IntSurveys.asp\">Single Nation Surveys</a>   [[4]] <a href=\"/Archive/ChCounty.asp\">County-Level Data</a>   [[5]] <a href=\"/Archive/ChState.asp\">State-Level Data</a>   [[6]] <a href=\"/Archive/NatBaylor.asp\">Baylor Religion Surveys</a>   [[7]] <a href=\"/Archive/GSS.asp\">General Social Surveys</a>   [[8]] <a href=\"/Archive/Polls.asp\">News Polls</a>   [[9]] <a href=\"/Archive/NES.asp\">National Election Studies</a>   [[10]] <a href=\"/Archive/NatFamily.asp\">National Survey of Family Growth</a>   [[11]] <a href=\"/Archive/NSYR.asp\">National Studies of Youth and Religion (NSYR)</a>   [[12]] <a href=\"/Archive/PewResearch.asp\">Pew Research Center</a>   [[13]] <a href=\"/Archive/PALS.asp\">Portraits of American Life Study (PALS)</a>   [[14]] <a href=\"/Archive/PRRI.asp\">Public Religion Research Institute (PRRI)</a>   [[15]] <a href=\"/Archive/NatOther.asp\">Other National Surveys</a>   [[16]] <a href=\"/Archive/State1stAmnd.asp\">State of the First Amendment Surveys</a>   [[17]] <a href=\"/Archive/Middletown.asp\">Middletown Data</a>   [[18]] <a href=\"/Archive/Sfocus.asp\">Southern Focus Polls</a>   [[19]] <a href=\"/Archive/RegOther.asp\">Other Local/Regional Surveys</a>   [[20]] <a href=\"/Archive/FCT.asp\">Faith Communities Today</a>   [[21]] <a href=\"/Archive/NCS.asp\">National Congregations Study</a>   [[22]] <a href=\"/Archive/USCLS.asp\">U.S. 
Congregational Life Survey</a>   [[23]] <a href=\"/Archive/CongOther.asp\">Other Surveys</a>   [[24]] <a href=\"/Archive/Adventist.asp\">Adventist</a>   [[25]] <a href=\"/Archive/Baptist.asp\">Baptist</a>   [[26]] <a href=\"/Archive/Catholic.asp\">Catholic</a>   [[27]] <a href=\"/Archive/Jewish.asp\">Jewish</a>   [[28]] <a href=\"/Archive/Lutheran.asp\">Lutheran</a>   [[29]] <a href=\"/Archive/Methodist.asp\">Methodist</a>   [[30]] <a href=\"/Archive/Mormon.asp\">Mormon</a>   [[31]] <a href=\"/Archive/Nazarene.asp\">Nazarene</a>   [[32]] <a href=\"/Archive/Presbyterian.asp\">Presbyterian</a>   [[33]] <a href=\"/Archive/Unitarian.asp\">Unitarian-Universalist</a>   [[34]] <a href=\"/Archive/GrpOther.asp\">Other Groups</a>   [[35]] <a href=\"/Archive/InstructData.asp\">Instructional Data Files</a>   [[36]] <a href=\"/Archive/Other.asp\">Other Data</a>  "

I want to extract a list of the URLs in the tags. That is, after using strsplit, the first element of the list should be "/Archive/CrossNational.asp".
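For reference, the conversion step at the start of the question can be reproduced on a small stand-in. This is only a sketch: demo_links is a hypothetical plain list of strings, not the XML nodes from the question, so the printed lines differ from the real output. It shows that do.call(paste, ...) joins the captured lines with single spaces, which is why blank lines in the printed output turn into runs of spaces in the final string.

```r
# Hypothetical stand-in for list_links: a plain list, not real XML nodes.
demo_links <- list("first", "second")

# capture.output() returns the printed representation, one element per line.
printed <- capture.output(demo_links)

# do.call(paste, ...) passes every line as a separate argument to paste(),
# which joins them with its default separator, a single space.
collapsed <- do.call(paste, as.list(printed))

# The same result written more directly.
collapsed_alt <- paste(printed, collapse = " ")

print(collapsed)
```

Either form yields one character string of length 1, which is the kind of object strsplit is then applied to.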

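This does it with strsplit, applied to txt (the character string shown above), although that would not be everyone's choice of function. The code splits the string on the href preamble and on the "> that ends each opening tag, then collects the even-numbered pieces. The "split" argument is a two-part pattern joined by the OR operator (|). See ?regex for more details on regular expressions in R.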

 # Split on either ']] <a href="' or '">', then keep every second piece (the URLs).
 strsplit(txt, "\\]\\] <a href\\=\\\"|\\\">")[[1]][c(FALSE,TRUE)]
#--- result ----

 [1] "/Archive/CrossNational.asp" "/Archive/MultiNation.asp"  
 [3] "/Archive/IntSurveys.asp"    "/Archive/ChCounty.asp"     
 [5] "/Archive/ChState.asp"       "/Archive/NatBaylor.asp"    
 [7] "/Archive/GSS.asp"           "/Archive/Polls.asp"        
 [9] "/Archive/NES.asp"           "/Archive/NatFamily.asp"    
[11] "/Archive/NSYR.asp"          "/Archive/PewResearch.asp"  
[13] "/Archive/PALS.asp"          "/Archive/PRRI.asp"         
[15] "/Archive/NatOther.asp"      "/Archive/State1stAmnd.asp" 
[17] "/Archive/Middletown.asp"    "/Archive/Sfocus.asp"       
[19] "/Archive/RegOther.asp"      "/Archive/FCT.asp"          
[21] "/Archive/NCS.asp"           "/Archive/USCLS.asp"        
[23] "/Archive/CongOther.asp"     "/Archive/Adventist.asp"    
[25] "/Archive/Baptist.asp"       "/Archive/Catholic.asp"     
[27] "/Archive/Jewish.asp"        "/Archive/Lutheran.asp"     
[29] "/Archive/Methodist.asp"     "/Archive/Mormon.asp"       
[31] "/Archive/Nazarene.asp"      "/Archive/Presbyterian.asp" 
[33] "/Archive/Unitarian.asp"     "/Archive/GrpOther.asp"     
[35] "/Archive/InstructData.asp"  "/Archive/Other.asp"   
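To make the even-numbered selection easier to follow, here is a small sketch on a shortened string. txt_demo is a hypothetical stand-in holding only the first two entries from the question, and the split pattern drops the backslashes before = and ", which are not regex metacharacters. The last lines show a different technique, regmatches with gregexpr and a Perl-style lookbehind, as one possible alternative for readers who would not choose strsplit; it is not part of the original answer.

```r
# Hypothetical shortened stand-in for txt: the first two entries only.
txt_demo <- paste0(
  "[[1]] <a href=\"/Archive/CrossNational.asp\">Cross-National Data</a>   ",
  "[[2]] <a href=\"/Archive/MultiNation.asp\">Multiple Nation Surveys</a>  "
)

# Two alternatives joined by "|": the href preamble, or the end of the opening tag.
pieces <- strsplit(txt_demo, "\\]\\] <a href=\"|\">")[[1]]

# The pieces alternate: index debris, URL, link text plus next index, URL, ...
print(pieces)

# c(FALSE, TRUE) is recycled along the vector, keeping positions 2, 4, ...
urls <- pieces[c(FALSE, TRUE)]
print(urls)

# Alternative technique: match the URLs directly instead of splitting around them.
# (?<=href=") is a lookbehind; [^"]+ takes everything up to the closing quote.
urls_alt <- regmatches(
  txt_demo,
  gregexpr("(?<=href=\")[^\"]+", txt_demo, perl = TRUE)
)[[1]]
print(urls_alt)
```

The split approach depends on the pieces strictly alternating, so it assumes every entry has exactly one href and one "> sequence. The matching approach does not rely on position, which may make it more tolerant of irregular input.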
