How to overwrite an HTML file in R

[英]how to overwrite a html file in R

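I'm trying to replace the email addresses in an HTML file with an anti-spam format and then export the result as a nospam.html file. I tried to do this with the gsub() function, but it doesn't seem to work. What's wrong? Thanks!!!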

datei <- scan("https://isor.univie.ac.at/about-us/People.html", sep = "\n", what= "character")
#pattern.email <- "[a-z]+[.]+[a-z]+?[@]+[a-z]+"
reg.email <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>" #works

stelle.email <-gregexpr(reg.email, datei, ignore.case = TRUE) #works

unlist(stelle.email)
res.email<- regmatches(datei, stelle.email)

datei2<-gsub(reg.email, "vornameDOTnameNO-SPAMunivieDOTacDOTat", x = datei)

write(datei2, file = "nospam.html")

It may help to know that regmatches (used to extract matched substrings) also has a companion function, regmatches<-, which replaces matched substrings. See ?regmatches.
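Here is a minimal sketch of the extract/replace pair on made-up strings. The vector x, the addresses in it and the replacement text "NO-SPAM" are hypothetical and do not come from the People.html page. Only the pattern is the one from the question.

```r
# Hypothetical input: one line with two addresses, one line with none
x <- c("Contact: Anna.Muster@univie.ac.at or b.b@example.org",
       "No address on this line")

# Same pattern as in the question; it only lists uppercase ranges,
# so ignore.case = TRUE is needed to match lowercase letters
reg.email <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>"
m <- gregexpr(reg.email, x, ignore.case = TRUE)

# Extract: a list holding one character vector of matches per element of x
found <- regmatches(x, m)

# Replace in place: the single replacement string is recycled over every match
regmatches(x, m) <- "NO-SPAM"
```

The single replacement string is recycled over every match, and elements with no match are left unchanged. That is why one assignment is enough to mask the whole page.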

So you don't need gsub at all. Just do:

datei <- scan("https://isor.univie.ac.at/about-us/People.html", sep = "\n", what= "character")
# Read 481 items
reg.email <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>" #works
stelle.email <- gregexpr(reg.email, datei, ignore.case = TRUE) #works

# for proof, first look at a substring with a "known" email:
substr(datei[268], 236, 281)

### the only new/different line of code, remove your gsub
regmatches(datei, stelle.email) <- "vornameDOTnameNO-SPAMunivieDOTacDOTat"

# now look at the same portion of that one substring, now updated
substr(datei[268], 236, 281)

write(...)
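This likely explains why the original gsub call appears to do nothing. The pattern only spells out uppercase ranges such as [A-Z]. The gregexpr call is given ignore.case = TRUE, but the gsub call is not. As a result, gsub most likely never matches the mostly lowercase addresses on the page.

The sketch below assumes the same pattern and shows gsub working once ignore.case = TRUE is added. The sample lines are hypothetical stand-ins for the scanned page. The temporary file stands in for nospam.html.

```r
# Hypothetical lines standing in for the scanned page
datei <- c("<p>Mail: Anna.Muster@univie.ac.at</p>", "<p>Phone only</p>")
reg.email <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>"
masked <- "vornameDOTnameNO-SPAMunivieDOTacDOTat"

# As in the question: no ignore.case, so the lowercase parts never match
unchanged <- gsub(reg.email, masked, datei)

# With ignore.case = TRUE the address is found and replaced
datei2 <- gsub(reg.email, masked, datei, ignore.case = TRUE)

# Write the masked lines to a temporary file and read them back as a check
out <- tempfile(fileext = ".html")
writeLines(datei2, out)
check <- readLines(out)
```

Both routes give the same masked text here. regmatches<- is convenient when you already have the match positions from gregexpr, while gsub searches and replaces in a single call.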