How to overwrite an HTML file in R

Question

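I am trying to replace the email addresses in an HTML file with an anti-spam format and then export the result as nospam.html. I tried to do this with the gsub() function, but it does not seem to work. What is the problem? Thanks!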

datei <- scan("https://isor.univie.ac.at/about-us/People.html", sep = "\n", what= "character")
#pattern.email <- "[a-z]+[.]+[a-z]+?[@]+[a-z]+"
reg.email <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>" #works

stelle.email <-gregexpr(reg.email, datei, ignore.case = TRUE) #works

unlist(stelle.email)
res.email<- regmatches(datei, stelle.email)

datei2<-gsub(reg.email, "vornameDOTnameNO-SPAMunivieDOTacDOTat", x = datei)

write(datei2, file = "nospam.html")

Answer 1

It may help to know that regmatches (for extracting matched substrings) has a companion function, regmatches<- (for replacing matched substrings). See ?regmatches.

So there is no need for gsub; just:
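As a minimal sketch of how the two functions pair up, the example below extracts the matches, rewrites each one, and assigns the results back. The sample lines, the addresses in them, and the helper name obfuscate are made up for illustration. It also assumes that each address should get its own DOT / NO-SPAM form rather than one fixed string.

```r
# Hypothetical sample input standing in for lines of the HTML file
lines_html <- c('<a href="mailto:jane.doe@univie.ac.at">Jane Doe</a>',
                "<p>No address on this line</p>",
                "max.muster@univie.ac.at, erika.muster@univie.ac.at")

reg.email <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>"
matches <- gregexpr(reg.email, lines_html, ignore.case = TRUE)

# regmatches() returns a list: one character vector of matches per input line
found <- regmatches(lines_html, matches)

# Hypothetical helper: "@" becomes "NO-SPAM", every "." becomes "DOT"
obfuscate <- function(addresses) {
  addresses <- gsub("@", "NO-SPAM", addresses, fixed = TRUE)
  gsub(".", "DOT", addresses, fixed = TRUE)
}

# regmatches<- accepts a list with one replacement vector per input line
regmatches(lines_html, matches) <- lapply(found, obfuscate)
lines_html
```

Lines without a match receive an empty replacement vector and are left unchanged.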

datei <- scan("https://isor.univie.ac.at/about-us/People.html", sep = "\n", what= "character")
# Read 481 items
reg.email <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>" #works
stelle.email <- gregexpr(reg.email, datei, ignore.case = TRUE) #works

# for proof, first look at a substring with a "known" email:
substr(datei[268], 236, 281)

### the only new/different line of code, remove your gsub
regmatches(datei, stelle.email) <- "vornameDOTnameNO-SPAMunivieDOTacDOTat"

# now look at the same portion of that one substring, now updated
substr(datei[268], 236, 281)

write(...)
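
As for why the original gsub() call appeared to do nothing: a likely cause is that the pattern only lists uppercase letters, and the gsub() call, unlike the gregexpr() call, was made without ignore.case = TRUE. The sketch below shows the difference on a made-up line and then writes the result to a temporary file. The sample text and the tempfile() target are assumptions for illustration, not the real page or the real output path.

```r
reg.email <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>"
replacement <- "vornameDOTnameNO-SPAMunivieDOTacDOTat"

# Made-up line standing in for one element of datei
sample_line <- '<a href="mailto:jane.doe@univie.ac.at">Jane Doe</a>'

# As in the question: case-sensitive, so a lowercase address is not matched
case_sensitive <- gsub(reg.email, replacement, sample_line)

# With ignore.case = TRUE the same pattern also matches lowercase addresses
case_insensitive <- gsub(reg.email, replacement, sample_line, ignore.case = TRUE)

# Write to a temporary file (a stand-in for "nospam.html") and read it back
out_file <- tempfile(fileext = ".html")
writeLines(case_insensitive, out_file)
readLines(out_file)
```

Either route should give the same file here: gsub() with ignore.case = TRUE, or regmatches<- with the match data from gregexpr().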

Solution 1
0 2019-11-28 05:15:23
