Read data with decimal comma with readHTMLTable
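I made a dump from a database in the form of an HTML table. My problem is that it uses a comma as the decimal separator, and I can't get readHTMLTable to handle it properly: the values end up as factors instead of numbers. This could be fixed outside R, but I'd like to do it all in R.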
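A side note on the factor problem: calling as.numeric() directly on a factor returns the internal level codes, not the values. A minimal sketch of the usual idiom from the R FAQ (7.10, converting factors to numeric), adapted here to replace the decimal comma first; `factor_to_num` is a hypothetical helper name:

```r
# Hypothetical helper: convert a factor whose levels use a decimal comma
factor_to_num <- function(f) {
  # Convert the (few) levels once, then index by the factor to expand back
  as.numeric(sub(",", ".", levels(f), fixed = TRUE))[f]
}

f <- factor(c("20,8249988555908", "22,125", "20,8249988555908"))
as.numeric(f)      # level codes 1 2 1, not the data
factor_to_num(f)   # the actual numbers
```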
I tried passing dec="," in the hope that the ellipsis (...) would pass it along down the call chain, but it does not work.
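For reference, base R does have a `dec` argument on the conversion step itself: `type.convert()`, which `read.table()` uses internally, accepts `dec = ","`. A minimal sketch, assuming the cells are already available as character or factor columns of a data frame (for example one returned by readHTMLTable without an elFun); `comma_cols_to_num` is a hypothetical helper name:

```r
# Hypothetical helper: convert every column that parses as a number,
# treating "," as the decimal mark; other columns stay character
comma_cols_to_num <- function(df) {
  df[] <- lapply(df, function(col) {
    type.convert(as.character(col), dec = ",", as.is = TRUE)
  })
  df
}

df <- data.frame(
  Index = c("0", "1", "2"),
  X = c("42309", "42309,0416666667", "42309,0833333333"),
  Label = c("a", "b", "c"),
  stringsAsFactors = TRUE   # mimic factor columns
)
out <- comma_cols_to_num(df)
str(out)
```

`as.is = TRUE` keeps non-numeric columns as character instead of turning them back into factors.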
I also tried an elFun; the following one was inspired by the help page for readHTMLTable:
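library(XML)
# Convert a cell's text to a number, treating "," as the decimal mark
tryAsNumeric <- function(node) {
  val = xmlValue(node)
  ans = as.numeric(gsub(",", ".", val))
  # as.numeric() returns a numeric NA for text cells, so this test is
  # always TRUE and text cells become NA instead of falling back to val
  if(is.numeric(ans))
    ans
  else
    val
}
# teeChart.xls is an HTML file despite its .xls extension
tmp_list <- readHTMLTable("teeChart.xls", elFun = tryAsNumeric)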
and ended up with this message:
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In (function (node) ... : NAs introduced by coercion
2: In (function (node) ... : NAs introduced by coercion
3: In (function (node) ... : NAs introduced by coercion
4: In (function (node) ... : NAs introduced by coercion
(List truncated for brevity.)
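Why the warnings: `as.numeric()` returns `NA` (with a warning) for cells such as `Index` or `Lägenhet 053`, and that result is still numeric, so `is.numeric(ans)` is always `TRUE` and the fallback to `val` never happens. A sketch of a stricter check, assuming the goal is to keep non-numeric cells as text: test `is.na()` instead and silence the expected coercion warning. `commaToNum` and `tryAsNumeric2` are hypothetical names; the wrapper only calls `XML::xmlValue()` when it is actually used as an elFun:

```r
# Hypothetical helper: parse text with a decimal comma; keep the text if it
# is not a number (e.g. header cells such as "Index")
commaToNum <- function(val) {
  ans <- suppressWarnings(as.numeric(gsub(",", ".", val, fixed = TRUE)))
  if (!is.na(ans)) ans else val
}

# Hypothetical elFun for readHTMLTable(); XML::xmlValue() is evaluated only when called
tryAsNumeric2 <- function(node) commaToNum(XML::xmlValue(node))

commaToNum("20,8249988555908")  # a number
commaToNum("Index")             # stays "Index"
```

A column that mixes numbers and text is still likely to come back as text, so the header rows have to be handled separately.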
Here is a cut-down version of the table (teeChart.xls):
<table border="1">
<tr><td></td><td>Lägenhet 053</td><td></td><td>Lägenhet 054</td><td></td><td>Lägenhet 055</td><td></td></tr>
<tr><td>Index</td><td>X</td><td>Y</td><td>X</td><td>Y</td><td>X</td><td>Y</td></tr>
<tr><td>0</td><td>42309</td><td>20,8249988555908</td><td>42309</td><td>20,2000007629395</td><td>42309</td><td>22,2000007629395</td></tr>
<tr><td>1</td><td>42309,0416666667</td><td>20,7000007629395</td><td>42309,0416666667</td><td>20,2000007629395</td><td>42309,0416666667</td><td>22,125</td></tr>
<tr><td>2</td><td>42309,0833333333</td><td>20,6000003814697</td><td>42309,0833333333</td><td>20,2000007629395</td><td>42309,0833333333</td><td>22,0249996185303</td></tr>
</table>
How about setting colClasses? This is also adapted from the help page ?readHTMLTable:
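library(XML)
# Same idea: turn "20,8" into 20.8. Because as.numeric() returns a numeric
# NA for text cells, all(is.numeric(ans)) is always TRUE, so header cells
# come back as NA; colClasses below then forces every column to numeric
tryAsNumeric <- function(node) {
  val = xmlValue(node)
  ans = as.numeric(gsub(",", ".", val))
  if(all(is.numeric(ans)))
    ans
  else
    val
}
# readLines() reads from stdin(): run at the console, it takes the next
# 7 pasted lines (the table below) as txt
txt <- readLines(n=7)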
<table border="1">
<tr><td></td><td>Lägenhet 053</td><td></td><td>Lägenhet 054</td><td></td><td>Lägenhet 055</td><td></td></tr>
<tr><td>Index</td><td>X</td><td>Y</td><td>X</td><td>Y</td><td>X</td><td>Y</td></tr>
<tr><td>0</td><td>42309</td><td>20,8249988555908</td><td>42309</td><td>20,2000007629395</td><td>42309</td><td>22,2000007629395</td></tr>
<tr><td>1</td><td>42309,0416666667</td><td>20,7000007629395</td><td>42309,0416666667</td><td>20,2000007629395</td><td>42309,0416666667</td><td>22,125</td></tr>
<tr><td>2</td><td>42309,0833333333</td><td>20,6000003814697</td><td>42309,0833333333</td><td>20,2000007629395</td><td>42309,0833333333</td><td>22,0249996185303</td></tr>
</table>
doc <- htmlParse(txt, asText=TRUE)
( res <- readHTMLTable(doc, elFun = tryAsNumeric, colClasses = rep("numeric", 7)) )
# $`NULL`
# NA NA NA NA NA NA NA
# 1 NA NA NA NA NA NA NA
# 2 0 42309.00 20.825 42309.00 20.2 42309.00 22.200
# 3 1 42309.04 20.700 42309.04 20.2 42309.04 22.125
# 4 2 42309.08 20.600 42309.08 20.2 42309.08 22.025
str(res)
# List of 1
# $ NULL:'data.frame': 4 obs. of 7 variables:
# ..$ NA: num [1:4] NA 0 1 2
# ..$ NA: num [1:4] NA 42309 42309 42309
# ..$ NA: num [1:4] NA 20.8 20.7 20.6
# ..$ NA: num [1:4] NA 42309 42309 42309
# ..$ NA: num [1:4] NA 20.2 20.2 20.2
# ..$ NA: num [1:4] NA 42309 42309 42309
# ..$ NA: num [1:4] NA 22.2 22.1 22
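With colClasses the values are numeric, but the Lägenhet header row survives as an all-NA row and every column is named NA. A minimal post-processing sketch; `tidyTable` is a hypothetical helper, the column names `X_053`, `Y_053` are hypothetical labels built from the Lägenhet numbers, and the data frame is a hand-built stand-in for the first three columns of `res[[1]]` above so the example runs without XML:

```r
# Hypothetical helper: drop rows that are entirely NA and set column names
tidyTable <- function(df, colNames) {
  df <- df[rowSums(!is.na(df)) > 0, , drop = FALSE]
  names(df) <- colNames
  rownames(df) <- NULL
  df
}

# Stand-in for res[[1]]: first row all NA, as in the output above
df <- data.frame(
  c(NA, 0, 1, 2),
  c(NA, 42309, 42309.0416666667, 42309.0833333333),
  c(NA, 20.8249988555908, 20.7000007629395, 20.6000003814697)
)
out <- tidyTable(df, c("Index", "X_053", "Y_053"))
out
```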
Alternatively, read the table as text, fix the decimal commas, and convert afterwards:
library(XML)
txt <- readLines(n=7)
<table border="1">
<tr><td></td><td>Lägenhet 053</td><td></td><td>Lägenhet 054</td><td></td><td>Lägenhet 055</td><td></td></tr>
<tr><td>Index</td><td>X</td><td>Y</td><td>X</td><td>Y</td><td>X</td><td>Y</td></tr>
<tr><td>0</td><td>42309</td><td>20,8249988555908</td><td>42309</td><td>20,2000007629395</td><td>42309</td><td>22,2000007629395</td></tr>
<tr><td>1</td><td>42309,0416666667</td><td>20,7000007629395</td><td>42309,0416666667</td><td>20,2000007629395</td><td>42309,0416666667</td><td>22,125</td></tr>
<tr><td>2</td><td>42309,0833333333</td><td>20,6000003814697</td><td>42309,0833333333</td><td>20,2000007629395</td><td>42309,0833333333</td><td>22,0249996185303</td></tr>
</table>
doc <- htmlParse(txt)
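m <- as.matrix(readHTMLTable(doc, which=1))  # character matrix of all cells
colnames(m) <- m[1,]    # first data row holds the real headers (Index, X, Y, ...)
m <- m[-1, ]            # drop that row
m <- gsub(",", ".", m)  # decimal comma -> decimal point (gsub keeps dim/dimnames)
# convert to numeric while keeping the matrix shape and column names
as.data.frame(structure(as.numeric(m), .Dim=dim(m), .Dimnames = dimnames(m)))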
# Index X Y X Y X Y
# 1 0 42309.00 20.825 42309.00 20.2 42309.00 22.200
# 2 1 42309.04 20.700 42309.04 20.2 42309.04 22.125
# 3 2 42309.08 20.600 42309.08 20.2 42309.08 22.025
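The `structure()` call rebuilds the matrix around `as.numeric(m)` because `as.numeric()` drops the `dim` and `dimnames` attributes. An equivalent, slightly shorter sketch sets the storage mode in place, which keeps those attributes; the small matrix here is a hand-built stand-in for `m` after the `gsub()` step:

```r
# Stand-in for m after gsub(): character matrix with header names
m <- matrix(
  c("0", "1", "42309", "42309.0416666667", "20.8249988555908", "20.7000007629395"),
  nrow = 2,
  dimnames = list(NULL, c("Index", "X", "Y"))
)
storage.mode(m) <- "double"   # convert in place; dim and dimnames are kept
as.data.frame(m)
```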
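Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.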