
R using getURL data to dataframe

I'm downloading data from the web but don't know how to turn it into a data frame or anything else useful. Does anyone have any suggestions? Here is the code:

library(RCurl)
# Download the file contents as a single character string (the URL must be quoted)
myfile = getURL("http://www.stat.ufl.edu/~winner/data/lister_ul.dat",
                ssl.verifyhost = FALSE, ssl.verifypeer = FALSE)

If I use this:

A = read.csv(textConnection(myfile), header = F)

then R reads this:

c("1 1 1")

as the first row, instead of this:

c(1, 1, 1).

This doesn't work because I need to use

colnames(A) = c("col1", "col2", "col3")

and I can't find a workaround that doesn't involve some tedious work using

unlist(strsplit(A))

Ughh!!

Any suggestions would be appreciated. Or maybe I'll write my own tedious function, if necessary.

gwynn

Does this help?

df <- read.table('http://www.stat.ufl.edu/~winner/data/lister_ul.dat')
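If the file has already been downloaded into a character string (as `myfile` is after the `getURL()` call in the question), `read.table()` can parse that string directly through its `text` argument, with no `textConnection()` needed. A minimal sketch, assuming the four-line file contents shown in the httr answer on this page; the string literal stands in for the real download:

```r
# Stand-in for the string returned by getURL(); contents assumed to match
# the lister_ul.dat output shown in the httr answer
myfile <- "1 1  1\n1 0 11\n0 1  6\n0 0  6\n"

# read.table() splits fields on runs of whitespace by default (sep = "")
df <- read.table(text = myfile, header = FALSE)
str(df)
```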

You are close. I don't have RCurl installed, but I do have httr (which uses curl), so I'll start with that. It's a moot point, though, since I end up with the same table-like content you have.

Also, @udden2903's answer is more straightforward. I'm assuming this is a simplified problem and that you may need to keep using an alternative fetching method that read.table(URL) does not support. (To keep using httr and handle other things such as authentication, read its documentation.)

library(httr)
myfile = GET("http://www.stat.ufl.edu/~winner/data/lister_ul.dat")
str(content(myfile))
# No encoding supplied: defaulting to UTF-8.
#  chr "1 1  1\n1 0 11\n0 1  6\n0 0  6\n"

So content(myfile) now holds what your myfile holds. The first trick is that your data is not comma-delimited ("csv"), so use read.table, whose default separator is any run of whitespace. Second, specify that the first line is not a header (read.csv assumes a header by default).
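To see why the original `read.csv()` call produced one string per row: `read.csv()` splits on commas, and this file contains none, so each whole line lands in a single column. A small side-by-side sketch, again assuming the inline stand-in data from the httr output above:

```r
myfile <- "1 1  1\n1 0 11\n0 1  6\n0 0  6\n"

# read.csv() uses sep = ",", so each line stays one text field
bad <- read.csv(text = myfile, header = FALSE)
ncol(bad)    # a single column of whole-line strings

# read.table() uses sep = "" (any whitespace), giving three numeric columns
good <- read.table(text = myfile, header = FALSE)
ncol(good)
```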

x <- read.table(textConnection(content(myfile, encoding = "UTF-8")), header = FALSE)
x
#   V1 V2 V3
# 1  1  1  1
# 2  1  0 11
# 3  0  1  6
# 4  0  0  6

Now just assign your headers.

colnames(x) <- c("col1", "col2", "col3")
x
#   col1 col2 col3
# 1    1    1    1
# 2    1    0   11
# 3    0    1    6
# 4    0    0    6
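Alternatively, the names can be supplied while reading through `read.table()`'s `col.names` argument, which folds the separate `colnames<-` step into the same call. A sketch with the same assumed inline data and the column names from the question:

```r
myfile <- "1 1  1\n1 0 11\n0 1  6\n0 0  6\n"

# col.names assigns the column names at read time
x <- read.table(text = myfile, header = FALSE,
                col.names = c("col1", "col2", "col3"))
x
```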

Using only base R functions:

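# Read the raw lines, split each on runs of whitespace, stack the pieces
# into a matrix with rbind, then convert it to a data frame
# (the columns hold text, not numbers)
as.data.frame(
    do.call("rbind", strsplit(
        readLines("http://www.stat.ufl.edu/~winner/data/lister_ul.dat"),
        "\\s+"))
)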

  V1 V2 V3
1  1  1  1
2  1  0 11
3  0  1  6
4  0  0  6

What we did was read the raw lines from the web page, split each line on the whitespace between values, and build a matrix by calling rbind on the resulting rows, which we then converted into a data frame.
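One caveat: `strsplit()` returns character vectors, so a data frame built this way holds strings rather than the numbers the question asks for. A sketch of the same approach with an added conversion step; `lines` is a hypothetical stand-in for what `readLines()` returns, and the `trimws()` call is an extra guard (not in the answer above) against leading spaces, which would otherwise yield an empty first field:

```r
# Hypothetical stand-in for readLines() on the .dat file
lines <- c("1 1  1", "1 0 11", "0 1  6", "0 0  6")

# Split each line on runs of whitespace and stack the pieces into a matrix
parts <- do.call(rbind, strsplit(trimws(lines), "\\s+"))

# The matrix is character; switch its storage to numeric, keeping dimensions
storage.mode(parts) <- "numeric"

df <- as.data.frame(parts)
colnames(df) <- c("col1", "col2", "col3")
df
```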
