How to read a .data file into R

Question

I have tried to load the data from http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/hungarian.data into R using the following piece of code:

hData <- read.table(file.choose(), sep = "\t", dec = ",", fileEncoding = "UTF-16")

but it is not reading the data in correctly. The data has 76 attributes, and the details are given here: http://archive.ics.uci.edu/ml/datasets/Heart+Disease.

Can someone tell me what I am doing wrong?

Answer 1

The file contains extra line breaks that are causing issues: each record is wrapped across several lines. If you strip them out with a regex, you can read it in:

# read file into a single string
x <- readr::read_file('http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/hungarian.data')

# or in base, x <- paste(readLines(url('http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/hungarian.data')), collapse = '\n')

# gsub out line breaks that follow numbers (not "name") and read data
df <- read.table(text = gsub('(\\d)\\n', '\\1 ', x))

head(df, 2)
##     V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25
## 1 1254  0 40  1  1  0  0 -9  2 140   0 289  -9  -9  -9   0  -9  -9   0  12  16  84   0   0   0
## 2 1255  0 49  0  1  0  0 -9  3 160   1 180  -9  -9  -9   0  -9  -9   0  11  16  84   0   0   0
##   V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48
## 1   0   0 150  18  -9   7 172  86 200 110 140  86   0   0   0  -9  26  20  -9  -9  -9  -9  -9
## 2   0   0  -9  10   9   7 156 100 220 106 160  90   0   0   1   2  14  13  -9  -9  -9  -9  -9
##   V49 V50 V51 V52 V53 V54 V55 V56 V57 V58 V59 V60 V61 V62 V63 V64 V65 V66 V67 V68 V69 V70 V71
## 1  -9  -9  -9  -9  -9  -9  12  20  84   0  -9  -9  -9  -9  -9  -9  -9  -9  -9  -9   1   1   1
## 2  -9  -9  -9  -9  -9  -9  11  20  84   1  -9  -9   2  -9  -9  -9  -9  -9  -9  -9   1   1   1
##   V72 V73 V74 V75  V76
## 1   1   1  -9  -9 name
## 2   1   1  -9  -9 name
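To see what the substitution does without downloading anything, here is a minimal sketch on a made-up string. The toy data (two records of nine fields, each wrapped over three lines and ending in the token name) is an assumption for illustration, not the real file.

```r
# Toy stand-in for the file: two records, each wrapped over three lines.
# Only the last line of a record ends in a non-digit ("name").
x <- "1 2 3\n4 5 6\n7 8 name\n9 10 11\n12 13 14\n15 16 name\n"

# Replace a newline with a space only when it follows a digit,
# so the newline after "name" survives as the record separator.
joined <- gsub('(\\d)\\n', '\\1 ', x)

df <- read.table(text = joined)
dim(df)
```

The trick depends on every record ending in something that is not a digit; if a record ended in a number, its closing newline would be swallowed too.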

If there doesn't happen to be a conveniently different data type at the end of each record, you can use scan to read everything into a vector, then split and reassemble:

# download data and split into a character vector
x <- scan(url('http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/hungarian.data'), character())

# split and assemble data.frame
df <- data.frame(split(x, 1:76), stringsAsFactors = FALSE)

# fix types
df[] <- lapply(df, type.convert, as.is = TRUE)
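The split step works because split recycles the grouping vector, so token i lands in column ((i - 1) mod 76) + 1. A small sketch on invented data, assuming six fields per record instead of 76 (the variable n_fields and the toy text are hypothetical), with a check that the token count divides evenly:

```r
# Toy input: two records of six fields each, wrapped over two lines apiece.
txt <- "1 2 3\n4 5 name\n6 7 8\n9 10 name"
n_fields <- 6  # 76 for the real file

# scan ignores line breaks and returns one flat vector of tokens
x <- scan(text = txt, what = character(), quiet = TRUE)

# Guard against a truncated or malformed file before reshaping
stopifnot(length(x) %% n_fields == 0)

# The grouping vector is recycled, dealing tokens out column by column
df <- data.frame(split(x, seq_len(n_fields)), stringsAsFactors = FALSE)
names(df) <- paste0("V", seq_len(n_fields))

# Convert each column from character to its natural type
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```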

or pass scan a template of the types in a single row, so that it reads directly into a list:

x <- scan(url('http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/hungarian.data'), 
          c(replicate(75, numeric()), list(character())))

df <- as.data.frame(x)
names(df) <- paste0('V', 1:76)    # replace ugly names

If getting the type structure right is too complicated, read everything in as character with replicate(76, character()) and use type.convert as in the previous option.
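A sketch of that all-character variant on made-up data, again assuming six fields per record rather than 76 (n_fields and the toy text are hypothetical). Passing simplify = FALSE to replicate makes it explicit that the template is a list:

```r
# Toy input: two records of six fields each, wrapped over two lines apiece.
txt <- "1 2 3\n4 5 name\n6 7 8\n9 10 name"
n_fields <- 6  # 76 for the real file

# A list of empty character vectors tells scan to read n_fields
# character fields per record, continuing across line breaks
x <- scan(text = txt,
          what = replicate(n_fields, character(), simplify = FALSE),
          quiet = TRUE)

# Name the columns before building the data.frame
names(x) <- paste0("V", seq_len(n_fields))
df <- as.data.frame(x, stringsAsFactors = FALSE)

# Let type.convert work out the column types afterwards
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```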

Alternatively, use readLines, split to create a list with the lines for each row grouped together, and paste it all back together to use read.table:

x <- readLines(url('http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/hungarian.data'))

df <- read.table(text = paste(sapply(split(x, 
                                           rep(seq(length(x) / 10), each = 10)), 
                                     paste, collapse = ' '), collapse = '\n'))
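This last approach relies on every record occupying the same number of physical lines (10 in the code above). A minimal sketch of the same grouping on an invented character vector, assuming three lines per record (lines_per_record and the toy lines are hypothetical), with a check that the line count divides evenly:

```r
# Toy stand-in for readLines() output: two records, three lines each.
x <- c("1 2 3", "4 5 6", "7 8 name",
       "9 10 11", "12 13 14", "15 16 name")
lines_per_record <- 3  # 10 in the code above

# Fail early if the file does not split into whole records
stopifnot(length(x) %% lines_per_record == 0)

# Label each line with the record it belongs to: 1 1 1 2 2 2
record_id <- rep(seq_len(length(x) / lines_per_record), each = lines_per_record)

# Glue each record's lines into a single space-separated row
rows <- vapply(split(x, record_id), paste, character(1), collapse = " ")

df <- read.table(text = paste(rows, collapse = "\n"))
dim(df)
```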
