
How to read a CSV file with an unknown separator into R

I have example data saved as a CSV file on this website.

The file 1.csv was sent to me by someone else, and I cannot read it into R correctly using read.csv.

> dat = read.csv('1.csv')
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 2 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'data/hanze/1.csv'
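
The "embedded nulls" warning often appears when a file is not in a single-byte or UTF-8 encoding; UTF-16 is a common cause, although that is only an assumption for 1.csv and is not confirmed here. A minimal sketch for checking the first bytes of a file follows. The helper name guess_utf16 and the demo file are hypothetical, made up for illustration.

```r
# Hypothetical helper: inspect the first bytes of a file for signs of UTF-16.
guess_utf16 <- function(path, n = 64L) {
  bytes <- readBin(path, what = "raw", n = n)
  if (length(bytes) >= 2L) {
    # A byte order mark (BOM) identifies the encoding directly.
    if (bytes[1] == as.raw(0xFF) && bytes[2] == as.raw(0xFE)) return("UTF-16LE (BOM)")
    if (bytes[1] == as.raw(0xFE) && bytes[2] == as.raw(0xFF)) return("UTF-16BE (BOM)")
  }
  # Without a BOM, null bytes are only a hint, not proof.
  if (any(bytes == as.raw(0x00))) return("null bytes present, possibly UTF-16")
  "no null bytes in sample"
}

# Demo on a tiny stand-in file: BOM followed by "id" in UTF-16LE.
demo_file <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0xFF, 0xFE, 0x69, 0x00, 0x64, 0x00)), demo_file)
guess_result <- guess_utf16(demo_file)
print(guess_result)
```

On the real data this would be called as guess_utf16('1.csv'); the result only suggests which fileEncoding value is worth trying.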

I then tried setting sep in read.csv, but that also failed.

dat = read.csv('1.csv', sep = ',')
dat = read.csv('1.csv', sep = '\t')

Finally, I re-saved 1.csv in Microsoft Excel as a new comma-separated CSV file named 1_test.csv, and that works.

dat = read.csv('1_test.csv', encoding = 'UTF-8')
head(dat)

  id           station     lon    lat RASTERVALU
1  1              东四 116.417 39.929  0.2406870
2  2              天坛 116.407 39.886  0.0992821
3  3              官园 116.339 39.929  0.1243020
4  4          万寿西宫 116.352 39.878  0.2394120
5  5          奥体中心 116.397 39.982  0.2368810
6  6 农展<e9><U+00A6>? 116.461 39.937  0.2307600

In my real situation, I have hundreds of files like 1.csv, and I do not want to re-save each of them as a new CSV file in Microsoft Excel.

Is there a way to read 1.csv directly and correctly into R without re-saving it?

This may introduce unforeseen errors, but it appears to provide the expected output:

library(data.table)
library(tidyverse)
test <- fread(file = "~/Downloads/1.csv")
#> Warning in fread(file = "~/Downloads/1.csv"): Detected 1 column names but the
#> data has 140 columns (i.e. invalid file). Added 139 extra default column names
#> at the end.
test_df <- as.data.frame(matrix(unlist(test, use.names = FALSE), ncol = 4, byrow = TRUE))
test_df %>% 
  separate(V1, c("id", "station"), extra = "merge") %>% 
  mutate(station = gsub(pattern = "0", replacement = "", x = station)) %>% 
  rename("lon" = V2,
         "lat" = V3,
         "RASTERVALU" = V4)
#>    id        station     lon    lat RASTERVALU
#> 1   1           东四 116.417 39.929   0.240687
#> 2   2           天坛 116.407 39.886  0.0992821
#> 3   3           官园 116.339 39.929   0.124302
#> 4   4       万寿西宫 116.352 39.878   0.239412
#> 5   5       奥体中心 116.397 39.982   0.236881
#> 6   6         农展馆 116.461 39.937    0.23076
#> 7   7           万柳 116.287 39.987   0.201353
#> 8   8       北部新区 116.174  40.09   0.170883
#> 9   9         植物园 116.207 40.002   0.210636
#> 10 10       丰台花园 116.279 39.863   0.225224
#> 11 11           云岗 116.146 39.824    0.23084
#> 12 12           古城 116.184 39.914    0.17514
#> 13 13       房山良乡 116.136 39.742   0.243377
#> 14 14     大兴黄村镇 116.404 39.718   0.295714
#> 15 15     亦庄开发区 116.506 39.795   0.315679
#> 16 16       通州新城 116.663 39.886   0.255555
#> 17 17       顺义新城 116.655 40.127   0.212804
#> 18 18         昌平镇  116.23 40.217   0.160067
#> 19 19   门头沟龙泉镇 116.106 39.937    0.17251
#> 20 20         平谷镇   117.1 40.143   0.275457
#> 21 21         怀柔镇 116.628 40.328   0.177003
#> 22 22         密云镇 116.832  40.37   0.253771
#> 23 23         延庆镇 115.972 40.453   0.219738
#> 24 24       昌平定陵  116.22 40.292    0.15908
#> 25 25   京西北八达岭 115.988 40.365      -9999
#> 26 26 京东北密云水库 116.911 40.499   0.173666
#> 27 27     京东东高村  117.12   40.1   0.276452
#> 28 28   京东南永乐店 116.783 39.712   0.278231
#> 29 29       京南榆垡   116.3  39.52   0.533654
#> 30 30   京西南琉璃河     116  39.58   0.449057
#> 31 31     前门东大街 116.395 39.899   0.236876
#> 32 32   永定门内大街 116.394 39.876   0.148231
#> 33 33   西直门北大街 116.349 39.954   0.234347
#> 34 34     南三环西路 116.368 39.856   0.177043
#> 35 35     东四环北路 116.483 39.939   0.253252

Created on 2021-07-26 by the reprex package (v2.0.0)
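
An alternative that may be worth trying, assuming the files are tab-separated text in UTF-16LE (an assumption suggested by the "embedded nulls" warnings, not verified against 1.csv), is to pass fileEncoding to the base R reader and loop over the files. The sketch below builds two small stand-in files so that it runs on its own; the function name read_utf16_tsv, the demo directory, and the demo data are all hypothetical.

```r
# Hypothetical reader: assumes tab-separated input encoded as UTF-16LE.
read_utf16_tsv <- function(path) {
  read.delim(path, fileEncoding = "UTF-16LE", stringsAsFactors = FALSE)
}

# Build two stand-in files (made-up data) in a temporary directory.
demo_dir <- tempfile("utf16_demo")
dir.create(demo_dir)
for (name in c("1.csv", "2.csv")) {
  con <- file(file.path(demo_dir, name), open = "w", encoding = "UTF-16LE")
  writeLines(c("id\tstation\tlon\tlat",
               "1\tA\t116.417\t39.929",
               "2\tB\t116.407\t39.886"), con)
  close(con)
}

# Read every .csv file in the directory into a named list of data frames.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
all_dat <- lapply(files, read_utf16_tsv)
names(all_dat) <- basename(files)

str(all_dat[["1.csv"]])
```

For the real files, demo_dir would be replaced by the folder that holds them. If the encoding or separator assumption is wrong, the result will still be garbled, so it is worth checking one file by hand first.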
