
read.table with variable columns

I want to read in a table from a NOAA file hosted online. The file is a list of stations in various cities. My trouble is reading the data in: the columns are not separated consistently, so I have to set fill=TRUE, and then multi-word city names get split across several columns. This is clearly not what I want, but I cannot see how to correct for it. Is there a way to have the last few columns read in together as one column? Or should I use something other than read.table altogether? Any help is appreciated!

The code is below.

url <- "ftp://ftp.ncdc.noaa.gov/pub/data/normals/1981-2010/station-inventories/temp-inventory.txt"

stations <- read.table(url, header=FALSE, skip=2, fill=TRUE, nrows = 5,
             col.names = c("ID","lat","lon","UNK","State","City","UNK2","UNK3","UNK4")
             )
stations

           ID     lat      lon   UNK State     City        UNK2        UNK3         UNK4
1 CQC00914080 15.2136 145.7497 252.1    MP  CAPITOL        HILL           1  TRADITIONAL
2 CQC00914801 14.1717 145.2428 179.2    MP     ROTA          AP       91221  TRADITIONAL
3 FMC00914395  5.3544 162.9533   2.1    FM   KOSRAE       91355 TRADITIONAL
4 FMC00914419  5.5167 153.8167   1.5    FM LUKUNOCH TRADITIONAL            
5 FMC00914446  9.6053 138.1786  14.9    FM     MAAP TRADITIONAL            

The relevant lines in the original source are:

CQC00914080  15.2136  145.7497  252.1 MP CAPITOL HILL 1                               TRADITIONAL  
CQC00914801  14.1717  145.2428  179.2 MP ROTA AP                                91221 TRADITIONAL  
FMC00914395   5.3544  162.9533    2.1 FM KOSRAE                                 91355 TRADITIONAL  
FMC00914419   5.5167  153.8167    1.5 FM LUKUNOCH                                     TRADITIONAL  
FMC00914446   9.6053  138.1786   14.9 FM MAAP                                         TRADITIONAL  
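
One way to confirm that whitespace splitting gives a different number of fields per line is count.fields from base R. This is only a sketch on the five lines quoted above, pasted in as a character vector (the exact run lengths of spaces are an assumption and do not affect the count):

```r
# The five sample lines from the question, held in memory instead of
# being downloaded from the FTP server.
sample_lines <- c(
  "CQC00914080  15.2136  145.7497  252.1 MP CAPITOL HILL 1                               TRADITIONAL",
  "CQC00914801  14.1717  145.2428  179.2 MP ROTA AP                                91221 TRADITIONAL",
  "FMC00914395   5.3544  162.9533    2.1 FM KOSRAE                                 91355 TRADITIONAL",
  "FMC00914419   5.5167  153.8167    1.5 FM LUKUNOCH                                     TRADITIONAL",
  "FMC00914446   9.6053  138.1786   14.9 FM MAAP                                         TRADITIONAL"
)

# count.fields() splits on whitespace by default, just like read.table().
con <- textConnection(sample_lines)
n_fields <- count.fields(con)
close(con)

print(n_fields)  # 9 9 8 7 7
```

Because the count differs from line to line, no single whitespace-delimited layout fits every row, which is why fill=TRUE shifts values into the wrong columns.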

This looks like a fixed-width file, which read.fwf handles directly (see ?read.fwf). Here's the full line that seems to work to import the file:

read.fwf(url, widths=c(11,9,10,7,4,31,3,10,13), strip.white=TRUE, comment.char="")

The comment.char="" is necessary because the text file contains # characters, which read.table (called internally by read.fwf) treats as the start of a comment by default. Without it, the affected lines are cut off at the # and R throws an error because it does not find all the columns it expects.
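
To see the effect of comment.char without touching the FTP server, here is a minimal sketch on a small made-up fixed-width file. The layout, the widths, the column names, and the "ROTA #2 AP" entry are all assumptions for illustration, not the real NOAA format:

```r
# Build three fixed-width records with sprintf so the widths are exact.
# Layout (hypothetical): 11 + 9 + 10 + 7 + 3 + 21 + 11 = 72 characters.
fmt <- "%-11s%9s%10s%7s %-2s %-20s%-11s"
sample_lines <- c(
  sprintf(fmt, "CQC00914080", "15.2136", "145.7497", "252.1", "MP", "CAPITOL HILL 1", "TRADITIONAL"),
  sprintf(fmt, "CQC00914801", "14.1717", "145.2428", "179.2", "MP", "ROTA #2 AP",     "TRADITIONAL"),
  sprintf(fmt, "FMC00914395", "5.3544",  "162.9533", "2.1",   "FM", "KOSRAE",         "TRADITIONAL")
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

widths    <- c(11, 9, 10, 7, 3, 21, 11)
col_names <- c("ID", "lat", "lon", "elev", "state", "name", "method")  # guessed names

# With the default comment.char = "#", the second record is cut at the '#',
# so this call is expected to fail; the message is captured instead of raised.
default_result <- tryCatch(
  read.fwf(tmp, widths = widths, strip.white = TRUE),
  error = function(e) conditionMessage(e)
)

# Disabling comment handling keeps the '#' as ordinary text.
stations <- read.fwf(tmp, widths = widths, col.names = col_names,
                     strip.white = TRUE, comment.char = "")
print(stations)
```

Multi-word names stay in one column here because the split is by character position rather than by whitespace.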

It also works with read_table() from the readr package:

readr::read_table(url, skip = 2, n_max = 5, col_names = FALSE)

cols(
  X1 = col_character(),
  X2 = col_double(),
  X3 = col_double(),
  X4 = col_double(),
  X5 = col_character(),
  X6 = col_character(),
  X7 = col_character(),
  X8 = col_character(),
  X9 = col_integer(),
  X10 = col_character()
)
# A tibble: 5 × 10
           X1      X2       X3    X4    X5             X6    X7    X8    X9         X10
        <chr>   <dbl>    <dbl> <dbl> <chr>          <chr> <chr> <chr> <int>       <chr>
1 CQC00914080 15.2136 145.7497 252.1    MP CAPITOL HILL 1                NA TRADITIONAL
2 CQC00914801 14.1717 145.2428 179.2    MP        ROTA AP             91221 TRADITIONAL
3 FMC00914395  5.3544 162.9533   2.1    FM         KOSRAE             91355 TRADITIONAL
4 FMC00914419  5.5167 153.8167   1.5    FM       LUKUNOCH                NA TRADITIONAL
5 FMC00914446  9.6053 138.1786  14.9    FM           MAAP                NA TRADITIONAL
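
If you prefer to state the column positions explicitly rather than let readr guess them, read_fwf() with fwf_widths() is another option. The sketch below uses a small made-up file; the widths and column names are assumptions for illustration and would need to be matched to the real file:

```r
library(readr)

# Hypothetical fixed-width records: 11 + 9 + 10 + 7 + 3 + 21 + 11 characters.
fmt <- "%-11s%9s%10s%7s %-2s %-20s%-11s"
sample_lines <- c(
  sprintf(fmt, "CQC00914080", "15.2136", "145.7497", "252.1", "MP", "CAPITOL HILL 1", "TRADITIONAL"),
  sprintf(fmt, "CQC00914801", "14.1717", "145.2428", "179.2", "MP", "ROTA AP",        "TRADITIONAL"),
  sprintf(fmt, "FMC00914395", "5.3544",  "162.9533", "2.1",   "FM", "KOSRAE",         "TRADITIONAL")
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# fwf_widths() pairs each width with a (guessed) column name;
# col_types "cdddccc" = character, three doubles, three characters.
stations <- read_fwf(
  tmp,
  col_positions = fwf_widths(c(11, 9, 10, 7, 3, 21, 11),
                             col_names = c("ID", "lat", "lon", "elev",
                                           "state", "name", "method")),
  col_types = "cdddccc"
)
print(stations)
```

read_fwf() trims surrounding whitespace by default, so the padded fields come back clean.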
