
Importing data from URL using read.csv

How do I import data from an uneven table?

Importing data from a URL is fairly straightforward, but what if the data at the URL isn't in a sensible format?

I want the table at the bottom of this data set:

Sample: alpha-pinene in CDCl3, 13C-NMR

# file names in/out: kurs.002, 
# spectrometer frequency = 62.895952 MHz
# size = 16384
# sw = 317.985 ppm, sw_h = 20000.00 Hz
# fa = 17047.578 Hz, df = -1.221 Hz
# ymax = 2448625, ymin = -85195
# no. of peaks: 13
#point  pos[ppm] pos[Hz]  intens. width  
  6520 144.5020  9088.59   24.67   2.01 
  7985 116.0689  7300.26   60.98   2.68 
  9972  77.5046  4874.73   27.53   3.14 * solvent
  9998  77.0000  4842.99   27.51   3.15 * solvent
 10024  76.4954  4811.25   26.31   3.32 * solvent
 11534  47.1889  2967.99   59.17   2.45 
 11860  40.8617  2570.04   69.15   2.51 
 12007  38.0087  2390.60   15.30   2.86 
 12343  31.4875  1980.44   95.20   2.34 
 12352  31.3129  1969.45  100.00   1.93 
 12605  26.4026  1660.61   94.80   2.15 
 12784  22.9285  1442.11   74.33   2.85 
 12893  20.8130  1309.05   92.16   2.21 

It comes from this URL: http://www.chemie.fu-berlin.de/chemistry/oc/terpene/gif/a-pinen_c.txt

I tried the following code:

peak.exp <- read.csv(url("http://www.chemie.fu-berlin.de/chemistry/oc/terpene/gif/a-pinen_c.txt"),
skip=9, stringsAsFactors=FALSE)

But this returned a data frame with 13 observations and 1 variable. I wanted a data frame with 13 observations and 6 variables (or 5 variables, if it is possible to ignore the 'solvent' labels).

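That data is in fixed-width format, not comma-separated, which is why read.csv puts each row into a single column. You'll need to use read.fwf to parse it correctly by supplying the widths of the columns in a vector (e.g. c(6, 9, 9, 8, 7, 10), as done below). You'll also need to skip the first 10 lines of the file (the sample title, a blank line, and the # header lines) to get to the data: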

dat <- read.fwf("http://www.chemie.fu-berlin.de/chemistry/oc/terpene/gif/a-pinen_c.txt",
                c(6, 9, 9, 8, 7, 10), header=FALSE, skip=10)

head(dat)

##      V1       V2      V3    V4   V5         V6
## 1  6520 144.5020 9088.59 24.67 2.01           
## 2  7985 116.0689 7300.26 60.98 2.68           
## 3  9972  77.5046 4874.73 27.53 3.14  * solvent
## 4  9998  77.0000 4842.99 27.51 3.15  * solvent
## 5 10024  76.4954 4811.25 26.31 3.32  * solvent
## 6 11534  47.1889 2967.99 59.17 2.45           
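The skip value can also be derived rather than counted by hand. The following is only a sketch: it assumes the column header row is the first line starting with "#point", and it uses a shortened, hypothetical copy of the file header (file_lines) so that it runs without network access. For the real file, file_lines would come from readLines() on the URL.

```r
# Shortened copy of the top of the file (some "#" lines omitted for brevity).
file_lines <- c(
  "Sample: alpha-pinene in CDCl3, 13C-NMR",
  "",
  "# file names in/out: kurs.002, ",
  "# no. of peaks: 13",
  "#point  pos[ppm] pos[Hz]  intens. width  ",
  "  6520 144.5020  9088.59   24.67   2.01 "
)

# Index of the column header row; every line up to and including it
# has to be skipped, so this index is the value to pass as skip=.
header_row <- grep("^#point", file_lines)[1]
n_skip <- header_row
```

In the full listing shown in the question, the "#point" row is the tenth line, which matches the skip=10 used above.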

You'll also need to change the column names (if that matters to you), and you can get rid of the "solvent" column (V6) by shortening the vector of widths to c(6, 9, 9, 8, 7).
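Both adjustments can be made in the read.fwf call itself, because extra arguments such as col.names are passed on to read.table. A minimal sketch follows, using three rows copied from the file so that it runs offline; the column names are my own choice, not something defined by the file.

```r
# Three data rows copied from the file, including one "* solvent" row.
sample_lines <- c(
  "  6520 144.5020  9088.59   24.67   2.01 ",
  "  9972  77.5046  4874.73   27.53   3.14 * solvent",
  " 11534  47.1889  2967.99   59.17   2.45 "
)

con <- textConnection(sample_lines)
# Five widths only: anything past the fifth field (the solvent label) is ignored.
peaks <- read.fwf(con,
                  widths = c(6, 9, 9, 8, 7),
                  header = FALSE,
                  # Hypothetical names, chosen for illustration.
                  col.names = c("point", "pos_ppm", "pos_hz", "intensity", "width"))
close(con)

str(peaks)
```

For the real file, the same widths and col.names arguments would be used together with the URL and skip=10.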

