How do I import data from an uneven table?
Importing data from a URL is fairly straightforward, but what if the data at the URL isn't in a sensible format?
I want the table at the bottom of this data set:
Sample: alpha-pinene in CDCl3, 13C-NMR
# file names in/out: kurs.002,
# spectrometer frequency = 62.895952 MHz
# size = 16384
# sw = 317.985 ppm, sw_h = 20000.00 Hz
# fa = 17047.578 Hz, df = -1.221 Hz
# ymax = 2448625, ymin = -85195
# no. of peaks: 13
#point pos[ppm] pos[Hz] intens. width
6520 144.5020 9088.59 24.67 2.01
7985 116.0689 7300.26 60.98 2.68
9972 77.5046 4874.73 27.53 3.14 * solvent
9998 77.0000 4842.99 27.51 3.15 * solvent
10024 76.4954 4811.25 26.31 3.32 * solvent
11534 47.1889 2967.99 59.17 2.45
11860 40.8617 2570.04 69.15 2.51
12007 38.0087 2390.60 15.30 2.86
12343 31.4875 1980.44 95.20 2.34
12352 31.3129 1969.45 100.00 1.93
12605 26.4026 1660.61 94.80 2.15
12784 22.9285 1442.11 74.33 2.85
12893 20.8130 1309.05 92.16 2.21
It comes from this URL: http://www.chemie.fu-berlin.de/chemistry/oc/terpene/gif/a-pinen_c.txt
I tried the following code:
peak.exp <- read.csv(url("http://www.chemie.fu-berlin.de/chemistry/oc/terpene/gif/a-pinen_c.txt"),
skip=9, stringsAsFactors=FALSE)
But this returned a dataframe of 13 observations and 1 variable. I wanted a dataframe with 13 observations and six variables (or five variables if it is possible to ignore the 'solvent' labels).
That data is in fixed-width format, so you'll need to use read.fwf to parse it correctly, supplying the widths of the columns in a vector (e.g. c(6, 9, 9, 8, 7, 10), as done below). You'll also need to skip some lines in that file to get to the data:
dat <- read.fwf("http://www.chemie.fu-berlin.de/chemistry/oc/terpene/gif/a-pinen_c.txt",
c(6, 9, 9, 8, 7, 10), header=FALSE, skip=10)
head(dat)
## V1 V2 V3 V4 V5 V6
## 1 6520 144.5020 9088.59 24.67 2.01
## 2 7985 116.0689 7300.26 60.98 2.68
## 3 9972 77.5046 4874.73 27.53 3.14 * solvent
## 4 9998 77.0000 4842.99 27.51 3.15 * solvent
## 5 10024 76.4954 4811.25 26.31 3.32 * solvent
## 6 11534 47.1889 2967.99 59.17 2.45
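To see what the widths vector is doing, here is a minimal sketch that slices a single line by hand. The line is re-created with sprintf in the layout those widths assume (it is not downloaded from the URL), and the names widths, line, first, last and fields are only illustrative.

```r
# Column widths taken from the read.fwf call above.
widths <- c(6, 9, 9, 8, 7, 10)

# One data line rebuilt in the assumed fixed-width layout (illustrative only).
line <- sprintf("%6d%9.4f%9.2f%8.2f%7.2f%-10s",
                9972L, 77.5046, 4874.73, 27.53, 3.14, " * solvent")

# Each field ends at the running total of the widths and starts
# one character after the previous field ended.
last  <- cumsum(widths)
first <- last - widths + 1

# Cut the line at those positions and strip the padding.
fields <- trimws(substring(line, first, last))
print(fields)
```

If the widths were off by even one character, digits would spill into the neighbouring field, which is why it is worth checking a line or two like this before reading the whole file.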
You'll also need to change the column names (if that matters to you), and you can get rid of the "solvent" column (V6) by changing the vector of widths to c(6, 9, 9, 8, 7).
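As a rough sketch of the renaming step, the column names can be passed straight to read.fwf through its col.names argument. The three rows below are rebuilt with sprintf in the layout the widths assume, so the example runs without network access. The names point, pos_ppm, pos_hz, intensity, width and note are my own choices, not anything defined by the file. Keeping the sixth column long enough to derive a logical solvent flag is an optional extra.

```r
# Three rows re-created in the assumed fixed-width layout (illustrative only).
rows <- sprintf("%6d%9.4f%9.2f%8.2f%7.2f%-10s",
                c(6520L, 9972L, 11534L),
                c(144.5020, 77.5046, 47.1889),
                c(9088.59, 4874.73, 2967.99),
                c(24.67, 27.53, 59.17),
                c(2.01, 3.14, 2.45),
                c("", " * solvent", ""))

# Read all six fields and name them in the same call.
con <- textConnection(rows)
peaks <- read.fwf(con, widths = c(6, 9, 9, 8, 7, 10),
                  col.names = c("point", "pos_ppm", "pos_hz",
                                "intensity", "width", "note"))
close(con)

# Turn the free-text marker into a logical flag, then drop the raw text.
peaks$solvent <- grepl("solvent", peaks$note, fixed = TRUE)
peaks$note <- NULL

str(peaks)
```

For the real file, the same call should work with the URL in place of con and the skip argument from the answer above. If the solvent marker is not needed at all, leaving the last width out of the vector gives the five numeric columns directly.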