How to split columns in data frame by semicolon in R
My question seems obvious to me, but I can't find a solution.
I have a data frame like this:
<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>
USD Index;D;20150801;000000;97.199;97.336;97.191;97.192
USD Index;D;20150802;000000;97.226;97.294;97.207;97.257
USD Index;D;20150803;000000;97.255;97.582;97.155;97.499
I need to split it into separate columns, like this:
<TICKER> <PER> <DATE> <TIME> <OPEN> <HIGH> <LOW> <CLOSE>
USD Index D 20150801 0 97.199 97.336 97.191 97.192
USD Index D 20150802 0 97.226 97.294 97.207 97.257
USD Index D 20150803 0 97.255 97.582 97.155 97.499
This is a basic question that ought to be at the top of the search results. Thanks in advance for helping me solve it!
We can use read.table to split the single column, and scan to split the column name:
setNames(read.table(text=dat[,1], sep=";", stringsAsFactors=FALSE),
scan(text=names(dat), sep=";", what = "", quiet = TRUE))
# <TICKER> <PER> <DATE> <TIME> <OPEN> <HIGH> <LOW> <CLOSE>
# 1 USD Index D 20150801 0 97.199 97.336 97.191 97.192
# 2 USD Index D 20150802 0 97.226 97.294 97.207 97.257
# 3 USD Index D 20150803 0 97.255 97.582 97.155 97.499
The dat used above is defined as:
dat <- structure(list(`<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>` =
c("USD Index;D;20150801;000000;97.199;97.336;97.191;97.192",
"USD Index;D;20150802;000000;97.226;97.294;97.207;97.257",
"USD Index;D;20150803;000000;97.255;97.582;97.155;97.499"
)), .Names = "<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>",
class = "data.frame", row.names = c(NA, -3L))
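The read.table output above shows <TIME> as 0 because that column is read as a number, which drops the leading zeros of 000000. If those zeros matter, one option is to read every field as text and convert only the price columns afterwards. A minimal sketch, assuming the same data as dat (rebuilt here so the snippet runs on its own); the names res, col_names and num_cols are made up for illustration:

```r
# Rebuild the single-column data frame so this snippet is self-contained
dat <- data.frame(
  `<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>` = c(
    "USD Index;D;20150801;000000;97.199;97.336;97.191;97.192",
    "USD Index;D;20150802;000000;97.226;97.294;97.207;97.257",
    "USD Index;D;20150803;000000;97.255;97.582;97.155;97.499"
  ),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Split the one long column name into the eight target names
col_names <- scan(text = names(dat), sep = ";", what = "", quiet = TRUE)

# Read every field as character so "000000" keeps its leading zeros
res <- setNames(
  read.table(text = dat[[1]], sep = ";", colClasses = "character"),
  col_names
)

# Convert only the price columns back to numeric (hypothetical choice of columns)
num_cols <- c("<OPEN>", "<HIGH>", "<LOW>", "<CLOSE>")
res[num_cols] <- lapply(res[num_cols], as.numeric)

str(res)
```

Whether <DATE> and <TIME> should stay as text or be parsed into a date-time depends on what you do with them next; this sketch leaves them as character.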
This is pretty simple with fread(). Using akrun's dat, we have
data.table::fread(paste(c(names(dat), dat[[1]]), collapse = "\n"))
# <TICKER> <PER> <DATE> <TIME> <OPEN> <HIGH> <LOW> <CLOSE>
# 1: USD Index D 20150801 0 97.199 97.336 97.191 97.192
# 2: USD Index D 20150802 0 97.226 97.294 97.207 97.257
# 3: USD Index D 20150803 0 97.255 97.582 97.155 97.499
For a data frame result, just add data.table = FALSE to the fread() call.
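As a sketch of that variant, the call below passes the lines through fread()'s text argument (available in reasonably recent data.table versions, as far as I know) instead of pasting them into one string, and states sep and header explicitly rather than relying on auto-detection. The data is rebuilt so the snippet runs on its own, and df is just an illustrative name:

```r
library(data.table)

# Same single-column data as dat, rebuilt so the snippet is self-contained
dat <- data.frame(
  `<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>` = c(
    "USD Index;D;20150801;000000;97.199;97.336;97.191;97.192",
    "USD Index;D;20150802;000000;97.226;97.294;97.207;97.257",
    "USD Index;D;20150803;000000;97.255;97.582;97.155;97.499"
  ),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Header line first, then the data lines; data.table = FALSE returns a plain data.frame
df <- fread(
  text = c(names(dat), dat[[1]]),
  sep = ";", header = TRUE, data.table = FALSE
)

class(df)
```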
Alternatively, tstrsplit() can be used to split the values into columns, and setnames() to rename the columns:
library(data.table)
setDT(dat)[, tstrsplit(.SD[[1]], ";")][, setnames(.SD, strsplit(names(dat), ";")[[1]])]
# <TICKER> <PER> <DATE> <TIME> <OPEN> <HIGH> <LOW> <CLOSE>
# 1: USD Index D 20150801 000000 97.199 97.336 97.191 97.192
# 2: USD Index D 20150802 000000 97.226 97.294 97.207 97.257
# 3: USD Index D 20150803 000000 97.255 97.582 97.155 97.499
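Note that <TICKER> etc. are not syntactically valid column names and have to be escaped in many places. So I suggest getting rid of the angle brackets, like this: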
setDT(dat)[, tstrsplit(.SD[[1]], ";")][
, setnames(.SD, gsub("[<>]", "", strsplit(names(dat), ";")[[1]]))]
# TICKER PER DATE TIME OPEN HIGH LOW CLOSE
# 1: USD Index D 20150801 000000 97.199 97.336 97.191 97.192
# 2: USD Index D 20150802 000000 97.226 97.294 97.207 97.257
# 3: USD Index D 20150803 000000 97.255 97.582 97.155 97.499
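To illustrate what the escaping looks like in practice, here is a small base R sketch on a made-up two-column data frame (df and its values are only for illustration). With the angle brackets in place, every reference by name needs backticks; once they are stripped with the same gsub() pattern, plain names work:

```r
# Toy data frame with non-syntactic names; check.names = FALSE keeps them as-is
df <- data.frame(
  `<OPEN>`  = c(97.199, 97.226),
  `<CLOSE>` = c(97.192, 97.257),
  check.names = FALSE
)

# Non-syntactic names must be wrapped in backticks
df$`<CLOSE>` - df$`<OPEN>`

# Strip the angle brackets, as suggested above
names(df) <- gsub("[<>]", "", names(df))

# Now ordinary names work without escaping
df$CLOSE - df$OPEN
```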