R: how to save this special file as a CSV file?
My input starts out as an ordinary CSV-style table.
x <- read.table(textConnection(
+ ' models cores time
+ aa c1 xxx|yyy
+ aa c2 xxx|zzz
+ aa c3 www
+ aa c4 xxx|vvv
+ bb c1 vvv|www
+ bb c2 www|qqq
+ bb c3 xxx|uuu
+ bb c4 uuu' ), header=TRUE)
This is a data frame in which every column is a factor, as shown below:
> str(x)
'data.frame': 8 obs. of 3 variables:
$ models: Factor w/ 2 levels "aa","bb": 1 1 1 1 2 2 2 2
$ cores : Factor w/ 4 levels "c1","c2","c3",..: 1 2 3 4 1 2 3 4
$ time : Factor w/ 8 levels "uuu","vvv|www",..: 7 8 3 6 2 4 5 1
To split the last column with `strsplit`, I followed the steps from a previously posted question.
> write.csv(x, file="x.csv")
> y <- read.csv(file="x.csv",header=TRUE,stringsAsFactors=FALSE)
> str(y)
'data.frame': 8 obs. of 4 variables:
$ X : int 1 2 3 4 5 6 7 8
$ models: chr "aa" "aa" "aa" "aa" ...
$ cores : chr "c1" "c2" "c3" "c4" ...
$ time : chr "xxx|yyy" "xxx|zzz" "www" "xxx|vvv" ...
Warning messages:
1: closing unused connection 4 (" models cores time \naa c1 xxx|yyy \naa c2 xxx|zzz \naa c3 www \naa c4 xxx|vvv \nbb c1 vvv|www \nbb c2 www|qqq \nbb c3 xxx|uuu \nbb c4 uuu")
2: closing unused connection 3 (" models cores time \n4 1 0.000365 \n4 2 0.000259 \n4 3 0.000239 \n4 4 0.000220 \n8 1 0.000259 \n8 2 0.000249 \n8 3 0.000251 \n8 4 0.000258")
> df2 <- as.data.frame(
+ t(
+ do.call(cbind,
+ lapply(1:nrow(y),function(x){
+ sapply(unlist(strsplit(y[x,4],"\\|")),c,y[x,2:3],USE.NAMES=FALSE)
+ }) ) ) )
> str(df2)
The result is exactly what I need.
> df2
V1 models cores
1 xxx aa c1
2 yyy aa c1
3 xxx aa c2
4 zzz aa c2
5 www aa c3
6 xxx aa c4
7 vvv aa c4
8 vvv bb c1
9 www bb c1
10 www bb c2
11 qqq bb c2
12 xxx bb c3
13 uuu bb c3
14 uuu bb c4
When I run str(df2), I find that every column is a list of chr:
'data.frame': 14 obs. of 3 variables:
$ V1 :List of 14
..$ : chr "xxx"...
$ models:List of 14
..$ : chr "aa"
..$ : chr "aa"
$ models:List of 14
..$ : chr "aa"
..$ : chr "aa"
However, I am having trouble saving this final result as a CSV table again.
> write.csv(df2, file="df2.csv")
Error in write.table(x, file, nrow(x), p, rnames, sep, eol, na, dec, as.integer(quote), :
unimplemented type 'list' in 'EncodeElement'
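How can I save df2 as a CSV file again? Please help.
What you are doing looks a little silly (why write the data out to CSV only to read it straight back in?), but since df2 is roughly in the form you want, you need to unlist() each of the three components of df2 and turn the result into a data frame: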
out <- data.frame(lapply(df2, function(x) factor(unlist(x))))
This gives us:
> out
V1 models cores
1 xxx aa c1
2 yyy aa c1
3 xxx aa c2
4 zzz aa c2
5 www aa c3
6 xxx aa c4
7 vvv aa c4
8 vvv bb c1
9 www bb c1
10 www bb c2
11 qqq bb c2
12 xxx bb c3
13 uuu bb c3
14 uuu bb c4
> str(out)
'data.frame': 14 obs. of 3 variables:
$ V1 : Factor w/ 7 levels "qqq","uuu","vvv",..: 5 6 5 7 4 5 3 3 4 4 ...
$ models: Factor w/ 2 levels "aa","bb": 1 1 1 1 1 1 1 2 2 2 ...
$ cores : Factor w/ 4 levels "c1","c2","c3",..: 1 1 2 2 3 4 4 1 1 2 ...
This can be written out and read back in:
> write.csv(out, file="out.csv", row.names = FALSE)
> read.csv("out.csv")
V1 models cores
1 xxx aa c1
2 yyy aa c1
3 xxx aa c2
4 zzz aa c2
5 www aa c3
6 xxx aa c4
7 vvv aa c4
8 vvv bb c1
9 www bb c1
10 www bb c2
11 qqq bb c2
12 xxx bb c3
13 uuu bb c3
14 uuu bb c4
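To see which columns are responsible for the `unimplemented type 'list'` error, it can help to test each column with is.list() before writing. The following is only a minimal sketch: `toy`, `is_list_col`, `f` and `back` are hypothetical names, the toy frame merely mimics the structure of df2, and unlist() is assumed to be safe because every cell holds exactly one value. It flattens to character vectors rather than factors.

```r
# A toy data frame with one ordinary column and one list column
toy <- data.frame(models = c("aa", "aa"), stringsAsFactors = FALSE)
toy$V1 <- list("xxx", "yyy")   # list column, like the columns of df2

# Find the list columns
is_list_col <- sapply(toy, is.list)
is_list_col

# Flatten only those columns (assumes one value per cell)
toy[is_list_col] <- lapply(toy[is_list_col], unlist)

# The flattened frame can be written and read back
f <- tempfile(fileext = ".csv")
write.csv(toy, file = f, row.names = FALSE)
back <- read.csv(f, stringsAsFactors = FALSE)
back
```

If a cell could hold more than one value, unlist() would change the column length, so checking the lengths first would be the safer route.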
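Update: it would be simpler to go directly from x to the desired output, rather than writing it out to CSV, reading it back in, and then processing y. For example, this goes straight from x to the same result as out above: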
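# Split each time entry on the literal "|" (escaped, since "|" is a regex operator)
V1 <- with(x, strsplit(as.character(time), "\\|"))
# Number of pieces per original row, as a plain integer vector
lens <- sapply(V1, length)
# Repeat models and cores once per piece so all columns line up
out2 <- data.frame(V1 = factor(unlist(V1)),
                   models = with(x, rep(models, times = lens)),
                   cores = with(x, rep(cores, times = lens)))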
This gives:
> out2
V1 models cores
1 xxx aa c1
2 yyy aa c1
3 xxx aa c2
4 zzz aa c2
5 www aa c3
6 xxx aa c4
7 vvv aa c4
8 vvv bb c1
9 www bb c1
10 www bb c2
11 qqq bb c2
12 xxx bb c3
13 uuu bb c3
14 uuu bb c4
> str(out2)
'data.frame': 14 obs. of 3 variables:
$ V1 : Factor w/ 7 levels "qqq","uuu","vvv",..: 5 6 5 7 4 5 3 3 4 4 ...
$ models: Factor w/ 2 levels "aa","bb": 1 1 1 1 1 1 1 2 2 2 ...
$ cores : Factor w/ 4 levels "c1","c2","c3",..: 1 1 2 2 3 4 4 1 1 2 ...
> all.equal(out, out2)
[1] TRUE
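As an aside: when you copy code straight from the R Console, it is hard for us to paste it in and run it, because it includes the prompts (+). Instead, you could run dput(x) and paste the output into your question: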
structure(list(models = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("aa", "bb"), class = "factor"), cores = structure(c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("c1", "c2", "c3", "c4"
), class = "factor"), time = structure(c(7L, 8L, 3L, 6L, 2L,
4L, 5L, 1L), .Label = c("uuu", "vvv|www", "www", "www|qqq", "xxx|uuu",
"xxx|vvv", "xxx|yyy", "xxx|zzz"), class = "factor")), .Names = c("models",
"cores", "time"), class = "data.frame", row.names = c(NA, -8L
))
Then we could all simply do:
x <- structure(list(models = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("aa", "bb"), class = "factor"), cores = structure(c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("c1", "c2", "c3", "c4"
), class = "factor"), time = structure(c(7L, 8L, 3L, 6L, 2L,
4L, 5L, 1L), .Label = c("uuu", "vvv|www", "www", "www|qqq", "xxx|uuu",
"xxx|vvv", "xxx|yyy", "xxx|zzz"), class = "factor")), .Names = c("models",
"cores", "time"), class = "data.frame", row.names = c(NA, -8L
))
The same goes for the call that creates df2. This would be preferable:
write.csv(x, file="x.csv")
y <- read.csv(file="x.csv", header=TRUE, stringsAsFactors=FALSE)
df2 <- data.frame(
t(do.call(cbind,
lapply(1:nrow(y),function(x){
sapply(unlist(strsplit(y[x,4],"\\|")),c,y[x,2:3],
USE.NAMES=FALSE)
}))))
That way, it is easy for us to recreate the objects you have and what you tried.
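As a rough illustration of why dput() output is convenient to share: the text it prints is itself R code that rebuilds the object. The sketch below uses a small hypothetical data frame `d` (not the data from the question); `txt` and `d2` are illustrative names as well.

```r
# A small stand-in object
d <- data.frame(models = c("aa", "bb"),
                time = c("xxx|yyy", "www"),
                stringsAsFactors = FALSE)

# Capture the text that dput() would print to the console
txt <- capture.output(dput(d))

# Evaluating that text recreates the object
d2 <- eval(parse(text = txt))

# Compare the original with the reconstruction
all.equal(d, d2)
```

Anyone who pastes the captured text into their own session gets the same object, with column types intact, without having to strip console prompts.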
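fun_transform <- function(.x){
  # split each time string on the literal "|"; .x$time must be character
  time_split <- strsplit(.x$time,split="\\|")
  # number of pieces per input row
  n_rec <- sapply(time_split,length)
  # row index repeated once per piece
  ind <- rep(seq(nrow(.x)),n_rec)
  # repeat the first two columns and attach the flattened pieces
  cbind(.x[ind,1:2],time=unlist(time_split,use.names=FALSE))
}
df2 <- fun_transform(y)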
Edit: example data
txt <- textConnection(
' models cores time
aa c1 xxx|yyy
aa c2 xxx|zzz
aa c3 www
aa c4 xxx|vvv
bb c1 vvv|www
bb c2 www|qqq
bb c3 xxx|uuu
bb c4 uuu' )
y <- read.table(txt, header=TRUE,as.is=TRUE)
close(txt)
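To close the loop on the original question, here is a sketch that runs fun_transform() on the example data and saves the result as CSV. It repeats the function so that it runs on its own. Using the `text` argument of read.table() in place of textConnection() is a substitution made here for convenience, and `f` and `back` are hypothetical names.

```r
# Example data read via `text`, so there is no connection to close
y <- read.table(text = "
models cores time
aa c1 xxx|yyy
aa c2 xxx|zzz
aa c3 www
aa c4 xxx|vvv
bb c1 vvv|www
bb c2 www|qqq
bb c3 xxx|uuu
bb c4 uuu", header = TRUE, as.is = TRUE)

# Same function as above
fun_transform <- function(.x) {
  time_split <- strsplit(.x$time, split = "\\|")
  n_rec <- sapply(time_split, length)
  ind <- rep(seq(nrow(.x)), n_rec)
  cbind(.x[ind, 1:2], time = unlist(time_split, use.names = FALSE))
}

df2 <- fun_transform(y)

# No list columns here, so write.csv() has nothing to trip over
sapply(df2, is.list)

# row.names = FALSE drops the repeated-row labels from the file
f <- tempfile(fileext = ".csv")
write.csv(df2, file = f, row.names = FALSE)
back <- read.csv(f, stringsAsFactors = FALSE)
```

Because the columns are built from plain vectors rather than with sapply() over rows, the result never contains list columns in the first place.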