简体   繁体   English

如何将多个列转换为R中的各个行

[英]How to convert multiple columns to individual rows in R

I have a data frame in R that has many rows (over 3000) with F0 (fundamental frequency) tracks of an utterance in it. 我在R中有一个数据帧,它有许多行(超过3000),其中有一个话语的F0(基频)轨道。 The rows have the following information in them: speaker ID, group #, repetition #, accent type, sex, and then 50 columns of F0 points. 行中包含以下信息:说话者ID,组#,重复#,重音类型,性别,然后50列F0点。 The data looks like this: 数据如下所示:

Speaker Sex Group Repetition Accent    Word         1         2         3        4
    105   M     1          1      N AILMENT 102.31030 102.31030 102.31030 102.31127 
    105   M     1          1      N COLLEGE 111.80641 111.80313 111.68612 111.36020
    105   M     1          1      N  FATHER 124.06655 124.06655 124.06655 124.06655 

But instead of only going to X4, it has 50 points per row, so I have a 3562x56 data frame. 但是它不是仅仅转到X4,而是每行50个点,所以我有一个3562x56的数据帧。 I want to change it so each column of data in the F0 track (so after word, from 1:50) gets its own column, with the associated column number as another row. 我想改变它,因此F0轨道中的每一列数据(因此,从1:50开始)都有自己的列,相关的列号作为另一行。 I want to keep all of the information in the first six columns with each data point as well, so it would look like this: 我想在每个数据点的前六列中保留所有信息,所以它看起来像这样:

Speaker Sex Group Repetition Accent    Word       Num        F0
    105   M     1          1      N AILMENT         1 102.31030
    105   M     1          1      N AILMENT         2 102.31030
    105   M     1          1      N AILMENT         3 102.31030
    105   M     1          1      N AILMENT         4 102.31127
    ...
    105   M     1          1      N COLLEGE         1 111.80641 
    105   M     1          1      N COLLEGE         1 111.80313 
    105   M     1          1      N COLLEGE         1 111.68612 
    105   M     1          1      N COLLEGE         1 111.36020 
    ...

The code I tried to use, while tedious, is as follows: 我尝试使用的代码虽然单调乏味,但如下所示:

x = 1
for (i in 1:dim(normrangef0)[1]) {
     for (j in 1:50) {
             norm.all$Speaker[x] <- normrangef0$Speaker[i]
             norm.all$Sex[x] <- normrangef0$Sex[i]
             norm.all$Group[x] <- normrangef0$Group[i]
             norm.all$Repetition[x] <- normrangef0$Repetition[i]
             norm.all$Word[x] <- normrangef0$Word[i]
             norm.all$Accent[x] <- normrangef0$Accent[i]
             norm.all$Time[x] <- j
             norm.all$F0[x] <- normrangef0[i,j+6]
             x = x+1    
    }
}

However, when I do this with norm.all as a NULL object (just defined by norm.all = c() ), I end up with a list of over 200k items, many of which are NAs. 但是,当我使用norm.all作为NULL对象(仅由norm.all = c()定义)时,我最终会得到一个包含超过20万个项目的列表,其中很多是NA。 When I define norm.all as a data frame (either an empty one or one of all 0s, in the 178100x8 data frame, I get an error: 当我将norm.all定义为数据帧(空数或全0中的一个,在178100x8数据帧中,我得到一个错误:

Error in $<-.data.frame ( *tmp* , "Speaker", value = 105L) : replacement has 1 row, data has 0 $<-.data.frame*tmp* ,“Speaker”,value = 105L)出错:替换有1行,数据有0

Is my code just totally off? 我的代码完全关闭了吗? Is there another way to do this? 还有另一种方法吗?

Use melt from "reshape2" 使用来自“reshape2”的melt

library(reshape2)
melt(mydf, id.vars=c("Speaker", "Sex", "Group", "Repetition", "Accent", "Word"))
#    Speaker Sex Group Repetition Accent    Word variable    value
# 1      105   M     1          1      N AILMENT        1 102.3103
# 2      105   M     1          1      N COLLEGE        1 111.8064
# 3      105   M     1          1      N  FATHER        1 124.0666
# 4      105   M     1          1      N AILMENT        2 102.3103
# 5      105   M     1          1      N COLLEGE        2 111.8031
# 6      105   M     1          1      N  FATHER        2 124.0666
# 7      105   M     1          1      N AILMENT        3 102.3103
# 8      105   M     1          1      N COLLEGE        3 111.6861
# 9      105   M     1          1      N  FATHER        3 124.0666
# 10     105   M     1          1      N AILMENT        4 102.3113
# 11     105   M     1          1      N COLLEGE        4 111.3602
# 12     105   M     1          1      N  FATHER        4 124.0666

In base R, you can also use stack to stack the columns named 1 through 4, and cbind that with the first group of columns. 在基础R中,您还可以使用stack来堆叠名为1到4的列,并使用第一组列cbind Alternatively, unlist will also do this. 或者, unlist也会这样做。


You may also want to look into the "data.table" package to get a bit of a speed boost. 您可能还想查看“data.table”包以获得一点速度提升。

With reshape : reshape

x <- read.table(header=T, text="Speaker Sex Group Repetition Accent    Word         1         2         3        4
105   M     1          1      N AILMENT 102.31030 102.31030 102.31030 102.31127
105   M     1          1      N COLLEGE 111.80641 111.80313 111.68612 111.36020
105   M     1          1      N  FATHER 124.06655 124.06655 124.06655 124.06655")

reshape(x, direction="long", sep='', varying=paste0('X', 1:4))
##     Speaker Sex Group Repetition Accent    Word time        X id
## 1.1     105   M     1          1      N AILMENT    1 102.3103  1
## 2.1     105   M     1          1      N COLLEGE    1 111.8064  2
## 3.1     105   M     1          1      N  FATHER    1 124.0666  3
## 1.2     105   M     1          1      N AILMENT    2 102.3103  1
## 2.2     105   M     1          1      N COLLEGE    2 111.8031  2
## 3.2     105   M     1          1      N  FATHER    2 124.0666  3
## 1.3     105   M     1          1      N AILMENT    3 102.3103  1
## 2.3     105   M     1          1      N COLLEGE    3 111.6861  2
## 3.3     105   M     1          1      N  FATHER    3 124.0666  3
## 1.4     105   M     1          1      N AILMENT    4 102.3113  1
## 2.4     105   M     1          1      N COLLEGE    4 111.3602  2
## 3.4     105   M     1          1      N  FATHER    4 124.0666  3

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM