Using spread to create two value columns with tidyr

I have a data frame that looks just like this (see link). I'd like to take the output produced below and go one step further by spreading the tone variable across both the n and the avg variables. This topic seems relevant, but I can't get it to work: Is it possible to use spread on multiple columns in tidyr similar to dcast?

I'd like the final table to have the source variable in one column, followed by the tone-n and tone-avg variables in their own columns. So the column headers should be "source" - "For - n" - "Against - n" - "For - Avg" - "Against - Avg". This is for publication, not for further calculation, so it's about presenting data, and presenting it this way seems more intuitive to me. Thank you.

# variable 1
Politician.For <- sample(seq(0, 4, 1), 50, replace = TRUE)
# variable 2
Politician.Against <- sample(seq(0, 4, 1), 50, replace = TRUE)
# variable 3
Activist.For <- sample(seq(0, 4, 1), 50, replace = TRUE)
# variable 4
Activist.Against <- sample(seq(0, 4, 1), 50, replace = TRUE)
# data frame
df <- data.frame(Politician.For, Politician.Against, Activist.For, Activist.Against)

library(dplyr)
library(tidyr)

df %>%
  # gather all columns into key/value pairs (the key column is named "df")
  gather(df) %>%
  # separate at the period
  # (the default separator is any non-alphanumeric character)
  separate(col = df, into = c('source', 'tone')) %>%
  # group by both source and tone
  group_by(source, tone) %>%
  # summarise to create n (computed here as the sum of value) and the average
  summarise(n = sum(value), avg = mean(value)) %>%
  # attempted spread; this is the step that does not work
  spread(tone, c('n', 'value'))

I think what you want is another gather to break out n and avg as separate observations: the gather(type, val, -source, -tone) step below. After that, unite combines tone and type into a single key, so one spread can produce a column per tone/statistic pair.

gather(df, who, value) %>%
    separate(who, into=c('source', 'tone')) %>%
    group_by(source, tone) %>%
    summarise(n=sum(value), avg=mean(value)) %>%
    gather(type, val, -source, -tone) %>%
    unite(stat, c(tone, type)) %>%
    spread(stat, val)

Yields:

Source: local data frame [2 x 5]

      source Against_avg Against_n For_avg For_n
1   Activist        1.82        91    1.84    92
2 Politician        1.94        97    1.70    85
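
For readers on a newer tidyr, the same reshaping can be sketched with pivot_longer and pivot_wider, which accept several value columns at once. This is not part of the original answer: it assumes tidyr 1.0 or later and dplyr 1.0 or later, and the set.seed call, the `wide` name, and the final select (which puts the columns in the order the question asks for) are illustrative choices.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
df <- data.frame(
  Politician.For     = sample(0:4, 50, replace = TRUE),
  Politician.Against = sample(0:4, 50, replace = TRUE),
  Activist.For       = sample(0:4, 50, replace = TRUE),
  Activist.Against   = sample(0:4, 50, replace = TRUE)
)

wide <- df %>%
  # split each column name at the period into source and tone
  pivot_longer(everything(),
               names_to = c("source", "tone"),
               names_sep = "\\.") %>%
  group_by(source, tone) %>%
  # n is the sum of value, as in the question; avg is the mean
  summarise(n = sum(value), avg = mean(value), .groups = "drop") %>%
  # spread tone across both statistics in one step
  pivot_wider(names_from = tone,
              values_from = c(n, avg),
              names_glue = "{tone}_{.value}") %>%
  # column order requested in the question
  select(source, For_n, Against_n, For_avg, Against_avg)

print(wide)
```

The names_glue template controls the headers, so changing it to something like "{tone} - {.value}" would give labels closer to the ones wanted for publication.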

Using data.table syntax (thanks @akrun):

library(data.table)
dcast(
  setDT(melt(df))[,c('source', 'tone'):=
      tstrsplit(variable, '[.]')
    ][,list(
      N  = sum(value),
      avg= mean(value))
    ,by=.(source, tone)],
  source~tone,
  value.var=c('N','avg'))
