Transposing data frame based on factor column

Let's assume I have a data frame in the following format, obtained from a .csv file:

Measurement    Config  Value
---------------------------    _
Time           A       10       |      
Object         A       20       | Run 1    
Nodes          A       30      _|     
Time           A       8        |     
Object         A       18       | Run 2
Nodes          A       29      _|
Time           B       9        |
Object         B       20       | Run 3
Nodes          B       35      _|
...

A fixed number of Measurements is taken during each run, and each run uses a given Config. The Measurements per run are fixed (e.g., every run consists of a Time, an Objects and a Nodes measurement in the example above), but there can be multiple runs for a single config (e.g., Config A was run twice in the example above, B only once).

My primary goal is to plot correlations (scatter plots) between two of those measurement types, e.g., plot Objects (x-axis) against Nodes (y-axis) and highlight the different Configs by color.

I thought that this could best be achieved if the data frame were in the following format:

Config  Time  Objects  Nodes
--------------------------
A       10    20       30         <- Run 1
A       8     18       29         <- Run 2
B       9     20       35         <- Run 3

I.e., creating the columns from the factor values of the Measurement column, and assigning the respective Value entries to the cells.

Is there an "easy" way in R to achieve that?

First create a run variable (the 3 below is the number of measurements per run):

# option 1:
d$run <- ceiling(seq_along(d$Measurement)/3)

# option 2:
d$run <- 1 + (seq_along(d$Config)-1) %/% 3
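
As a minimal sketch of the same step without the hard-coded 3, the run length can be derived from the number of distinct measurement names. The data frame `d` below is a hand-typed reconstruction of the example from the question (not the original .csv), and `n_per_run` is a hypothetical helper name. This only works under the assumption stated in the question that every run contains each measurement exactly once.

```r
# Example data retyped from the question (assumed stand-in for the .csv contents).
# The measurement is spelled "Object" here, as in the input table.
d <- data.frame(
  Measurement = rep(c("Time", "Object", "Nodes"), times = 3),
  Config      = rep(c("A", "A", "B"), each = 3),
  Value       = c(10, 20, 30, 8, 18, 29, 9, 20, 35)
)

# Assumption: each run lists every measurement exactly once, so the
# run length equals the number of distinct measurement names.
n_per_run <- length(unique(d$Measurement))

# Rows 1..n_per_run belong to run 1, the next n_per_run rows to run 2, and so on.
d$run <- ceiling(seq_len(nrow(d)) / n_per_run)
```

If runs could be incomplete or rows could be reordered, this positional approach would assign wrong run numbers, so it is worth checking that `nrow(d)` is a multiple of `n_per_run`.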

Then reshape to wide format with the dcast function from reshape2 or data.table:

reshape2::dcast(d, Config + run ~ Measurement, value.var = 'Value')

You will then get:

  Config run Nodes Object Time
1      A   1    30     20   10
2      A   2    29     18    8
3      B   3    35     20    9
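
For completeness, here is a sketch that reaches the same wide layout with base R's `reshape()` instead of `dcast`, and then draws the scatter plot the question asks for. This is an alternative to the approach above, not a replacement for it. The data frame is retyped from the question's example, and the names `wide` and `cfg` are only illustrative.

```r
# Example data retyped from the question, with the run variable added.
d <- data.frame(
  Measurement = rep(c("Time", "Object", "Nodes"), times = 3),
  Config      = rep(c("A", "A", "B"), each = 3),
  Value       = c(10, 20, 30, 8, 18, 29, 9, 20, 35)
)
d$run <- ceiling(seq_len(nrow(d)) / 3)

# Base R alternative to dcast: one row per Config/run pair,
# one column per level of Measurement.
wide <- reshape(d,
                idvar     = c("Config", "run"),
                timevar   = "Measurement",
                direction = "wide")

# reshape() names the new columns "Value.Time", "Value.Object", ...;
# strip the prefix to get plain measurement names.
names(wide) <- sub("^Value\\.", "", names(wide))

# Scatter plot of Object (x-axis) against Nodes (y-axis), colored by Config.
cfg <- factor(wide$Config)
plot(wide$Object, wide$Nodes,
     col = as.integer(cfg), pch = 19,
     xlab = "Object", ylab = "Nodes")
legend("topleft", legend = levels(cfg),
       col = seq_along(levels(cfg)), pch = 19)
```

With only three runs the plot is sparse, but the same code should carry over unchanged to a larger file as long as the run variable is built correctly.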
