
Combine all rows of dataframe into one row with new variables

I have a large dataset of segmented cells at different timepoints of a cell culture. My end goal is to use a classifier to separate cells from the early timepoints from later cells by their shape. However, what I need to use for this is the overall composition of the cell shapes (i.e. a later timepoint will contain more large cells that are less round).

My data looks like this, with Passage being the "class" variable (1:8).

     Area Circ.       X      Y  Major  Minor   Angle    AR Round Solidity Passage
1 270.606 0.476 369.677 11.832 21.497 16.028 177.550 1.341 0.746    0.902       1
2 260.733 0.652 219.469 14.233 18.847 17.614  51.695 1.070 0.935    0.948       1
3 444.248 0.682  70.619 24.071 27.845 20.313  54.227 1.371 0.730    0.953       1
4 236.565 0.607 409.612 21.472 18.800 16.022 110.348 1.173 0.852    0.939       1
5 291.237 0.376 547.529 19.330 30.212 12.274 178.844 2.462 0.406    0.915       1
6 201.690 0.662 202.990 20.799 16.457 15.604  54.949 1.055 0.948    0.949       1

My approach now is to sample n cells from one passage and train the classifier on these samples.

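library(dplyr)

# number of cells drawn per sample (one "image")
num_draw <- 15
all_samples <- list()
for (i in 1:8) {
  # all cells belonging to passage i (filtered once per passage)
  samples <- filter(df, Passage == i)
  sample_list <- list()
  for (j in 1:1000) {
    # draw num_draw cells without replacement
    sample_list[[j]] <- samples[sample(nrow(samples), num_draw), ]
  }
  all_samples[[i]] <- sample_list
}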

However, most classifiers seem to only take data that has one observation per row, so I think my best shot is to combine all cells in a sample into a single row, keeping all the variables/columns (probably as Area1, Area2, Area3, ...) except Passage, which is collapsed into one column, and then to combine all observations back together into one dataframe.
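The reshaping described above could be sketched in base R roughly as follows. This is only an illustration under assumptions: `flatten_sample` is a hypothetical helper, `cells` is made-up toy data with just two features, and the sample sizes are kept small so the result is easy to inspect.

```r
# Hypothetical helper: turn one sample (n cells x p features) into a single row.
flatten_sample <- function(sample_df, class_col = "Passage") {
  features <- sample_df[, setdiff(names(sample_df), class_col), drop = FALSE]
  # unlist() goes column by column, naming values Area1, Area2, ..., Round1, ...
  wide <- as.data.frame(t(unlist(features)))
  # every cell in a sample shares the same passage, so keep it once
  wide[[class_col]] <- sample_df[[class_col]][1]
  wide
}

set.seed(1)
# Toy stand-in for the real measurements (values are made up)
cells <- data.frame(
  Area    = runif(12, 200, 450),
  Round   = runif(12),
  Passage = rep(1:2, each = 6)
)

num_draw  <- 3  # cells per sample
n_samples <- 4  # samples per passage

rows <- list()
for (p in unique(cells$Passage)) {
  pool <- cells[cells$Passage == p, ]
  for (j in seq_len(n_samples)) {
    drawn <- pool[sample(nrow(pool), num_draw), ]
    rows[[length(rows) + 1]] <- flatten_sample(drawn)
  }
}

# one row per sample, one column per (feature, cell) pair, plus Passage
wide_df <- do.call(rbind, rows)
```

Note that the order of cells within a sample is arbitrary here, so Area1 has no fixed meaning relative to Area2; sorting each sample (for example by Area) before flattening is one possible way to make the columns comparable across rows.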

Is there a quick way of doing this? Or can you recommend a classifier that can take n instances of one class as one observation?

I'm not sure why you would want to do this, because I think it will make your data much harder to work with. Maybe think about pivot_longer instead, together with group_by and summarise. Anyway, here is my solution. Best of luck!

library(tidyr)
library(dplyr)

#create data
df <- data.frame(Area = rnorm(4), circ = rnorm(4), x = rnorm(4), passage = c("1", "1", "2", "2"))

#spread data wide based off number of observations of passage
df <- group_by(df, passage) %>%
  mutate(reference = row_number()) %>% #creates reference for each obs of passage
  pivot_wider(id_cols = passage, names_from = reference, values_from = Area:x)
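The group_by and summarise route mentioned above might look something like the sketch below, which describes each sample by summary statistics instead of one column per cell. The toy data, the `sample_id` column, and the choice of mean and standard deviation are all assumptions made for illustration, not something taken from the question.

```r
library(dplyr)

set.seed(1)
# Toy stand-in for the real measurements (values are made up)
cells <- data.frame(
  Area    = runif(12, 200, 450),
  Round   = runif(12),
  Passage = rep(1:2, each = 6)
)

# sample_id is a hypothetical label marking which draw each cell belongs to
samples <- cells %>%
  mutate(sample_id = rep(rep(1:2, each = 3), times = 2))

# one row per (Passage, sample_id), with mean and sd of every shape feature
features <- samples %>%
  group_by(Passage, sample_id) %>%
  summarise(
    across(c(Area, Round), list(mean = mean, sd = sd)),
    .groups = "drop"
  )
```

Each row of `features` is then a single observation that does not depend on the order of the cells within a sample, which may suit a classifier better than Area1, Area2, ... columns.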
