
dplyr mutate in R - adding a new column depending on sequence of another column

I am having an issue with the mutate function in dplyr.

  • I am trying to add a new column called state that depends on changes in one of the columns (the V column). The V column repeats a sequence, so each run of rep(seq(100,2100,100),each=96) corresponds to one dataset in my df.

Error: impossible to replicate vector of size 8064

Here is a reproducible example of my df:

df <- data.frame (
    No=(No= rep(seq(0,95,1),times=84)), 
    AC= rep(rep(c(78,110),each=1),times=length(No)/2), 
    AR = rep(rep(c(256,320,384),each=2),times=length(No)/6), 
    AM =  rep(1,times=length(No)),
    DQ = rep(rep(seq(0,15,1),each=6),times=84),
    V = rep(rep(seq(100,2100,100),each=96),times=4),
    R = sort(replicate(6, sample(5000:6000,96))))

labels  <- rep(c("CAP-CAP","CP-CAP","CAP-CP","CP-CP"),each=2016) 

I used the value 2016 here intentionally, since I know the number of rows in each dataset.
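For reference, the 2016 can be derived from how V is built in the example: one dataset is one pass through seq(100,2100,100) with 96 rows per value. A small sketch of that arithmetic (the names v_values and rows_per_dataset are only illustrative):

```r
# Rows per dataset = (number of distinct V values) * (rows per V value)
v_values <- seq(100, 2100, 100)             # the V sequence from the example
rows_per_dataset <- length(v_values) * 96   # 96 rows for each V value
rows_per_dataset
```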

But I want to assign these labels automatically whenever the dataset changes, because the total number of rows in each dataset may differ in my real files. For this question, treat the example as a single txt file, but keep in mind that there are many such files with different numbers of rows. The format is the same in all of them.

I use dplyr to arrange my df:

library("dplyr")
newdf<-df%>%mutate_each(funs(as.numeric))%>%
mutate(state = labels)

Is there an elegant way to do this?

If (and only if) you know the number of datasets contained in df, and the column you're keying off (here, V) is ordered in df the way it is in your toy data, then this works. It's pretty clunky, and there should be a way to make it more efficient, but it produces what I take to be the desired result:

# dplyr provides lead()
library(dplyr)
# One label per subset (dataset) of df, in order of appearance
labels <- c("AP-AP","P-AP","AP-P","P-P")
# lead(V) - V < 0 is TRUE on the final row of each subset (where V drops back);
# which() returns those row numbers and skips the NA produced for the last row
endrows <- which(with(df, lead(V) - V < 0))
# Start from df: newdf from the question only exists if that code ran without error
newdf <- df
# Use those row numbers, or the differences between them, to tell rep()
# how many times to repeat each label
newdf$state <- c(rep(labels[1], endrows[1]), rep(labels[2], endrows[2] - endrows[1]),
    rep(labels[3], endrows[3] - endrows[2]), rep(labels[4], nrow(newdf) - endrows[3]))
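A possibly less clunky variant, offered as a sketch rather than a tested solution for the real files: instead of computing end rows and repeat counts, number the datasets directly and index the label vector by that number. It assumes, as above, that V only ever decreases at the boundary between two datasets and that labels are listed in the order the datasets appear. The name dataset_id is made up for illustration, and the labels are the ones from the question.

```r
# Toy V column shaped like the question's: the sequence 100..2100 repeated
# for 4 datasets, with 96 rows per V value.
df <- data.frame(V = rep(rep(seq(100, 2100, 100), each = 96), times = 4))

# One label per dataset, in the order the datasets appear in df.
labels <- c("CAP-CAP", "CP-CAP", "CAP-CP", "CP-CP")

# diff(V) < 0 is TRUE wherever V drops, i.e. on the first row of a new dataset.
# Prepending TRUE for row 1 and taking cumsum() numbers the datasets 1, 2, 3, ...
dataset_id <- cumsum(c(TRUE, diff(df$V) < 0))

# Guard: the number of detected datasets must match the number of labels.
stopifnot(max(dataset_id) == length(labels))

# Index the label vector by dataset number; no row counts are hard-coded.
df$state <- labels[dataset_id]

# Rows per label, as a quick sanity check.
table(df$state)
```

Because the row counts are never written out, the same lines should work when the datasets have different lengths. Inside a dplyr pipeline the same expression can be used as mutate(state = labels[cumsum(c(TRUE, diff(V) < 0))]).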
