Use dplyr to change an R dataframe from second row across multiple columns

I have a large dataframe similar to the toy dataset created below:

df<-data.frame("ID"=c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"), 
'A_Frequency'=c(1,2,3,4,5,1,2,3,4,5), 'A_Axis'=c(1,2,3,4,5,1,2,3,4,5))

The dataframe consists of an ID column and two columns, A_Frequency and A_Axis. I have created a column called A_Slope and filled it using the following for loop.

id1 <- unique(df$ID)  # unique IDs used to subset the dataframe

The loop below calculates A_Slope separately for each unique ID. Within each subset, values are calculated from the second row to the last row; the first row is always skipped.

df$A_Slope <- NA_real_  # make sure the column exists before filling it
for (j in id1) {
  rows <- which(df$ID == j)  # row indices belonging to this ID
  for (i in seq_along(rows)[-1]) {  # start at the second row of the subset
    cur <- rows[i]
    prev <- rows[i - 1]
    df$A_Slope[cur] <- 10 * log(2, 10) *
      log(df$A_Axis[cur] / df$A_Axis[prev], base = 10) /
      log(df$A_Frequency[cur] / df$A_Frequency[prev], base = 10)
  }
}

This works well for the toy dataset, but my real dataframe is large and has multiple columns. Is it possible to do the same with dplyr using mutate?

Expected output

        ID A_Frequency A_Axis     A_Slope
     1   A           1      1          NA
     2   A           2      2 3.010299957
     3   A           3      3 3.010299957
     4   A           4      4 3.010299957
     5   A           5      5 3.010299957
     6   B           1      1          NA
     7   B           2      2 3.010299957
     8   B           3      3 3.010299957
     9   B           4      4 3.010299957
     10  B           5      5 3.010299957

Note: the two NA values in the A_Slope column could also be zero; they do not necessarily have to be NA.

Hopefully I have translated your code correctly.

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(A_Slope = 10 * log10(2) * log10(A_Axis/lag(A_Axis))/
                                    log10(A_Frequency/lag(A_Frequency)))


#  ID    A_Frequency A_Axis A_Slope
#  <fct>       <dbl>  <dbl>   <dbl>
# 1 A               1      1    NA   
# 2 A               2      2    3.01
# 3 A               3      3    3.01
# 4 A               4      4    3.01
# 5 A               5      5    3.01
# 6 B               1      1    NA   
# 7 B               2      2    3.01
# 8 B               3      3    3.01
# 9 B               4      4    3.01
#10 B               5      5    3.01
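
The question notes that the first row of each group could hold zero instead of NA. A minimal sketch of one way to get that, assuming dplyr's coalesce() is acceptable here, wraps the same expression. The name df_zero is only for illustration, and the toy data is rebuilt so the block runs on its own.

```r
library(dplyr)

# Toy data equivalent to the dataframe in the question
df <- data.frame(
  ID = rep(c("A", "B"), each = 5),
  A_Frequency = rep(1:5, times = 2),
  A_Axis = rep(1:5, times = 2)
)

df_zero <- df %>%
  group_by(ID) %>%
  mutate(
    A_Slope = 10 * log10(2) * log10(A_Axis / lag(A_Axis)) /
      log10(A_Frequency / lag(A_Frequency)),
    # lag() leaves NA on the first row of each group; coalesce() swaps it for 0
    A_Slope = coalesce(A_Slope, 0)
  ) %>%
  ungroup()
```

Be aware that coalesce() replaces every missing value in the column, not only the first row of each group, so this is only appropriate if no other missing values are expected.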

Some pointers to understand the code:

  • log(x, 10) is replaced with log10(x)
  • to get the previous value (i - 1) we use lag here.
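
The question mentions a larger dataframe with multiple columns. A rough sketch of how the same calculation might be repeated for several column pairs follows. It assumes every pair is named <prefix>_Frequency and <prefix>_Axis; the B_ columns, the slope() helper, and the prefixes vector are hypothetical and not part of the original data.

```r
library(dplyr)

# Toy data with a second, hypothetical pair of columns (B_*)
df <- data.frame(
  ID = rep(c("A", "B"), each = 5),
  A_Frequency = rep(1:5, times = 2),
  A_Axis = rep(1:5, times = 2),
  B_Frequency = rep(1:5, times = 2),
  B_Axis = rep((1:5)^2, times = 2)
)

# Hypothetical helper: same formula as in the answer, for one column pair
slope <- function(axis, freq) {
  10 * log10(2) * log10(axis / dplyr::lag(axis)) /
    log10(freq / dplyr::lag(freq))
}

prefixes <- c("A", "B")  # assumed naming scheme: <prefix>_Axis, <prefix>_Frequency

out <- group_by(df, ID)
for (p in prefixes) {
  # .data[[...]] looks a column up by name; "{p}_Slope" builds the new column name
  out <- mutate(
    out,
    "{p}_Slope" := slope(.data[[paste0(p, "_Axis")]],
                         .data[[paste0(p, "_Frequency")]])
  )
}
out <- ungroup(out)
```

Because mutate() runs on the grouped data, lag() inside the helper still restarts at each ID, so the first row of every group stays NA for each new slope column.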
