
R: Add a column of shorter length that subtracts each row in a single column, 1st - 2nd, 2nd - 3rd

I have a data frame that looks like this (linked image: sx16 data frame).

In case the link doesn't work:

The data frame is called sx16

It has column names: Date, Open, High, Low, Settle

I want to add a column called up_period that prints a 1 if the below calc is positive and a 0 if the below calc is negative:

sx16$Settle[ 1: nrow(sx16)] - sx16$Settle[ 2: nrow(sx16)]

Of course, this produces an error as the new list is shorter than the original sx16.

I have tried to wrap rbind.fill around it like so:

sx16$up_period <- rbind.fill(sx16$Settle[ 1: nrow(sx16)] - sx16$Settle[ 2: nrow(sx16)])

But this produces the following error:

Warning message:
In sx16$Settle[1:nrow(sx16)] - sx16$Settle[2:nrow(sx16)] :
  longer object length is not a multiple of shorter object length

Of course, that is exactly what I thought rbind.fill would solve. Here is where I am stuck. Once I get this, I can add a simple if-else to do the 1 and 0, but I cannot figure out how to add this shorter column to my data frame.

Try this (the last up_period is left undefined):

sx16$up_period <- sx16$Settle - c(sx16$Settle[-1],NA)
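To make the padding explicit, here is a minimal sketch using the Settle values from the sample data in this thread. The names df and change are illustrative, not from the original post, and the sketch assumes a difference of exactly zero should count as 0.

```r
# Settle values taken from the sample data in this thread
df <- data.frame(Settle = c(954, 950.25, 945.5, 952.5, 945.25, 955))

# Drop the first element and pad with NA so both vectors have the same length
change <- df$Settle - c(df$Settle[-1], NA)

# 1 when the difference is positive, 0 otherwise; the last row stays NA
df$up_period <- as.integer(change > 0)

print(df)
```

Because the comparison change > 0 already yields TRUE/FALSE/NA, no separate if-else step is needed to get the 1 and 0 values.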

You can use lead from the dplyr package:

library(dplyr)
result <- sx16 %>% mutate(up_period=as.numeric((Settle-lead(Settle,default=NA)) > 0))
##        Date   Open   High    Low Settle up_period
##1 2016-09-30 950.00 958.50 943.00 954.00         1
##2 2016-09-29 947.00 957.25 946.00 950.25         1
##3 2016-09-28 951.75 955.75 944.50 945.50         0
##4 2016-09-27 946.75 953.50 934.00 952.50         1
##5 2016-09-26 951.50 960.25 943.75 945.25         0
##6 2016-09-23 975.00 976.25 952.50 955.00        NA

Here, we explicitly set the default parameter of lead to NA, which fills in the value at the end; this shows that we could set it to another value, such as the last value, if we wanted. Note that there is also no need for an if-else, as we can convert the boolean to 1 or 0 using as.numeric.
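As a rough illustration of how the padding value affects the last row, here is a base R sketch that does not need dplyr. shift_up is a hypothetical helper written for this example that only mimics the shifting behaviour described above; it is not dplyr's lead itself.

```r
settle <- c(954, 950.25, 945.5, 952.5, 945.25, 955)

# Hypothetical base R stand-in for lead(): shift values up by one
# position and pad the end with `default`
shift_up <- function(x, default = NA) c(x[-1], default)

# Padding with NA leaves the last row undefined
up_na <- as.numeric(settle - shift_up(settle) > 0)

# Padding with the last value makes the final difference 0, so the flag is 0
up_last <- as.numeric(settle - shift_up(settle, default = settle[length(settle)]) > 0)

print(up_na)
print(up_last)
```

Which padding to prefer depends on whether the final row should be treated as unknown (NA) or as "not up" (0).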

The dput for your data is:

sx16 <- structure(list(Date = structure(c(17074, 17073, 17072, 17071, 
17070, 17067), class = "Date"), Open = c(950, 947, 951.75, 946.75, 
951.5, 975), High = c(958.5, 957.25, 955.75, 953.5, 960.25, 976.25
), Low = c(943, 946, 944.5, 934, 943.75, 952.5), Settle = c(954, 
950.25, 945.5, 952.5, 945.25, 955)), .Names = c("Date", "Open", 
"High", "Low", "Settle"), row.names = c(NA, -6L), class = "data.frame")

I'm surprised nobody has mentioned diff yet. diff(sx16$Settle) is equivalent to sx16$Settle[2:nrow(sx16)] - sx16$Settle[1:(nrow(sx16)-1)], which has the opposite sign of your calculation, hence the < 0 test. So the following would work for you:

sx16$up_period <- c(ifelse(diff(sx16$Settle)<0, 1, 0), NA)
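A short check of that equivalence, sketched on the Settle values from the sample data in this thread (the variable names settle, d and manual are illustrative):

```r
settle <- c(954, 950.25, 945.5, 952.5, 945.25, 955)
n <- length(settle)

# diff() returns later-minus-earlier differences,
# one element shorter than the input
d <- diff(settle)
manual <- settle[2:n] - settle[1:(n - 1)]
print(all.equal(d, manual))

# Negating diff() gives the "current minus next" calculation;
# the trailing NA restores the original length
up_period <- c(ifelse(-d > 0, 1, 0), NA)
print(up_period)
```

Testing -d > 0 is the same as testing d < 0, so either form can be used.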

I'll use the iris data set:

x <- iris
dummy <- x$Sepal.Length              # copy the column under the name dummy
dummy[length(dummy) + 1] <- 0        # append a 0 for the day that hasn't happened yet
dummy <- dummy[2:length(dummy)]      # drop the first value so row i holds the value from row i + 1
x <- cbind(x, dummy)                 # attach the shifted column to the data
x$up <- x$Sepal.Length - x$dummy     # new calculated column
x$dummy <- NULL                      # remove the helper column

So essentially, I copied your column, shifted it up one position, and then calculated the difference using that dummy column.
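One thing to watch with this approach is the padding value. The sketch below (variable names are illustrative) suggests that padding with 0 makes the last row equal to the last Sepal.Length itself, whereas padding with NA leaves it undefined; all other rows are the same either way.

```r
x <- iris
n <- nrow(x)

# Zero padding: the last difference is Sepal.Length[n] - 0
x$up <- x$Sepal.Length - c(x$Sepal.Length[-1], 0)

# NA padding: the last difference is undefined
x$up_na <- x$Sepal.Length - c(x$Sepal.Length[-1], NA)

# Compare the final rows of the two variants
print(tail(x[, c("Sepal.Length", "up", "up_na")], 3))
```

If the result feeds a positive/negative flag, the NA variant avoids marking the last row as "up" purely because of the padding.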

