
R: new column (variable) that uses rowSums across list columns with NULL values

I have a data.frame that looks like this:

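# Example data: every UID/Group pair occurs exactly once
set.seed(1)  # make sample() reproducible
UID <- c(rep(1:25, 2), rep(26:50, 2))
Group <- c(rep(5, 25), rep(20, 25), rep(-18, 25), rep(-80, 25))
Value <- sample(100:5000, 100, replace = TRUE)
df <- data.frame(UID, Group, Value)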

But I need the values spread into new columns (one per Group), so I run this:

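library(tidyr)  # provides pivot_wider()

# One column per Group; missing UID/Group combinations are filled with 0
df <- pivot_wider(df, names_from = Group,
                      values_from = Value,
                      values_fill = list(Value = 0))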

On my real data this introduces NULL values. Sorry, I could not figure out how to build an example dataset that produces NULL values. Note: df is now a tibble (class tbl_df, tbl, data.frame).
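A likely way to reproduce the NULL values, assuming the real data contains more than one row for some UID/Group combination: when pivot_wider cannot uniquely identify each cell, it warns and returns list columns, and combinations that never occur appear as NULL. The toy data below is made up for illustration.

```r
library(tidyr)

# Hypothetical toy data: UID 1 has two rows for Group 5, so the
# UID/Group pairs are not unique
toy <- data.frame(
  UID   = c(1, 1, 1, 2),
  Group = c(5, 5, 20, 20),
  Value = c(100, 200, 300, 400)
)

# pivot_wider warns that values are not uniquely identified and returns
# list columns; UID 2 has no Group 5 value, so that cell is NULL
wide <- suppressWarnings(
  pivot_wider(toy, names_from = Group, values_from = Value)
)

str(wide)
```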

These aren't great variable names so I run this:

colnames(df)[which(names(df) == "20")] <- "pos20"
colnames(df)[which(names(df) == "5")] <- "pos5"
colnames(df)[which(names(df) == "-18")] <- "neg18"
colnames(df)[which(names(df) == "-80")] <- "neg80"
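The same renaming can be written as a single dplyr::rename() call. This is a minimal sketch; the small tibble stands in for the pivoted data and its values are made up. Backticks are needed because names such as `5` and `-18` are not syntactic R names.

```r
library(dplyr)

# Hypothetical wide data with the column names pivot_wider produced
wide <- tibble::tibble(UID = 1:2, `5` = c(1, 2), `20` = c(3, 4),
                       `-18` = c(5, 6), `-80` = c(7, 8))

# rename(new = old), all four columns in one step
wide <- rename(wide, pos5 = `5`, pos20 = `20`, neg18 = `-18`, neg80 = `-80`)

names(wide)
```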

What I want to be able to do is create a new column (variable) that rowSums across columns. So I run this:

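library(dplyr)  # provides %>% and mutate()

# Replace NAs with 0, then sum columns 2 to 5 (pos5, pos20, neg18, neg80)
df <- df %>%
  replace(is.na(.), 0) %>%
  mutate(rowTot = rowSums(.[2:5]))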
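Selecting the columns by position (.[2:5]) breaks silently if the column order changes. A sketch that selects them by name instead, assuming the renamed columns sit next to each other from pos5 to neg80; the values are made up, and na.rm = TRUE skips NAs without a separate replace() step.

```r
library(dplyr)

# Hypothetical data in the shape produced by the steps above
wide <- tibble::tibble(UID = 1:3,
                       pos5  = c(1, NA, 3),
                       pos20 = c(10, 20, NA),
                       neg18 = c(NA, 1, 1),
                       neg80 = c(2, 2, 2))

# Select the value columns by name range, and let rowSums ignore NAs
wide <- wide %>%
  mutate(rowTot = rowSums(select(., pos5:neg80), na.rm = TRUE))

wide$rowTot
```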

This of course works on the example dataset, but not on my real data with NULL values. I have tried converting NULL to NA with df[df == "NULL"] <- NA, but the values do not change. I have also tried converting the list columns to numeric with as.numeric(as.character(unlist(df[[2]]))), but I get an error saying the number of rows is unequal, which I suppose is to be expected.
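The unequal-rows error fits how unlist() behaves: NULL cells are dropped and multi-value cells are spread out, so the result no longer has one value per row. A sketch that maps each cell to exactly one number instead, assuming that summing the values in a cell (and treating NULL as 0) is what you want. cell_to_num is a hypothetical helper, and across() needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper: collapse one list-column cell to a single number
cell_to_num <- function(x) if (is.null(x)) 0 else sum(x)

cells <- list(c(100, 200), NULL, NULL, 300)

# unlist() drops the NULLs, so the length no longer matches the 4 rows
length(unlist(cells))  # 3

# vapply() keeps exactly one value per cell
nums <- vapply(cells, cell_to_num, numeric(1))

# Applied to every value column of a hypothetical pivoted tibble
wide <- tibble::tibble(UID = 1:2,
                       pos5  = list(c(100, 200), NULL),
                       pos20 = list(300, 400))
wide <- wide %>%
  mutate(across(-UID, ~ vapply(.x, cell_to_num, numeric(1))))
```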

I realize there might be a better process to get my desired end result, so any suggestions are most appreciated.

EDIT: Here is a link to the actual dataset, which produces NULL values after pivot_wider: https://drive.google.com/file/d/1YGh-Vjmpmpo8_sFAtGedxzfCiTpYnKZ3/view?usp=sharing

It is difficult to answer with confidence without a reproducible example where the error occurs, but I am going to take a guess.

I think your pivot_wider step produces list columns (some cells hold vectors because the same combination of id columns and group occurs more than once), and that is why you are getting NULL values. Add a row number that is unique within each group, then use pivot_wider. Also, rowSums has an na.rm parameter, so you don't need replace.

library(dplyr)
library(tidyr)  # provides pivot_wider()

df %>%
  group_by(temp) %>%
  mutate(row = row_number()) %>%  # unique index within each temp
  ungroup() %>%
  pivot_wider(names_from = temp, values_from = numseeds) %>%
  mutate(rowTot = rowSums(.[3:6], na.rm = TRUE))
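A self-contained toy version of the same approach, for checking the idea before running it on the real data. The columns temp and numseeds mirror the code above, but the values are made up, and the columns to sum are selected by name rather than position.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: temp 5 appears twice, so pivoting on temp
# alone would give list columns
toy <- tibble::tibble(temp     = c(5, 5, 20, -18),
                      numseeds = c(10, 20, 30, 40))

wide <- toy %>%
  group_by(temp) %>%
  mutate(row = row_number()) %>%   # 1 and 2 within temp 5, 1 elsewhere
  ungroup() %>%
  pivot_wider(names_from = temp, values_from = numseeds) %>%
  mutate(rowTot = rowSums(select(., -row), na.rm = TRUE))

wide
```

Each cell now holds a single number (or NA), so rowSums works directly.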

Change the column numbers in rowSums to match your data if needed.
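If duplicated UID/Group rows should be combined rather than kept as separate rows, a different technique is to let pivot_wider aggregate them through its values_fn argument (tidyr 1.0.0 or later). This sketch assumes that summing duplicates is appropriate for your data; the toy values are made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: UID 1 has two Group 5 rows
toy <- data.frame(UID   = c(1, 1, 1, 2),
                  Group = c(5, 5, 20, 20),
                  Value = c(100, 200, 300, 400))

wide <- toy %>%
  pivot_wider(names_from = Group, values_from = Value,
              values_fn   = list(Value = sum),  # sum duplicated cells
              values_fill = list(Value = 0)) %>%  # missing pairs become 0
  mutate(rowTot = rowSums(select(., -UID)))

wide
```

Because every cell is a single number, no list columns or NULL values appear.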
