
Remove NAs from dataframe generated with full_join from two scRNA-seq dataframes with equal nrow values and rownames

I've been working with a log2-transformed data frame that looks like this:

library(dplyr)

str(df[1:10])
 $ 5W_Female_C#1_1    : num  0 0 0 0 0 ...
 $ 5W_Female_C#1_2    : num  2.28 0 0 0 0 ...
 $ 5W_Female_C#1_3    : num  0 0 0 0 0 ...
 $ 5W_Female_C#1_4    : num  2.15 0 0 1.79 0 ...
 $ 5W_Female_C#1_5    : num  0 0 0 0 0 ...
 $ 5W_Female_C#1_6    : num  0 0 0 0 0 ...
 $ 5W_Female_C#1_7    : num  0 0 0 1.41 0 ...
 $ 5W_Female_C#1_8    : num  0 0 0 0 0 ...
 $ 5W_Female_C#1_9    : num  0 0 0 0 0 ...
 $ 5W_Female_C#1_10   : num  0.18 0 0.18 0 0 ...
dput(df[1:10, 1:20])
structure(list(`5W_Female_C#1_1` = c(0, 0, 0, 0, 0, 0, 0, 1.23695175858808, 
2.08983709011962, 1.78366618090783), `5W_Female_C#1_2` = c(2.28362550060704, 
0, 0, 0, 0, 0.417920007811965, 0, 0, 4.23488447596799, 0), `5W_Female_C#1_3` = c(0, 
0, 0, 0, 0, 0, 0, 1.49722912878761, 2.95084163754915, 0), `5W_Female_C#1_4` = c(2.15088457130503, 
0, 0, 1.78993786898019, 0, 0.219091058246197, 0, 0, 3.48000655138599, 
0), `5W_Female_C#1_5` = c(0, 0, 0, 0, 0, 0, 0, 1.77610398807316, 
2.50182126542091, 0), `5W_Female_C#1_6` = c(0, 0, 0, 0, 0, 
0, 0, 3.01506932171765, 2.76107247078864, 1.42115596066222), 
    `5W_Female_C#1_7` = c(0, 0, 0, 1.40544784370754, 0, 0, 
    0, 1.12300395405482, 2.88009774972197, 0), `5W_Female_C#1_8` = c(0, 
    0, 0, 0, 0, 2.31875066934634, 0, 2.92257845650856, 3.34695688937888, 
    1.48284828306847), `5W_Female_C#1_9` = c(0, 0, 0, 0, 0, 
    0, 0, 1.61917821605907, 1.77273024776718, 2.09761079662642
    ), `5W_Female_C#1_10` = c(0.180147861158429, 0, 0.180147861158429, 
    0, 0, 0, 0, 0.180147861158429, 3.75103517666786, 0), `5W_Female_C#1_11` = c(0, 
    0, 0, 0.336854639125465, 0, 0, 0, 0, 2.7614980445501, 0), 
    `5W_Female_C#1_12` = c(0, 0, 0, 0, 0, 0, 0, 1.04404433270602, 
    3.39985467357243, 0), `5W_Female_C#1_13` = c(0, 0, 0, 0, 
    0, 0, 0, 0, 3.29484127140614, 1.12101540096137), `5W_Female_FGC#1_14` = c(0, 
    0, 0, 0, 0, 0, 0, 0, 3.00934717225925, 0), `5W_Female_C#1_15` = c(0.207892851641333, 
    0, 0, 0, 0, 1.26243315763135, 0, 0, 1.98294795515753, 0.829443681366591
    ), `5W_Female_C#1_16` = c(0, 0, 0, 1.15639661659767, 0, 
    0, 0, 1.8611613657534, 3.03509599789673, 0), `5W_Female_C#1_17` = c(0, 
    0, 0, 1.57385922157595, 0, 0, 0, 4.02874594222712, 0, 3.91513842592446
    ), `5W_Female_C#1_18` = c(0, 0, 0, 0, 0, 0, 0, 2.45180455572049, 
    3.72628628972067, 0), `5W_Female_C#1_19` = c(0.702214251010441, 
    0, 0, 0, 0, 0, 0, 1.41792000781196, 2.63853727921519, 1.17248751551013
    ), `5W_Female_C#1_20` = c(3.1243281350022, 0, 0, 0, 0, 
    0, 0, 0, 3.07963411236871, 0)), row.names = c("A1BG", "A1BG-AS1", 
"A1CF", "A2M", "A2M-AS1", "A2ML1", "A2MP1", "A4GALT", "AAAS", 
"AACS"), class = "data.frame")

The output above shows only a small window of the data.

The data frame was first subset to a vector of genes of interest, for example:

gene_list <- c("gene1","gene2","gene3","gene4","gene5")
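The gene subsetting may have been done by indexing rows by name (for example df[gene_list, ]). If so, any gene in the list that is not among the row names comes back as a row of NAs. That is one possible way an all-NA gene can end up in the heatmap input. The sketch below uses a hypothetical toy data frame and a hypothetical gene name, "NOTAGENE", to contrast that behaviour with matching via %in%, which keeps only genes that actually exist:

```r
# Hypothetical toy data standing in for the log2 expression data frame
df <- data.frame(
  `5W_Female_C#1_1` = c(0, 2.28, 0, 1.79),
  `5W_Female_C#1_2` = c(0.18, 0, 1.41, 0),
  row.names = c("A1BG", "A1CF", "A2M", "AAAS"),
  check.names = FALSE
)

# Hypothetical gene list; "NOTAGENE" is not a row name of df
gene_list <- c("A1BG", "A2M", "NOTAGENE")

# Indexing by name: an unmatched name yields a row filled with NAs
by_name <- df[gene_list, , drop = FALSE]

# Matching with %in% keeps only genes that are present in df
by_match <- df[rownames(df) %in% gene_list, , drop = FALSE]
```

Note that %in% keeps the row order of df rather than the order of gene_list.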

The result was then subset by age using grep():

scdata4 <- df[, grep("4W", colnames(df)), drop = FALSE]
scdata5 <- df[, grep("5W", colnames(df)), drop = FALSE]
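grep("4W", ...) matches the substring anywhere in a column name, so a name such as 14W_... would also be selected (that column is hypothetical here). The sketch below assumes that column names start with the age prefix, as in the example data. It anchors the pattern and uses drop = FALSE, so a single matching column still comes back as a data frame with its row names intact:

```r
# Hypothetical column names; "14W" also contains the substring "4W"
df <- data.frame(
  `4W_Female_C#1_1` = c(0, 1.2),
  `5W_Female_C#1_1` = c(2.3, 0),
  `14W_Male_C#1_1`  = c(0.5, 0.7),
  row.names = c("A1BG", "A2M"),
  check.names = FALSE
)

# An unanchored pattern also picks up the 14W column
loose <- grep("4W", colnames(df))

# Anchoring at the start of the name keeps only the 4-week cells
strict <- grep("^4W_", colnames(df))

# drop = FALSE keeps a data frame (and its gene row names) even for one column
scdata4 <- df[, strict, drop = FALSE]
```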

After this step, the row names (genes) were moved into a column called gene in each subset:

scdata4 <- tibble::rownames_to_column(scdata4, var = "gene")
scdata5 <- tibble::rownames_to_column(scdata5, var = "gene")

Finally, the two resulting data frames, which have the same number of rows and the same gene names, were joined with full_join():

scdatajoin <- full_join(scdata4,scdata5, by = "gene")
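full_join() keeps every gene from both tables. For a gene that appears in only one of them, it fills the other table's columns with NA. An equal nrow() therefore does not by itself guarantee that the gene columns match. The sketch below uses two hypothetical toy tables. It lists the unmatched genes with anti_join() and shows the NA rows they produce:

```r
library(dplyr)

# Hypothetical subsets keyed by a "gene" column; the gene sets differ by one entry
scdata4 <- data.frame(gene = c("A1BG", "A2M", "AAAS"),
                      `4W_C1` = c(0, 1.5, 2.1), check.names = FALSE)
scdata5 <- data.frame(gene = c("A1BG", "A2M", "AACS"),
                      `5W_C1` = c(2.3, 0, 3.0), check.names = FALSE)

# Genes present in one table but not in the other
only4 <- anti_join(scdata4, scdata5, by = "gene")$gene
only5 <- anti_join(scdata5, scdata4, by = "gene")$gene

joined <- full_join(scdata4, scdata5, by = "gene")

# Rows containing any NA come from the unmatched genes
na_genes <- joined$gene[!complete.cases(joined)]
```

If both anti_join() results are empty, the join itself added no NAs, and any remaining NAs were already in the input. Duplicated gene names are also worth checking (for example with anyDuplicated(scdata4$gene)), because full_join() returns one row per matching pair.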

This is where things go wrong. I convert the joined result to a matrix with as.matrix() and pass it to pheatmap():

scdatajoin <- as.matrix(scdatajoin)
pheatmap(scdatajoin, color = rev(brewer.pal(9, "RdBu")), main = "4plus5w")

I get this error:

Error in hclust(d, method = method) : NA/NaN/Inf in foreign function call (arg 11)

Can somebody tell me how to correct this?
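The error is raised inside hclust(), which pheatmap() calls for clustering. dist() returns NA for any pair of rows that have no non-missing values in common. A gene whose values are all NA therefore produces NA distances, and hclust() rejects them. Here is a small reproduction with a hypothetical matrix:

```r
# Hypothetical 3 x 3 numeric matrix; the second gene contains only NAs
mat <- matrix(c(0,   2.3, 1.1,
                NA,  NA,  NA,
                1.8, 0,   0.4),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A1BG", "A2M", "AAAS"), c("c1", "c2", "c3")))

# dist() gives NA for every pair that involves the all-NA row
d <- dist(mat)

# hclust() stops with an error on NA distances; capture its message
res <- tryCatch(hclust(d), error = function(e) conditionMessage(e))
```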

After removing one gene that, despite being present in the dataset, was filled with NAs instead of numeric values, I was able to run the function. I found it with the heatmap.2() function from the gplots package, which made the missing values for this gene visible.
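Building on that finding, here is a sketch of the cleanup step. It assumes the joined table has a character gene column followed by numeric sample columns; the toy values and column names below are hypothetical. The sketch lists and drops genes with missing values. It then moves the gene names into row names, so that as.matrix() yields a numeric matrix rather than a character one:

```r
# Hypothetical joined table; "A2M" has only NAs, like the gene described above
scdatajoin <- data.frame(
  gene    = c("A1BG", "A2M", "AAAS", "AACS"),
  `4W_C1` = c(0,   NA, 2.1, 1.0),
  `5W_C1` = c(2.3, NA, 0,   3.0),
  `5W_C2` = c(0.4, NA, 1.2, 0),
  check.names = FALSE
)

# With the gene column still present, as.matrix() returns a character matrix
char_mat <- as.matrix(scdatajoin)

# Report and drop genes that have any missing value
na_genes <- scdatajoin$gene[!complete.cases(scdatajoin)]
clean <- scdatajoin[complete.cases(scdatajoin), ]

# Move gene names into row names and keep only the numeric sample columns
mat <- as.matrix(clean[, setdiff(names(clean), "gene"), drop = FALSE])
rownames(mat) <- clean$gene

# Hierarchical clustering now runs without the NA error
hc <- hclust(dist(mat))
```

complete.cases() drops a gene if any sample is missing. To drop only genes that are entirely NA, keep the rows where rowSums(is.na(...)) is smaller than the number of sample columns instead. The resulting mat can then be passed to pheatmap() as in the question.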
