How to find the number of overlaps of non-NA and NA values across different columns using R

Question

I want to find the number of overlaps between some groups (arranged as columns of a data set). In fact, I want to use these values to draw a Venn diagram for my data. I need to count all non-NA values for each column, and also the number of non-NA values that overlap between different columns (e.g. group1 with group2, or group1, group2 and group4, and so on). The content of the cells is not important, and I am not looking for common cells between columns. I just want to count non-NAs regardless of their content. Is there a way to do this using R or Python?

Example of part of the data:

 structure(list(
 V1 = c("Group1", "XP_032738419.1", "XP_032715310.1", "XP_032703108.1", "XP_032700385.1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
 V2 = c("Group2", "XP_011286297.1, XP_011286306.1, XP_019670819.1, XP_019670818.1, XP_023097752.1, XP_011286308.1, XP_011286311.1, XP_023097760.1, XP_011286303.1, XP_023097755.1, XP_023097756.1, XP_023097757.1, XP_023097758.1, XP_023097754.1, XP_023097753.1, XP_011286310.1, XP_023097759.1, XP_019670826.1, XP_011286304.1, XP_019670828.1", NA, "XP_019685915.1", "XP_023112367.1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
 V3 = c("Group3", "XP_038528678.1", "XP_038300380.1", "XP_038538922.1", "XP_038295408.", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
 V4 = c("Group4", "XP_012903997.1", "XP_004748105.1, XP_012909429.1", "XP_012905661.1", "XP_012901919.1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
 V5 = c("Group5", "NP_001310871.1", "NP_001341201.1", "NP_001374917.1", "NP_001123304.1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
 V6 = c("Group6", "XP_044098939.1", "XP_044080143.1", "XP_044112499.1", "XP_044084408.1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)),
 row.names = c("1", "2", "3", "4", "5", "NA", "NA.1", "NA.2", "NA.3", "NA.4", "NA.5", "NA.6", "NA.7", "NA.8", "NA.9", "NA.10", "NA.11", "NA.12", "NA.13", "NA.14"),
 class = "data.frame")

Answer 1

The number of rows where both Group1 and Group2 are NA would be:

 sum( rowSums( is.na( dfrm[1:2]) ) == 2)

At least I hope it would be. is.na(.) applied to a data frame (or a subset of a data frame, as I attempt here) should return a logical matrix of the same dimensions, and then you can test whether the rowSums of the logicals (1 = TRUE, 0 = FALSE) equal 2. Then you add them up. Regular R code is best read from the inside out, as this demonstrates. The magrittr / tidyverse language variant of R reverses the arrangement of arguments and functions. You should still delete the image and post an [MCVE], and ideally you would [edit] your question so that the clarifying information in your comment is available to people searching for similar help.
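The same idea can be turned around to count non-NA cells, which is what the question asks for. The following is only a sketch: the toy data frame, its values, and the variable names (na_mask, present, per_group, overlaps) are assumptions made for illustration, not the asker's data. It first unpacks the one-liner step by step, then counts non-NA cells per column and the rows that are non-NA in every column of each combination of columns.

```r
# Toy data frame standing in for the asker's data (hypothetical values).
# Each column is a group; only NA vs. non-NA matters, not the cell contents.
dfrm <- data.frame(
  Group1 = c("a", "b", NA,  NA,  NA),
  Group2 = c("c", NA,  "d", NA,  NA),
  Group3 = c("e", "f", "g", "h", NA),
  stringsAsFactors = FALSE
)

# The one-liner, read from the inside out:
na_mask    <- is.na(dfrm[1:2])      # logical matrix, TRUE where a cell is NA
na_per_row <- rowSums(na_mask)      # number of NA cells in each row (0, 1 or 2)
both_na    <- sum(na_per_row == 2)  # rows where Group1 and Group2 are both NA

# Non-NA count for each column.
present   <- !is.na(dfrm)
per_group <- colSums(present)

# Non-NA overlap for every combination of two or more columns:
# a row counts toward a combination when all of its columns are non-NA in that row.
overlaps <- list()
for (k in 2:ncol(present)) {
  for (cols in combn(colnames(present), k, simplify = FALSE)) {
    label <- paste(cols, collapse = "&")
    overlaps[[label]] <- sum(rowSums(present[, cols, drop = FALSE]) == length(cols))
  }
}
overlaps <- unlist(overlaps)

print(both_na)
print(per_group)
print(overlaps)
```

In the data posted in the question, the first row appears to hold the group labels ("Group1", "Group2", ...) as ordinary cells, so it would be counted as non-NA in every column. If that is not intended, dropping that row first (for example with dfrm[-1, ]) would be one way to handle it. The per-group and per-combination counts are the raw numbers a Venn diagram package would typically need, although how they are passed in depends on the package.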

Solution 1
0 2021-11-27 00:49:49
