How to find the number of overlaps for non-NA and NA among different columns using R

I want to find the number of overlaps between some groups (arranged as columns of a data set). In fact, I want to use these values to draw a Venn diagram for my data. I need to count all non-NA values in each column, and also the number of non-NA values that overlap between different columns (e.g. Group1 with Group2, or Group1, Group2 and Group4, and so on). The content of the cells is not important, and I am not looking for common cells between columns; I just want to count non-NAs regardless of their content. Do you have any idea how to do this using R or Python?

Example of part of the data:

    structure(list(
      V1 = c("Group1", "XP_032738419.1", "XP_032715310.1", "XP_032703108.1", "XP_032700385.1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
      V2 = c("Group2", "XP_011286297.1, XP_011286306.1, XP_019670819.1, XP_019670818.1, XP_023097752.1, XP_011286308.1, XP_011286311.1, XP_023097760.1, XP_011286303.1, XP_023097755.1, XP_023097756.1, XP_023097757.1, XP_023097758.1, XP_023097754.1, XP_023097753.1, XP_011286310.1, XP_023097759.1, XP_019670826.1, XP_011286304.1, XP_019670828.1", NA, "XP_019685915.1", "XP_023112367.1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
      V3 = c("Group3", "XP_038528678.1", "XP_038300380.1", "XP_038538922.1", "XP_038295408.", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
      V4 = c("Group4", "XP_012903997.1", "XP_004748105.1, XP_012909429.1", "XP_012905661.1", "XP_012901919.1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
      V5 = c("Group5", "NP_001310871.1", "NP_001341201.1", "NP_001374917.1", "NP_001123304.1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
      V6 = c("Group6", "XP_044098939.1", "XP_044080143.1", "XP_044112499.1", "XP_044084408.1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)),
      row.names = c("1", "2", "3", "4", "5", "NA", "NA.1", "NA.2", "NA.3", "NA.4", "NA.5", "NA.6", "NA.7", "NA.8", "NA.9", "NA.10", "NA.11", "NA.12", "NA.13", "NA.14"),
      class = "data.frame")

The number of rows where both Group1 and Group2 are NA would be:

 sum( rowSums( is.na( dfrm[1:2]) ) == 2)

At least I hope it would be. is.na(.) applied to a dataframe (or a subset of a dataframe, as I attempt here) should return a logical matrix of the same dimensions, and then you can test whether the rowSums of those logicals (1=TRUE, 0=FALSE) is 2. Then you add them up. Regular R code is best read from the inside out, as this demonstrates. The magrittr / tidyverse language variant of R reverses the arrangement of arguments and functions.

You should still delete the image and post an [MCVE], and ideally you would [edit] your question so that the clarifying information in your comment is available to people searching for similar help.
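The same is.na / rowSums idea can be turned around to count non-NA overlaps for every combination of columns, which is what a Venn diagram needs. The following is only a sketch under stated assumptions: the data frame `dfrm` below is a small made-up stand-in for the real data, the group labels are assumed to have been promoted from the first row to column names, and the helper `count_overlap` is a hypothetical name introduced for illustration.

```r
# Toy data frame standing in for the real data (hypothetical values).
# Assumption: group labels are column names, not the first data row.
dfrm <- data.frame(
  Group1 = c("a", "b", NA, NA, "e"),
  Group2 = c("p", NA, NA, "s", "t"),
  Group3 = c("x", "y", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Rows where Group1 and Group2 are both NA (the expression from the answer).
both_na <- sum(rowSums(is.na(dfrm[1:2])) == 2)

# Logical matrix: TRUE wherever a cell holds a value, whatever its content.
present <- !is.na(dfrm)

# Number of non-NA values in each column.
per_column <- colSums(present)

# Hypothetical helper: number of rows in which every column in `cols` is non-NA.
count_overlap <- function(cols) {
  sum(rowSums(present[, cols, drop = FALSE]) == length(cols))
}

# Every combination of two or more columns.
groups <- colnames(present)
combos <- unlist(
  lapply(2:length(groups), function(k) combn(groups, k, simplify = FALSE)),
  recursive = FALSE
)

# Overlap count for each combination, named like "Group1&Group2".
overlaps <- vapply(combos, count_overlap, integer(1))
names(overlaps) <- vapply(combos, paste, character(1), collapse = "&")

print(per_column)
print(overlaps)
```

Here `per_column` gives the size of each circle and `overlaps` gives the size of each intersection. Note that an overlap is defined row-wise: a row counts toward Group1&Group2 only if both cells in that row are non-NA, so the row alignment of the columns matters.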

