简体   繁体   English

从数据框中删除行

[英]removing lines from a data frame

I have the following CSV: 我有以下CSV文件:

#GeneName,GeneId,TranscriptId,BioType,variants_impact_HIGH,variants_impact_LOW,variants_impact_MODERATE,variants_impact_MODIFIER,variants_effect_3_prime_UTR_variant,variants_effect_5_prime_UTR_premature_start_codon_gain_variant,variants_effect_5_prime_UTR_variant,variants_effect_downstream_gene_variant,variants_effect_initiator_codon_variant,variants_effect_intron_variant,variants_effect_missense_variant,variants_effect_non_canonical_start_codon,variants_effect_splice_acceptor_variant,variants_effect_splice_donor_variant,variants_effect_splice_region_variant,variants_effect_start_lost,variants_effect_stop_gained,variants_effect_stop_lost,variants_effect_stop_retained_variant,variants_effect_synonymous_variant,variants_effect_upstream_gene_variant
"Box II Factor, partial",Nbv6.1trP45282.path1,Nbv6.1trP45282.mrna1,protein_coding,1,0,1,164,0,0,0,59,0,58,1,0,0,0,0,0,1,0,0,0,47
CYP71D5v3,Nbv6.1trP49735.path1,Nbv6.1trP49735.mrna1,protein_coding,1,0,3,57,0,0,0,14,0,1,3,0,0,0,0,0,1,0,0,0,42
"Chain A, 5-Epi-Aristolochene Synthase From Nicotiana Tabacum",Nbv6.1trP39231.path1,Nbv6.1trP39231.mrna1,protein_coding,1,5,2,86,13,0,0,33,0,10,2,0,0,0,3,0,1,0,0,2,33
"Cysteine-rich RLK (RECEPTOR-like protein kinase) 8, putative",Nbv6.1trP40249.path1,Nbv6.1trP40249.mrna1,protein_coding,1,1,2,85,1,0,0,66,0,0,2,0,0,0,0,0,1,0,0,1,18
DnaJ protein,Nbv6.1trP36411.path1,Nbv6.1trP36411.mrna1,protein_coding,1,2,2,252,7,1,2,77,0,0,2,0,0,0,1,0,1,0,0,0,166
F10A5.19,Nbv6.1trP21304.path1,Nbv6.1trP21304.mrna1,protein_coding,1,1,0,80,2,1,4,31,0,28,0,0,0,0,0,0,1,0,0,0,15
Integrase core domain containing protein,Nbv6.1trP22629.path1,Nbv6.1trP22629.mrna1,protein_coding,1,0,0,19,0,0,0,8,0,0,0,0,0,0,0,0,1,0,0,0,11
Integrase core domain containing protein,Nbv6.1trP23238.path1,Nbv6.1trP23238.mrna1,protein_coding,1,3,5,100,10,3,7,13,0,0,5,0,0,0,0,0,1,0,0,0,70
Integrase core domain containing protein,Nbv6.1trP25807.path1,Nbv6.1trP25807.mrna1,protein_coding,1,1,0,27,0,0,0,10,0,0,0,0,0,0,0,0,1,0,0,1,17
Integrase core domain containing protein,Nbv6.1trP40184.path1,Nbv6.1trP40184.mrna1,protein_coding,1,3,2,146,1,0,0,82,0,0,2,0,0,0,1,0,1,0,1,1,63
Integrase core domain containing protein,Nbv6.1trP51171.path1,Nbv6.1trP51171.mrna1,protein_coding,1,1,3,167,0,0,0,84,0,0,3,0,0,0,0,0,1,0,0,1,83
Integrase core domain containing protein,Nbv6.1trP55943.path1,Nbv6.1trP55943.mrna1,protein_coding,1,0,0,44,0,0,0,23,0,0,0,0,0,0,0,0,1,0,0,0,21
Integrase core domain containing protein,Nbv6.1trP62081.path1,Nbv6.1trP62081.mrna1,protein_coding,1,2,2,63,0,0,2,11,0,20,2,0,0,0,0,0,1,0,0,2,30
Integrase core domain containing protein,Nbv6.1trP62783.path1,Nbv6.1trP62783.mrna1,protein_coding,1,2,1,35,0,0,0,8,0,0,1,0,0,0,0,0,1,0,0,2,27
Integrase core domain containing protein,Nbv6.1trP65782.path1,Nbv6.1trP65782.mrna1,protein_coding,1,0,6,66,1,0,0,30,0,0,6,0,0,0,0,0,1,0,0,0,35
KED,Nbv6.1trP20392.path1,Nbv6.1trP20392.mrna1,protein_coding,1,8,14,246,2,1,4,70,0,13,14,0,0,0,1,0,1,0,0,6,158
probable haloacid dehalogenase-like hydrolase domain-containing protein 3,Nbv6.1trP38253.path1,Nbv6.1trP38253.mrna1,protein_coding,1,0,1,27,0,0,0,5,0,4,1,0,0,0,2,0,1,0,0,0,18
probable heat shock cognate 70 kDa protein 2-like,Nbv6.1trP74610.path2,Nbv6.1trP74610.mrna2,protein_coding,1,0,1,39,0,0,0,19,0,0,1,0,0,0,0,0,1,0,0,0,20
probable heparanase-like protein 2,Nbv6.1trP4097.path1,Nbv6.1trP4097.mrna1,protein_coding,1,0,1,14,0,0,0,7,0,1,1,0,0,0,0,0,1,0,0,0,6
probable heparanase-like protein 2 isoform X1,Nbv6.1trP61420.path1,Nbv6.1trP61420.mrna1,protein_coding,1,0,3,28,1,0,0,19,0,2,3,0,0,0,0,0,1,0,0,0,6
probable osmotin-like protein,Nbv6.1trP51931.path1,Nbv6.1trP51931.mrna1,protein_coding,1,1,2,95,1,0,0,78,0,0,2,0,0,0,0,0,1,0,0,1,16
probable osmotin-like protein,Nbv6.1trP58568.path1,Nbv6.1trP58568.mrna1,protein_coding,1,1,2,95,1,0,0,78,0,0,2,0,0,0,0,0,1,0,0,1,16
probable osmotin-like protein,Nbv6.1trP58569.path1,Nbv6.1trP58569.mrna1,protein_coding,1,1,2,95,1,0,0,78,0,0,2,0,0,0,0,0,1,0,0,1,16
probable osmotin-like protein,Nbv6.1trP67382.path1,Nbv6.1trP67382.mrna1,protein_coding,1,1,2,95,1,0,0,78,0,0,2,0,0,0,0,0,1,0,0,1,16

The below code produce a Bar chart from a CSV 以下代码从CSV生成条形图

library("ggplot2")
chol <- read.csv("test.txt")
chol$X.GeneName

ggplot(chol, aes(X.GeneName)) + geom_histogram(stat = "count") +
  theme(axis.text.x = element_text(angle = 45, 
                                   hjust = 1)) 

在此处输入图片说明

Is there a way to remove lines which appear only ones out of chol$X.GeneName ? 有没有办法删除仅在chol$X.GeneName出现的行?

Thank you in advance 先感谢您

Using the following synthetic data: 使用以下综合数据:

chol <- tibble(X.GeneName = c(sample(c("x", "y"), 100, T, c(.2, .6)), "z"))

I would compute the counts for each value of X.GeneName , filter out any counts == 1, and then plot using geom_col , which uses stat_identity by default (a histogram is not really the best choice for the data): 我将计算X.GeneName每个值的X.GeneName ,过滤掉任何== 1的计数,然后使用geom_col进行geom_col ,默认情况下使用stat_identity (直方图并不是数据的最佳选择):

library(tidyverse)

chol %>% 
    count(X.GeneName) %>% 
    filter(n != 1) %>% 
    ggplot(aes(x = X.GeneName, y = n)) + 
    geom_col()

The value "z" has a count of 1 and has been removed from the data using filter(n != 1) : 值“ z”的计数为1,并已使用filter(n != 1)从数据中删除:

在此处输入图片说明

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM