How to select rows with more than one value in a Pandas DataFrame
I have a DataFrame that looks like this:
Entry ribosome protein PDB
0 P46782 s5 4ug0;4v6x;5a2q;5aj0;5flx;5lks;5oa3;5t2c;5vyc;6...
1 P0A7W3 s5 5wf0;5wfs;6awb;6awc;6awd
2 A2RNN6 s5 5myj
3 Q5SHQ5 s5 1fjg;1fka;1hnw;1hnx;1hnz;1hr0;1i94;1i95;1i96;1...
4 Q2YYL4 s5 6fxc
5 A0QSG6 s5 5o5j;5o61;5xyu;5zeb;5zep;5zeu;6dzi;6dzk
6 P33759 s5 5mrc;5mre;5mrf
I need to extract the rows that have more than one entry in the 'PDB' column. For example, in this case I want a DataFrame without the rows containing "6fxc" and "5myj" (single entries), keeping only rows with multiple PDBs such as "5mrc;5mre;5mrf".
How can I do this?
This is only a fragment of a huge DataFrame that I need to filter this way.
You could split each PDB value on ';' with str.split, count the resulting pieces with str.len, and then filter on that count:
df[df['PDB'].str.split(';').str.len()>1]
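To make the one-liner above easier to try, here is a minimal self-contained sketch. It rebuilds a small DataFrame from the rows in the question whose PDB value is not truncated. The column name "ribosome protein" is an assumption read off the printed header, and the variable names n_ids and multi are only illustrative.

```python
import pandas as pd

# Rows copied from the question (only those whose PDB value is shown in full).
# The column name "ribosome protein" is an assumption based on the printed header.
df = pd.DataFrame({
    "Entry": ["P0A7W3", "A2RNN6", "Q2YYL4", "A0QSG6", "P33759"],
    "ribosome protein": ["s5"] * 5,
    "PDB": [
        "5wf0;5wfs;6awb;6awc;6awd",
        "5myj",
        "6fxc",
        "5o5j;5o61;5xyu;5zeb;5zep;5zeu;6dzi;6dzk",
        "5mrc;5mre;5mrf",
    ],
})

# Number of ';'-separated IDs in each row.
n_ids = df["PDB"].str.split(";").str.len()

# Keep only the rows with more than one ID.
multi = df[n_ids > 1]
print(multi["Entry"].tolist())  # ['P0A7W3', 'A0QSG6', 'P33759']
```

Keeping the count in its own Series also makes it easy to filter on other thresholds later, for example n_ids >= 3.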
Following a comment, you can also simply count the ';' separators with str.count, as follows:
df[df['PDB'].str.count(";")>0]
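Counting separators assumes every ';' sits between two IDs. If the real data might contain a stray leading or trailing ';', a single entry would be counted as multiple. The sketch below uses hypothetical values (the question shows no such rows) to illustrate one possible guard: strip ';' from both ends before counting.

```python
import pandas as pd

# Hypothetical values: the second and third carry a stray separator.
pdb = pd.Series(["5mrc;5mre;5mrf", "5myj;", ";6fxc", "5wf0;5wfs"])

# Plain separator count: stray separators look like extra entries.
naive = pdb.str.count(";") > 0

# Strip ';' from both ends first, then count the remaining separators.
robust = pdb.str.strip(";").str.count(";") > 0

print(naive.tolist())   # [True, True, True, True]
print(robust.tolist())  # [True, False, False, True]
```

If the column is known to be clean, the plain count is enough and the extra strip step can be skipped.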
You can drop the rows whose PDB field contains no ';' like this:
df[df['PDB'].str.contains(';')]
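One caveat worth sketching: if the column can hold missing values, str.contains may return a missing value for those rows, and depending on the pandas version such a mask can be rejected by boolean indexing. Passing na=False treats missing values as "no match", and regex=False makes ';' a literal string rather than a pattern. The entry "X00000" and its missing PDB value below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Entry": ["A2RNN6", "P33759", "X00000"],   # "X00000" is a made-up entry
    "PDB": ["5myj", "5mrc;5mre;5mrf", None],   # None stands for a missing value
})

# regex=False: match ';' literally; na=False: missing values count as no match.
mask = df["PDB"].str.contains(";", regex=False, na=False)

result = df[mask]
print(result["Entry"].tolist())  # ['P33759']
```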