Remove groups if a column contains more than x values from a list

Hello, I have a list of elements such as:

list_element=['Elephant','Monkey','Cow','Human','Bird','Snail','Snake','Donkey','Baboon','Orang-Outan']

and a dataframe:

name  value
G1    Gr.1:4282399-4282564(+):Elephant
G1    SEQAHAHHE
G1    Zr.2:4282387-428245(-):Monkey
G1    GrA.2:42845-428289(+):Monkey
G1    QYEH897EH.3
G1    GrA2S2_ED:42845-4282789(+):Cow
G1    UDDKDDH6
G1    YDDIJBDIB778
G2    Gr.1:423663-4282542(-):Elephant
G2    Gr7E:423609-4282552(+):Elephant
G2    UEHHEE88E8E.2
G2    AP_UUD1_CU_OK-lQGGQ
G2    GrEH:423663-4282542(+):Baboon
G2    Gr7JE:42356-428257(+):Snail
G2    AP_UUD1_CU_OK-lQ8900
G2    ASGSG_E553:423663-4282542(-):Human
G3    GrA98_OK:42845-42867(+):Bird
G3    AGGAGA5567

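I keep G1 because it has <= 3 distinct elements in total (Monkey, Elephant and Cow).

I remove G2 because it has > 3 distinct elements (Elephant, Human, Snail and Baboon).

I keep G3 because it has <= 3 distinct elements in total (Bird).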

As you can see, the elements appear in the values that contain '):'.

And the expected output would be:

name  value
G1    Gr.1:4282399-4282564(+):Elephant
G1    SEQAHAHHE
G1    Zr.2:4282387-428245(-):Monkey
G1    GrA.2:42845-428289(+):Monkey
G1    QYEH897EH.3
G1    GrA2S2_ED:42845-4282789(+):Cow
G1    UDDKDDH6
G1    YDDIJBDIB778
G3    GrA98_OK:42845-42867(+):Bird
G3    AGGAGA5567

Thanks for your help.

You can use .str.extract to pull out the elements, then groupby() with transform('nunique') to count the number of unique elements in each group:

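# Capture whichever list element appears in each value (NaN if none does)
s = (df['value'].str.extract('({})'.format('|'.join(list_element)))[0]
       .groupby(df['name'])
       .transform('nunique'))  # distinct elements per group, repeated on every row

# Keep only the rows whose group has at most 3 distinct elements
df[s <= 3]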

Output:

   name                             value
0    G1  Gr.1:4282399-4282564(+):Elephant
1    G1                         SEQAHAHHE
2    G1     Zr.2:4282387-428245(-):Monkey
3    G1      GrA.2:42845-428289(+):Monkey
4    G1                       QYEH897EH.3
5    G1    GrA2S2_ED:42845-4282789(+):Cow
6    G1                          UDDKDDH6
7    G1                      YDDIJBDIB778
16   G3      GrA98_OK:42845-42867(+):Bird
17   G3                        AGGAGA5567
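
For anyone who wants to reproduce the result above, here is a minimal self-contained sketch of the same approach. It rebuilds the dataframe from the question and, as an extra precaution that is not part of the answer, passes each element through `re.escape` so that an element containing regex metacharacters would still match literally. The variable names `rows`, `pattern`, `elements`, `counts` and `result` are chosen here for illustration.

```python
import re
import pandas as pd

list_element = ['Elephant', 'Monkey', 'Cow', 'Human', 'Bird',
                'Snail', 'Snake', 'Donkey', 'Baboon', 'Orang-Outan']

# The dataframe from the question, rebuilt row by row
rows = [
    ('G1', 'Gr.1:4282399-4282564(+):Elephant'),
    ('G1', 'SEQAHAHHE'),
    ('G1', 'Zr.2:4282387-428245(-):Monkey'),
    ('G1', 'GrA.2:42845-428289(+):Monkey'),
    ('G1', 'QYEH897EH.3'),
    ('G1', 'GrA2S2_ED:42845-4282789(+):Cow'),
    ('G1', 'UDDKDDH6'),
    ('G1', 'YDDIJBDIB778'),
    ('G2', 'Gr.1:423663-4282542(-):Elephant'),
    ('G2', 'Gr7E:423609-4282552(+):Elephant'),
    ('G2', 'UEHHEE88E8E.2'),
    ('G2', 'AP_UUD1_CU_OK-lQGGQ'),
    ('G2', 'GrEH:423663-4282542(+):Baboon'),
    ('G2', 'Gr7JE:42356-428257(+):Snail'),
    ('G2', 'AP_UUD1_CU_OK-lQ8900'),
    ('G2', 'ASGSG_E553:423663-4282542(-):Human'),
    ('G3', 'GrA98_OK:42845-42867(+):Bird'),
    ('G3', 'AGGAGA5567'),
]
df = pd.DataFrame(rows, columns=['name', 'value'])

# Alternation of the escaped element names, wrapped in one capture group
pattern = '({})'.format('|'.join(re.escape(e) for e in list_element))

# Matched element per row; NaN where no element occurs in the value
elements = df['value'].str.extract(pattern)[0]

# Number of distinct elements in each group, aligned to the original rows
counts = elements.groupby(df['name']).transform('nunique')

result = df[counts <= 3]
print(result)
```

Because `nunique` ignores NaN, rows without any element do not add to a group's count, yet they are still kept or dropped together with their group.
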
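
Alternatively, use groupby().filter() and count the distinct suffixes after the last ':' in each group:

df = df.groupby('name').filter(
    lambda x: x.loc[x['value'].str.contains(':'), 'value'].str.split(':').str[-1].nunique() <= 3
)
print(df)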

Prints:

   name                             value
0    G1  Gr.1:4282399-4282564(+):Elephant
1    G1                         SEQAHAHHE
2    G1     Zr.2:4282387-428245(-):Monkey
3    G1      GrA.2:42845-428289(+):Monkey
4    G1                       QYEH897EH.3
5    G1    GrA2S2_ED:42845-4282789(+):Cow
6    G1                          UDDKDDH6
7    G1                      YDDIJBDIB778
16   G3      GrA98_OK:42845-42867(+):Bird
17   G3                        AGGAGA5567
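
Note that the groupby().filter() version counts whatever text follows the last ':' and never consults list_element, so it relies on every such suffix being a listed element. The sketch below is one possible variant, not either answerer's code: it reads the text after '):' and counts it only when it appears in the list. The helper name `keep_small_groups`, its parameters and the `groups` dict are hypothetical names introduced for this example.

```python
import pandas as pd

list_element = ['Elephant', 'Monkey', 'Cow', 'Human', 'Bird',
                'Snail', 'Snake', 'Donkey', 'Baboon', 'Orang-Outan']

# Same data as in the question, grouped by name for brevity
groups = {
    'G1': ['Gr.1:4282399-4282564(+):Elephant', 'SEQAHAHHE',
           'Zr.2:4282387-428245(-):Monkey', 'GrA.2:42845-428289(+):Monkey',
           'QYEH897EH.3', 'GrA2S2_ED:42845-4282789(+):Cow',
           'UDDKDDH6', 'YDDIJBDIB778'],
    'G2': ['Gr.1:423663-4282542(-):Elephant', 'Gr7E:423609-4282552(+):Elephant',
           'UEHHEE88E8E.2', 'AP_UUD1_CU_OK-lQGGQ',
           'GrEH:423663-4282542(+):Baboon', 'Gr7JE:42356-428257(+):Snail',
           'AP_UUD1_CU_OK-lQ8900', 'ASGSG_E553:423663-4282542(-):Human'],
    'G3': ['GrA98_OK:42845-42867(+):Bird', 'AGGAGA5567'],
}
df = pd.DataFrame(
    [(name, value) for name, values in groups.items() for value in values],
    columns=['name', 'value'],
)


def keep_small_groups(frame, allowed, max_unique=3):
    """Keep groups with at most max_unique distinct allowed suffixes after '):'."""
    # Text after the final '):' in each value; NaN when '):' is absent
    suffix = frame['value'].str.extract(r'\):([^:]+)$')[0]
    # Blank out suffixes that are not in the allowed list
    suffix = suffix.where(suffix.isin(allowed))
    # Distinct allowed suffixes per group, aligned to the original rows
    n_unique = suffix.groupby(frame['name']).transform('nunique')
    return frame[n_unique <= max_unique]


filtered = keep_small_groups(df, list_element)
print(filtered)
```

On the question's data this keeps the same G1 and G3 rows as the two answers; it would differ only if a value ended in a suffix that is not in list_element.
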

