Output multiple files based on multiple column values (pandas, Python)

Question

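This question follows on from my previous one, "output multiple files based on column value python pandas", but this time I want to take it a step further.

This time I have a small sample data set: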

import pandas as pd

df = {'ID': ['H900','H901','H902','M1436','M1435','M149','M157','M213','M699','M920','M871','M789','M617','M991','H903','M730','M191'],
  'CloneID': [0,1,2,2,2,2,2,2,3,3,3,4,4,4,5,5,6],
  'Length': [48,42  ,48,48,48,48,48,48,48,48,48,48,48,48,48,48,48]}

df = pd.DataFrame(df)

It looks like this:

df
Out[6]: 
    CloneID   ID  Length
0       0   H900      48
1       1   H901      42
2       2   H902      48
3       2   M1436     48
4       2   M1435     48
5       2   M149      48
6       2   M157      48
7       2   M213      48
8       3   M699      48
9       3   M920      48
10      3   M871      48
11      4   M789      48
12      4   M617      48
13      4   M991      48
14      5   H903      48
15      5   M730      48
16      6   M191      48

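I want to write each CloneID to its own output file, but this time only for the clones that contain at least one ID starting with "H".

So the output I want is 4 files:

The first file is "CloneID0.txt"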

    CloneID   ID  Length
      0      H900      48

The second file is "CloneID1.txt"

    CloneID   ID  Length
      1      H901      42

The third file is "CloneID2.txt"

    CloneID   ID  Length
       2     H902      48
       2     M1436     48
       2     M1435     48
       2     M149      48
       2     M157      48
       2     M213      48

The fourth file is "CloneID5.txt"

    CloneID   ID  Length
      5     H903      48
      5     M730      48

So there will be no "CloneID3.txt", "CloneID4.txt" or "CloneID6.txt", because those clones do not have any ID starting with "H".

My code:

import pandas as pd
data = pd.read_csv('data.txt', sep = '\t')
gp = data.groupby('CloneID')
for g in gp.groups:
    for s in data.ID:
        if s.startswith("H"):
           path = 'IgHCloneID' + str(g) + '.xlsx'
           gp.get_group(g).to_excel(path, index=False)

It still writes a file for every clone, not just for the clones that contain an ID starting with "H".

Answer 1

You can first filter the groups on the condition that any value in the ID column starts with "H", and then groupby again and write each group with to_csv:

df1 = (df.groupby('CloneID').filter(lambda x: (x.ID.str.startswith("H").any())))

df1.groupby('CloneID').apply(lambda x: x.to_csv('CloneID{}.txt'.format(x.name), index=False))
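The same filter-then-write idea can also be spelled out as an explicit loop, which some readers find easier to follow than calling apply for its side effects. The following is only a minimal sketch: the helper name write_h_clones, the tab separator and the use of a temporary output directory are assumptions made for illustration, not part of the answer above.

```python
import os
import tempfile

import pandas as pd


def write_h_clones(df, out_dir):
    """Write one file per CloneID that has at least one ID starting with 'H'.

    Hypothetical helper for illustration; returns the file names written.
    """
    # Keep only the groups in which at least one ID starts with "H".
    kept = df.groupby('CloneID').filter(
        lambda g: g['ID'].str.startswith('H').any())

    written = []
    for clone_id, group in kept.groupby('CloneID'):
        path = os.path.join(out_dir, 'CloneID{}.txt'.format(clone_id))
        # Tab-separated output is an assumption; drop sep for plain CSV.
        group.to_csv(path, sep='\t', index=False)
        written.append(os.path.basename(path))
    return written


df = pd.DataFrame({
    'ID': ['H900', 'H901', 'H902', 'M1436', 'M1435', 'M149', 'M157', 'M213',
           'M699', 'M920', 'M871', 'M789', 'M617', 'M991', 'H903', 'M730',
           'M191'],
    'CloneID': [0, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6],
    'Length': [48, 42, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48,
               48, 48],
})

# Write into a throwaway directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    files = write_h_clones(df, tmp)
    print(files)
```

With the sample data, only clones 0, 1, 2 and 5 should pass the filter, so four files are written.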

Answer 2

You can groupby CloneID and write to csv directly inside the apply method:

df.groupby('CloneID').apply(lambda gp: gp.to_csv('CloneID{}.txt'.format(gp.name)))

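This keeps the original index in the output, but that can be changed with .set_index('CloneID') before the to_csv call.

Edit: to keep only the groups in which some ID starts with H:

This requires a check on each group; here is one way: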

df.groupby('CloneID').apply(
    lambda gp: gp.to_csv('CloneID{}.txt'.format(gp.name))
    if any(gp.ID.str.startswith('H'))
    else None)
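The per-group check can also be done without apply, by first collecting the CloneID values that have an H-prefixed row and then selecting every row of those clones with isin. This is a sketch of an alternative technique, not the method used in the answer above; the variable names starts_with_h, h_clones and selected are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['H900', 'H901', 'H902', 'M1436', 'M1435', 'M149', 'M157', 'M213',
           'M699', 'M920', 'M871', 'M789', 'M617', 'M991', 'H903', 'M730',
           'M191'],
    'CloneID': [0, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6],
    'Length': [48, 42, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48,
               48, 48],
})

# Boolean mask: True for rows whose ID starts with "H".
starts_with_h = df['ID'].str.startswith('H')

# CloneIDs that own at least one such row.
h_clones = df.loc[starts_with_h, 'CloneID'].unique()

# Every row (H and M alike) that belongs to one of those clones.
selected = df[df['CloneID'].isin(h_clones)]

# Each group here could be passed to to_csv; printing keeps the sketch
# free of file system side effects.
for clone_id, group in selected.groupby('CloneID'):
    print(clone_id, len(group))
```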

Answer 3

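Create a list of clone IDs to iterate over, then filter the dataframe down to each clone ID where the first character of the ID string is H, then output as text.

Code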

import pandas as pd

df = {'ID': ['H900','H901','H902','M1436','M1435','M149','M157','M213','M699','M920','M871','M789','M617','M991','H903','M730','M191'],
  'CloneID': [0,1,2,2,2,2,2,2,3,3,3,4,4,4,5,5,6],
  'Length': [48,42  ,48,48,48,48,48,48,48,48,48,48,48,48,48,48,48]}

df = pd.DataFrame(df)

clone_list = df['CloneID'].drop_duplicates().values.tolist()

for c in clone_list:
    clone_df = df.loc[df['CloneID'] == c]
    # Note: this keeps rows whose ID starts with 'H' or 'M', so no clone is dropped
    clone_df = clone_df.loc[(clone_df['ID'].str[0] == 'H') | (clone_df['ID'].str[0] == 'M')]
    # Create your text file here
    print(clone_df)

Result

    CloneID    ID  Length
0        0  H900      48
   CloneID    ID  Length
1        1  H901      42
   CloneID     ID  Length
2        2   H902      48
3        2  M1436      48
4        2  M1435      48
5        2   M149      48
6        2   M157      48
7        2   M213      48
    CloneID    ID  Length
8         3  M699      48
9         3  M920      48
10        3  M871      48
    CloneID    ID  Length
11        4  M789      48
12        4  M617      48
13        4  M991      48
    CloneID    ID  Length
14        5  H903      48
15        5  M730      48
    CloneID    ID  Length
16        6  M191      48
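As the result shows, this loop still prints clones 3, 4 and 6, because the row filter accepts both H and M prefixes. A minimal sketch of a variant that skips clones with no H-prefixed ID is given below; the kept dictionary is an illustrative addition, and the to_csv call mentioned in the comment is only one possible way to write the file.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['H900', 'H901', 'H902', 'M1436', 'M1435', 'M149', 'M157', 'M213',
           'M699', 'M920', 'M871', 'M789', 'M617', 'M991', 'H903', 'M730',
           'M191'],
    'CloneID': [0, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6],
    'Length': [48, 42, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48,
               48, 48],
})

kept = {}  # CloneID -> rows of that clone (illustrative, for inspection)
for c in df['CloneID'].drop_duplicates().tolist():
    clone_df = df.loc[df['CloneID'] == c]
    # Skip the whole clone when none of its IDs starts with "H".
    if not clone_df['ID'].str.startswith('H').any():
        continue
    kept[c] = clone_df
    # A file could be written here instead, for example:
    # clone_df.to_csv('CloneID{}.txt'.format(c), sep='\t', index=False)
    print(clone_df)
```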


Answer 1
Score: 3 (accepted), 2016-05-20 14:26:41

Answer 2
Score: 0, 2016-05-20 14:15:24

Answer 3
Score: -1, 2016-05-20 14:00:49
