[英]Iteratively add new string as element in list
我正在使用一個數據框,該數據框由一列數字組成,格式為: [[45, 45, 'D'],[46, 49, 'C'],[50, 66, 'S'],[67, 101, 'C'],[102, 103, 'S'],[104, 106, 'C'],[107, 108, 'S'],[109, 120, 'C'],[121, 121, 'S'],[122, 123, 'C'],[124, 140, 'S'],[141, 149, 'C'],[150, 176, 'S'],[177, 178, 'C'],[179, 181, 'S'],[182, 194, 'C'],[195, 213, 'S'],[214, 21``7, 'C']]
These numbers correspond to the positions of characters in a string: ie the string: 'MGILSFLPVLATESDWADCKSPQPWGHMLLWTAVLFLAPVAGTPAAPPKAVLKLEPQWINVLQEDSVTLTCRGTHSPESDSIQWFHNGNLIPTHTQPSYRFKANNNDSGEYTCQTGQTSLSDPVHLTVLSEWLVLQTPHLEFQEGETIVLRCHSWKDKPLVKVTFFQNGKSKKFSRSDPNFSIPQANHSHSGDYHCTGNIGYTLYSSKPVTITVQAPSSSPMGIIVAVVTGIAVAAIVAAVVALIYCRKKRISALPGYPECREMGETLPEKPANPTNPDEADKVGAENTITYSLLMHPDALEEPDDQNRI'
如您所見,列表中的某些字符與數字列表中的數字不對應(即缺少0-44)。 因此,必須刪除第 0-44 個 position 的字符以創建更短的字母序列。
我可以為一行執行此操作,但我正在努力為數據框中的每一行執行此操作。
這是為一行做的代碼:
new_s = ''
for item in res:
new_s += strSeq[item[0]-1:item[1]]
print(len(new_s), new_s)
這就是我一直試圖為所有線路獲取它的內容:
shortenedSeq_list =[]
counter=0
stringstring=[]
for rows in df.itertuples():
strSeq2 = [rows.sequence]
strremove2 = [rows.shortened_mobidb_consensus]
for item in strremove2:
res = ast.literal_eval(item)
for item in res:
stringstring.append(strSeq2[item[0]-1:item[1]])
stringstring
但這會導致 output:
[],
[],
[],
[],
[],
[],
[],
[],
[],
['MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGSSHAKGIVLEKIGIESKQPNSAIRKCVRVQLIKNGKKVTAFVPNDGCLNFVDENDEVLLAGFGRKGKAKGDIPGVRFKVVKVSGVSLLALWKEKKEKPRS'],
[],
[],
而我希望列表中的每一行都是縮短的序列。
我最終想將此列表添加為 dataframe 中的一列。
更新
數字輸出為字符串而不是列表,因此 res 是將數字作為列表輸出,這是工作代碼 output:
173 AAPPKAVLKLEPQWINVLQEDSVTLTCRGTHSPESDSIQWFHNGNLIPTHTQPSYRFKANNNDSGEYTCQTGQTSLSDPVHLTVLSEWLVLQTPHLEFQEGETIVLRCHSWKDKPLVKVTFFQNGKSKKFSRSDPNFSIPQANHSHSGDYHCTGNIGYTLYSSKPVTITVQAP
其中173是縮短序列的長度,后跟序列。
df樣本:
shortened_mobidb_consensus;sequence
[[45, 45, 'D'], [46, 49, 'C'], [50, 66, 'S'], [67, 101, 'C'], [102, 103, 'S'], [104, 106, 'C'], [107, 108, 'S'], [109, 120, 'C'], [121, 121, 'S'], [122, 123, 'C'], [124, 140, 'S'], [141, 149, 'C'], [150, 176, 'S'], [177, 178, 'C'], [179, 181, 'S'], [182, 194, 'C'], [195, 213, 'S'], [214, 217, 'C']];MGILSFLPVLATESDWADCKSPQPWGHMLLWTAVLFLAPVAGTPAAPPKAVLKLEPQWINVLQEDSVTLTCRGTHSPESDSIQWFHNGNLIPTHTQPSYRFKANNNDSGEYTCQTGQTSLSDPVHLTVLSEWLVLQTPHLEFQEGETIVLRCHSWKDKPLVKVTFFQNGKSKKFSRSDPNFSIPQANHSHSGDYHCTGNIGYTLYSSKPVTITVQAPSSSPMGIIVAVVTGIAVAAIVAAVVALIYCRKKRISALPGYPECREMGETLPEKPANPTNPDEADKVGAENTITYSLLMHPDALEEPDDQNRI
[[1, 1, 'D'], [2, 143, 'S'], [144, 145, 'C']];MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGSSHAKGIVLEKIGIESKQPNSAIRKCVRVQLIKNGKKVTAFVPNDGCLNFVDENDEVLLAGFGRKGKAKGDIPGVRFKVVKVSGVSLLALWKEKKEKPRS
[[1, 145, 'S']];MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGSSHAKGIVLEKIGIESKQPNSAIRKCVRVQLIKNGKKVTAFVPNDGCLNFVDENDEVLLAGFGRKGKAKGDIPGVRFKVVKVSGVSLLALWKEKKEKPRS
[[1, 1, 'D'], [2, 2, 'C'], [3, 37, 'S'], [38, 39, 'C'], [40, 40, 'S'], [41, 41, 'C'], [42, 62, 'S'], [63, 65, 'C'], [66, 231, 'S']];MSKNILVLGGSGALGAEVVKFFKSKSWNTISIDFRENPNADHSFTIKDSGEEEIKSVIEKINSKSIKVDTFVCAAGGWSGGNASSDEFLKSVKGMIDMNLYSAFASAHIGAKLLNQGGLFVLTGASAALNRTSGMIAYGATKAATHHIIKDLASENGGLPAGSTSLGILPVTLDTPTNRKYMSDANFDDWTPLSEVAEKLFEWSTNSDSRPTNGSLVKFETKSKVTTWTNL
[[24, 29, 'D'], [30, 91, 'S'], [92, 92, 'D']];MKVSTTALAVLLCTMTLCNQVFSAPYGADTPTACCFSYSRKIPRQFIVDYFETSSLCSQPGVIFLTKRNRQICADSKETWVQEYITDLELNA
df = pd.read_csv('stringsample.txt',sep=';',converters={0:ast.literal_eval})
for index, row in df.iterrows():
new_s = ''
res = row.shortened_mobidb_consensus
for item in res:
new_s += row.sequence[item[0]-1:item[1]]
df.loc[index,'output'] = new_s
df['output']
0 AAPPKAVLKLEPQWINVLQEDSVTLTCRGTHSPESDSIQWFHNGNL...
1 MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGS...
2 MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGS...
3 MSKNILVLGGSGALGAEVVKFFKSKSWNTISIDFRENPNADHSFTI...
4 APYGADTPTACCFSYSRKIPRQFIVDYFETSSLCSQPGVIFLTKRN...
Name: output, dtype: object
df = pd.read_csv('stringsample.txt',sep=';')
shortenedSeq_list =[]
counter=0
stringstring=[]
for rows in df.itertuples():
strSeq2 = rows.sequence
strremove2 = rows.shortened_mobidb_consensus
res = ast.literal_eval(strremove2)
new_s = ''
for item in res:
new_s += strSeq2[item[0]-1:item[1]]
stringstring.append(new_s)
stringstring
['AAPPKAVLKLEPQWINVLQEDSVTLTCRGTHSPESDSIQWFHNGNLIPTHTQPSYRFKANNNDSGEYTCQTGQTSLSDPVHLTVLSEWLVLQTPHLEFQEGETIVLRCHSWKDKPLVKVTFFQNGKSKKFSRSDPNFSIPQANHSHSGDYHCTGNIGYTLYSSKPVTITVQAP',
'MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGSSHAKGIVLEKIGIESKQPNSAIRKCVRVQLIKNGKKVTAFVPNDGCLNFVDENDEVLLAGFGRKGKAKGDIPGVRFKVVKVSGVSLLALWKEKKEKPRS',
'MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGSSHAKGIVLEKIGIESKQPNSAIRKCVRVQLIKNGKKVTAFVPNDGCLNFVDENDEVLLAGFGRKGKAKGDIPGVRFKVVKVSGVSLLALWKEKKEKPRS',
'MSKNILVLGGSGALGAEVVKFFKSKSWNTISIDFRENPNADHSFTIKDSGEEEIKSVIEKINSKSIKVDTFVCAAGGWSGGNASSDEFLKSVKGMIDMNLYSAFASAHIGAKLLNQGGLFVLTGASAALNRTSGMIAYGATKAATHHIIKDLASENGGLPAGSTSLGILPVTLDTPTNRKYMSDANFDDWTPLSEVAEKLFEWSTNSDSRPTNGSLVKFETKSKVTTWTNL',
'APYGADTPTACCFSYSRKIPRQFIVDYFETSSLCSQPGVIFLTKRNRQICADSKETWVQEYITDLELNA']
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.