
How can I compare two columns in two different rows in Python?

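I want to iterate over every row of a CSV file and check whether the first field of row 1 is the same as the first field of the next row, and so on. If a match is found, I want to discard both rows that share that field and keep only the rows that have no match.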

Here is a sample data set (no_dup.txt):

Ac_Gene_ID  M_Gene_ID
ENSGMOG00000015632  ENSORLG00000010573
ENSGMOG00000015632  ENSORLG00000010585
ENSGMOG00000003747  ENSORLG00000006947
ENSGMOG00000003748  ENSORLG00000004636

Basically, I want to exclude rows 1 and 2 because they contain the same field (ENSGMOG00000015632) and keep rows 3 and 4.

Here is the code I tried but could not finish:

prev = None

with open("no_dup.txt", 'r') as fh_in:
    for line in fh_in:
        line = line.strip()
        if line.startswith("E"):
            line1 = line.split()
            print "initial gene =", line1[0]
            if prev is not None or prev!= line1[0]:
                prev = line1[0]

I think a clean way to do this is to build a map from each entry to the list of lines that start with it.

entries = {}
with open('no_dup.txt', 'r') as fh_in:
    for line in fh_in:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        entry = fields[0]
        if entry in entries:
            entries[entry].append(line)
        else:
            entries[entry] = [line]
# Keep only the entries that occur on exactly one line.
for matches in entries.values():
    if len(matches) == 1:
        print(matches[0].rstrip())

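Note that this does not preserve the order of the entries on Python versions before 3.7, where a plain dict is unordered.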
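If the original order matters, one possible variation is to count the first fields in a first pass and then keep only the lines whose first field occurs once. This is a minimal sketch rather than part of the answer above; the function name `unique_key_lines` and the inline `sample` list are hypothetical, and the sketch assumes the whole file fits in memory.

```python
from collections import Counter

def unique_key_lines(lines):
    """Return the lines whose first field occurs exactly once, in input order."""
    # Drop blank lines so that split()[0] is always defined.
    rows = [line for line in lines if line.strip()]
    counts = Counter(row.split()[0] for row in rows)
    return [row for row in rows if counts[row.split()[0]] == 1]

# Hypothetical stand-in for the data rows of no_dup.txt.
sample = [
    "ENSGMOG00000015632  ENSORLG00000010573\n",
    "ENSGMOG00000015632  ENSORLG00000010585\n",
    "ENSGMOG00000003747  ENSORLG00000006947\n",
    "ENSGMOG00000003748  ENSORLG00000004636\n",
]

for row in unique_key_lines(sample):
    print(row.rstrip())
```

Unlike a neighbour-by-neighbour comparison, this also removes duplicates that are not on adjacent lines, which may or may not be what you want.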

Your start looks good:

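def filter_dups(iterable):
    prev = None       # pending line that has not been matched yet
    prev_key = None   # first field of the most recent data line
    for line in iterable:
        if line.startswith("E"):
            key = line.split(None, 1)[0]
            if key == prev_key:
                prev = None  # same key as the previous line: drop both
            else:
                if prev is not None:
                    yield prev
                prev = line
                prev_key = key
        else:
            # Non-data line (e.g. the header): flush the pending line, pass it through.
            if prev is not None:
                yield prev
            yield line
            prev = None
            prev_key = None
    if prev is not None:
        yield prev

with open("no_dup.txt", 'r') as fh_in:
    with open("no_dup_out.txt", 'w') as fh_out:
        fh_out.writelines(filter_dups(fh_in))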

You can use this:

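with open('a.txt', 'r') as inputFile:
    lines = inputFile.readlines()

prev = lines[0]

# Print a line only when the next line starts with a different first field,
# so each run of equal keys is reduced to its last line.
for i in range(1, len(lines)):
    cur = lines[i]
    if prev.split()[0] != cur.split()[0]:
        print(prev.strip())
    prev = cur

# Nothing follows the last line, so print it unconditionally.
print(lines[-1].strip())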

Input:

ENSGMOG00000015632  ENSORLG00000010573
ENSGMOG00000015632  ENSORLG00000010585
ENSGMOG00000003747  ENSORLG00000006947
ENSGMOG00000003748  ENSORLG00000004636

Output:

ENSGMOG00000015632  ENSORLG00000010585
ENSGMOG00000003747  ENSORLG00000006947
ENSGMOG00000003748  ENSORLG00000004636
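The output above still contains one of the two ENSGMOG00000015632 lines. If every line in a run of equal first fields should be dropped, as the question asks, one option is `itertools.groupby`, which groups consecutive items that share a key. The following is only a sketch under that assumption; `drop_repeated_runs` and the inline `sample` list are hypothetical names, not part of the answer above.

```python
from itertools import groupby

def drop_repeated_runs(lines):
    """Yield only the lines whose first field differs from both neighbours."""
    # Ignore blank lines so that split()[0] is always defined.
    rows = (line for line in lines if line.strip())
    for _key, run in groupby(rows, key=lambda row: row.split()[0]):
        run = list(run)
        if len(run) == 1:  # a run of one line means no adjacent duplicate
            yield run[0]

# Hypothetical stand-in for the input shown above.
sample = [
    "ENSGMOG00000015632  ENSORLG00000010573\n",
    "ENSGMOG00000015632  ENSORLG00000010585\n",
    "ENSGMOG00000003747  ENSORLG00000006947\n",
    "ENSGMOG00000003748  ENSORLG00000004636\n",
]

for row in drop_repeated_runs(sample):
    print(row.strip())
```

Because `groupby` only merges adjacent items, this compares each line with its neighbours, as the question describes, and reads the input lazily.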
