![](/img/trans.png)
[英]How to compare two values in different rows AND different columns in pandas
[英]How can i compare two columns in two different rows in python
我想遍歷 csv 文件的每一行並進行比較以查看第 1 行的第一個字段是否與下一行的第一個字段相同,依此類推。 如果找到匹配項,那么我想忽略包含相同字段的那兩行,並保留沒有匹配項的行
這是一個示例數據集 (no_dup.txt)
Ac_Gene_ID M_Gene_ID
ENSGMOG00000015632 ENSORLG00000010573
ENSGMOG00000015632 ENSORLG00000010585
ENSGMOG00000003747 ENSORLG00000006947
ENSGMOG00000003748 ENSORLG00000004636
基本上我想排除第 1 行和第 2 行,因為它們包含相同的字段(ENSGMOG00000015632)並保留第 3 行和第 4 行
這是我嘗試過但無法完成的代碼
prev = None
with open("no_dup.txt", 'r') as fh_in:
for line in fh_in:
line = line.strip()
if line.startswith("E"):
line1 = line.split()
print "initial gene =", line1[0]
if prev is not None or prev!= line1[0]:
prev = line1[0]
我認為這樣做的一種干凈的方法是制作每個條目的地圖 - > 行列表。
entries = {}
with open('no_dup.txt', 'r') as fh_in:
for line in fg_in:
entry = line.split()[0]
if entry in entries:
entries[entry].append(line)
else:
entries[entry] = [line]
for matches in entries.iteritems():
if len(matches) == 1:
print matches[0]
您應該注意,這不會保留條目的順序。
你的開始看起來不錯:
def filter_dups(iterable):
prev = None
for line in iterable:
if line.startswith("E"):
if prev.split(None, 1)[0] == line.split(None, 1)[0]:
prev = None
else:
if prev is not None:
yield prev
else:
prev = line
else:
yield line
prev = None
if prev is not None:
yield prev
with open("no_dup.txt", 'r') as fh_in:
with open("no_dup_out.txt", 'r') as fh_out:
fh_out.writelines(filter_dups(fh_in))
你可以使用這個:
with open('a.txt','r') as inputFile:
lines = inputFile.readlines()
prev = lines[0]
for i in range(1, len(lines)):
cur = lines[i]
if prev.split()[0] != cur.split()[0]:
print prev.strip()
prev = cur
print lines[-1].strip()
輸入:
ENSGMOG00000015632 ENSORLG00000010573
ENSGMOG00000015632 ENSORLG00000010585
ENSGMOG00000003747 ENSORLG00000006947
ENSGMOG00000003748 ENSORLG00000004636
輸出:
ENSGMOG00000015632 ENSORLG00000010585
ENSGMOG00000003747 ENSORLG00000006947
ENSGMOG00000003748 ENSORLG00000004636
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.