Multiple rows share a value in a column, how do I put all of these rows into one single row?
I'm working with a text file that looks like this:
rs001 EEE
rs008 EEE
rs345 EEE
rs542 CHG
re432 CHG
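I'd like to collapse all rows that share the same value in column 2 into a single row (for example, rs001 rs008 rs345 EEE). Is there an easy way to do this with Unix text processing or Python?
Thanks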
#!/usr/bin/env python
from __future__ import with_statement
from itertools import groupby

with open('file', 'r') as f:
    # We define "it" to be an iterator; for each line
    # it yields pairs like ['rs001', 'EEE']
    it = (line.strip().split() for line in f)
    # groupby does the heavy work.
    # lambda p: p[1] is the key function. It groups pairs according to the
    # second element, e.g. 'EEE'. Only adjacent rows with equal keys are merged.
    for key, group in groupby(it, lambda p: p[1]):
        # group might be something like [['rs001','EEE'],['rs008','EEE'],...]
        # key would be something like 'EEE', the value that we're grouping by.
        print('%s %s' % (' '.join([p[0] for p in group]), key))
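If column 2 is not already grouped in the file, groupby would emit the same key more than once. One possible workaround, sketched here on the assumption that the data fits in memory, is to sort by column 2 first. The function name collapse_sorted and the sample list are made up for illustration and are not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter

def collapse_sorted(lines):
    """Collapse rows sharing a column-2 value; input need not be pre-grouped."""
    # Skip blank lines, split each remaining line into [id, key]
    pairs = [line.split() for line in lines if line.strip()]
    # list.sort() is stable, so ids keep their original relative order per key
    pairs.sort(key=itemgetter(1))
    return ['%s %s' % (' '.join(p[0] for p in group), key)
            for key, group in groupby(pairs, key=itemgetter(1))]

# Hypothetical sample with the two keys interleaved
sample = ['rs001 EEE', 'rs542 CHG', 'rs008 EEE', 're432 CHG', 'rs345 EEE']
for row in collapse_sorted(sample):
    print(row)
```

Note that sorting changes the order in which keys are printed (alphabetical by column 2 rather than order of appearance).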
One option is to build a dictionary keyed on the column 2 data:
from collections import defaultdict  # defaultdict will save a line or two of code

d = defaultdict(list)  # goal is for d to look like {'EEE': ['rs001', 'rs008', ...
with open('data.txt', 'r') as f:
    for line in f:
        v, k = line.strip().split()
        d[k].append(v)
for k, v in d.items():  # print d as the strings you want
    print(' '.join(v + [k]))
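The advantage of this approach is that it doesn't require the column 2 values to be grouped together (although the question doesn't say directly whether column 2 is pre-grouped).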
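To make that difference concrete, here is a small sketch comparing the two approaches on input where column 2 is interleaved. The rows list is invented sample data, and the variable names adjacent and merged are only for illustration.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical sample in which column 2 is NOT pre-grouped
rows = [('rs001', 'EEE'), ('rs542', 'CHG'), ('rs008', 'EEE')]

# groupby only merges adjacent rows, so 'EEE' shows up as two separate groups
adjacent = [(key, [v for v, _ in grp])
            for key, grp in groupby(rows, key=lambda p: p[1])]

# The dictionary collects every row for a key, wherever it appears in the input
d = defaultdict(list)
for v, k in rows:
    d[k].append(v)
merged = dict(d)

print(adjacent)
print(merged)
```

The dictionary version holds all of the ids in memory at once, which is the price paid for not needing grouped input.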
Here's awk for you:
$ awk '{a[$2]=a[$2]FS$1}END{for(i in a)print i,a[i]}' file
EEE rs001 rs008 rs345
CHG rs542 re432