How do I get specific fields in Python

Question

I have two rows like the ones below:

Tp1g00130_scaffold_1    blastn    exon    20495    20602    .    +    .    
Tp1g00130_scaffold_1    blastn    exon    20650    20804    .    +    .

What I want to do is merge the seq start (column 4 of row 1) and the seq end (column 5 of row 2) of the two lines if they have the same ID (column 1). For example, the output would look like:

Tp1g00130_scaffold_1    blastn    exon    20495    20804    .    +    .

I made a good start but cannot quite finish.

prev = None

with open("test_parse") as fh_in:
    for line in fh_in:
        line = line.strip()
        line = line.split()
        line_id = line[0]
        print(line)
        if prev is not None and prev == line_id:
            print("yes")
        prev = line_id

Any help?

Answer 1

You're almost there.

Instead of making prev just the id, make it the whole previous line. This lets us check both existence and id ( if prev and prev[0] == line[0]: ) and get the seq start and seq end ( print('{} -> {}'.format(prev[3], line[4])) ).

prev = None
with open("test_parse") as fh_in:
    for line in fh_in:
        line = line.strip().split()
        if prev and line and prev[0] == line[0]:
            # Same ID as the previous row: keep its fields, take the end from this row.
            merged = prev[:4] + [line[4]] + prev[5:]
            print('\t'.join(merged))
        prev = line
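
If an ID can span more than two consecutive rows, the same idea can be sketched with itertools.groupby from the standard library. This is only an illustration under some assumptions: the function name merge_rows and the in-memory sample list are made up for the example, the fields are assumed to be whitespace-separated, and rows with the same ID are assumed to be adjacent in the input.

```python
from itertools import groupby


def merge_rows(lines):
    """Yield one merged row (a list of fields) per run of adjacent lines sharing an ID."""
    # Split each non-blank line into whitespace-separated fields.
    rows = (line.split() for line in lines if line.strip())
    # groupby only groups adjacent rows, so rows with the same ID must be next to each other.
    for _, group in groupby(rows, key=lambda row: row[0]):
        group = list(group)
        first, last = group[0], group[-1]
        # Keep the first row's fields, but take the end coordinate from the last row.
        yield first[:4] + [last[4]] + first[5:]


# Hypothetical in-memory stand-in for the two rows in the question.
sample = [
    "Tp1g00130_scaffold_1\tblastn\texon\t20495\t20602\t.\t+\t.",
    "Tp1g00130_scaffold_1\tblastn\texon\t20650\t20804\t.\t+\t.",
]

for row in merge_rows(sample):
    print("\t".join(row))
```

To run it on the real file, pass the open file handle instead of the list, for example merge_rows(fh_in) inside the with block. An ID that appears on a single row is yielded unchanged.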

Answer 2

If your file is small, you can use a temporary dict.

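records = {}

with open("test_parse") as fh_in:
    for line in fh_in:
        id_, f1, f2, start, end, f4, f5, f6 = line.strip().split()
        if id_ in records:
            # ID seen before: overwrite the stored end with this row's end.
            records[id_][4] = end
        else:
            records[id_] = [id_, f1, f2, start, end, f4, f5, f6]

# Dicts keep insertion order in Python 3.7+, so IDs print in file order.
for line in records.values():
    print("\t".join(line))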

Answer 3

If you have a header row in your file, you can use a DictReader.

For a file with headers for columns x, y, and z you can do:

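from csv import DictReader

# newline='' is what the csv module expects when opening files in Python 3.
with open('sample.csv', newline='') as fh_in:
    reader = DictReader(fh_in)
    for line in reader:
        print(line['x'], line['z'])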

DictReader is part of the csv module, which is very helpful in general.
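
The sample rows in the question have no header row, so here is a rough sketch of how DictReader could still be used by supplying the column names yourself. The names in FIELDS and the helper read_rows are made up for illustration, and the example assumes the columns are separated by tabs, which the question does not confirm.

```python
import csv
import io

# Hypothetical column names; the file in the question has no header row.
FIELDS = ["seqid", "source", "feature", "start", "end", "score", "strand", "phase"]


def read_rows(fh):
    """Return the rows of a tab-delimited, header-less file as a list of dicts."""
    reader = csv.DictReader(fh, fieldnames=FIELDS, delimiter="\t")
    return list(reader)


# In-memory stand-in for the file, assuming tab-separated columns.
data = io.StringIO(
    "Tp1g00130_scaffold_1\tblastn\texon\t20495\t20602\t.\t+\t.\n"
    "Tp1g00130_scaffold_1\tblastn\texon\t20650\t20804\t.\t+\t.\n"
)

rows = read_rows(data)
for row in rows:
    print(row["seqid"], row["start"], row["end"])
```

With named fields, the merge from the other answers becomes a comparison of row["seqid"] between neighbouring rows followed by copying row["end"]. Note that every value is read as a string, so convert start and end with int() before doing arithmetic on them.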

Answer 1: score 1, accepted, 2015-03-12 21:33:23

Answer 2: score 1, 2015-03-12 21:40:42

Answer 3: score 0, 2015-03-12 21:50:55
