
How to group a text file by the first 3 characters of each line?

I have a list with three columns: ID, longitude, latitude.

Part of my text file:

AFJ.SPZ.IR.8    46.84   38.463
AKL.SPZ.IR.11   46.691  38.399
AKL.SPZ.IR.12   46.722  38.407
AFJ.SPZ.IR.3    46.812  38.433
AFJ.SPZ.IR.8    46.84   38.463
AKL.SPZ.IR.11   46.691  38.399
AKL.SPZ.IR.12   46.722  38.407
AKL.SPZ.IR.13   46.654  38.404
AKL.SPZ.IR.25   46.699  38.442
AKL.SPZ.IR.3    46.812  38.433
AKL.SPZ.IR.8    46.84   38.463
ALA.SPZ.IR.3    46.812  38.433
ANAR.BHZ.IR.8   46.84   38.463
ANJ.SPZ.IR.13   46.654  38.404
ANJ.SPZ.IR.18   46.662  38.399
ANJ.SPZ.IR.3    46.812  38.433
BST.SPZ.IR.1    46.732  38.457
BST.SPZ.IR.10   46.707  38.448
BST.SPZ.IR.11   46.691  38.399
BST.SPZ.IR.12   46.722  38.407

I want to apply a function to the lon and lat of the IDs that share the same first 3 characters. My code:

from itertools import groupby

with open('coors1.txt') as fin:
    lines = (line.split() for line in fin)
    for l in lines:
        a=l[0].split(".")
        for key, items in groupby(l,a):
            print (items)

    TypeError: 'str' object is not callable

Why doesn't groupby understand the str?

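  1. You need to sort the data before applying groupby, because groupby only groups consecutive items that share the same key.
  2. You need to specify the key as a function, not a string. groupby calls the key on every item, which is why a string raises "'str' object is not callable".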

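    from itertools import groupby

    with open('Input.txt') as fin:
        # Sort first: groupby only groups adjacent lines with equal keys
        lines = sorted(line.rstrip() for line in fin)
        # The key is a function returning the text before the first dot
        for item, grp in groupby(lines, key=lambda x: x.split(".")[0]):
            print(item, list(grp))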

Output

AFJ ['AFJ.SPZ.IR.3    46.812  38.433', 'AFJ.SPZ.IR.8    46.84   38.463', 'AFJ.SPZ.IR.8    46.84   38.463']
AKL ['AKL.SPZ.IR.11   46.691  38.399', 'AKL.SPZ.IR.11   46.691  38.399', 'AKL.SPZ.IR.12   46.722  38.407', 'AKL.SPZ.IR.12   46.722  38.407', 'AKL.SPZ.IR.13   46.654  38.404', 'AKL.SPZ.IR.25   46.699  38.442', 'AKL.SPZ.IR.3    46.812  38.433', 'AKL.SPZ.IR.8    46.84   38.463']
ALA ['ALA.SPZ.IR.3    46.812  38.433']
ANAR ['ANAR.BHZ.IR.8   46.84   38.463']
ANJ ['ANJ.SPZ.IR.13   46.654  38.404', 'ANJ.SPZ.IR.18   46.662  38.399', 'ANJ.SPZ.IR.3    46.812  38.433']
BST ['BST.SPZ.IR.1    46.732  38.457', 'BST.SPZ.IR.10   46.707  38.448', 'BST.SPZ.IR.11   46.691  38.399', 'BST.SPZ.IR.12   46.722  38.407']
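The question also asks how to apply a function to the lon and lat values of each group. A minimal sketch of that step follows, assuming the function is a plain average and that every line has exactly three whitespace-separated columns. The names `station` and `mean_coords_by_station` are hypothetical helpers introduced here for illustration, not part of the original answer.

```python
from itertools import groupby
from statistics import mean


def station(row):
    """Key function: the part of the ID before the first dot."""
    return row[0].split(".")[0]


def mean_coords_by_station(lines):
    """Return {station: (mean_lon, mean_lat)} for whitespace-separated lines."""
    # Split each non-empty line into [id, lon, lat]
    rows = [line.split() for line in lines if line.strip()]
    # groupby only merges adjacent rows, so sort by the same key first
    rows.sort(key=station)
    result = {}
    for name, grp in groupby(rows, key=station):
        grp = list(grp)  # the group iterator can only be consumed once
        lons = [float(r[1]) for r in grp]
        lats = [float(r[2]) for r in grp]
        # Assumption: the function to apply is a simple mean
        result[name] = (mean(lons), mean(lats))
    return result


# A few lines taken from the sample data in the question
sample = [
    "AFJ.SPZ.IR.8    46.84   38.463",
    "AKL.SPZ.IR.11   46.691  38.399",
    "AFJ.SPZ.IR.3    46.812  38.433",
]
for name, (lon, lat) in mean_coords_by_station(sample).items():
    print(name, round(lon, 3), round(lat, 3))
```

Note that this groups by the text before the first dot, as the answer above does, so `ANAR` forms its own group. To group by literally the first three characters instead, the key function could return `row[0][:3]`.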

