
Python - read a file and save data to tuple

I would like to read a specific .txt file, extract data from it and write that data into a tuple. The problem is that I don't need all the data in the file, only specific parts of it. The text file looks like this:

HHSDMSDN1-pool                           1.02T   141G     39     22  2.62M   940K
  **c5t600507680C800000001CBd0   834G   118G**     32     16  2.19M   734K
  **c5t600507680C00352d0   216G  22.3G**      7      5   434K   206K


HHSDMSDN2-pool                           1.09T   308G     12      6   744K  83.8K
  **c5t600507680C800001CDd0   790G   162G**     10      1   617K  12.5K
  **c5t600507680C8000000037Dd0   203G  34.8G**      1      0   123K  10.2K
  **c5t600507680C800000387d0   126G   112G**      0      5  5.36K  80.5K

HHSDMSDN3-pool                           1.13T  33.4G     24     19  1.39M   623K
  **c5t600507680C80002E6000001CFd0   921G  30.8G**     18     11  1.10M   465K
  **c5t600507680C80002E600000203d0   235G  2.63G**      5      8   293K   158K

The bold text needs to go into the tuple. Ideally, the first value would be a string and the next two would be doubles/floats.

So the output would be:

(('c5t600507680C800000001CBd0', 834, 118), ('c5t600507680C00352d0', 216, 22.3), ...)
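A minimal sketch of turning one of the bold lines into that shape. It assumes the suffixes are binary multiples (1T = 1024G) and scales every size to G, so values printed with different suffixes remain comparable; for values already in G this gives the same numbers as the example above. `size_in_g` and `parse_disk_line` are hypothetical helper names.

```python
# Assumption: suffixes are binary multiples (1T = 1024G, 1G = 1024M, ...);
# the question does not say which convention the tool uses.
SCALE_TO_G = {"K": 1024 ** -2, "M": 1024 ** -1, "G": 1.0, "T": 1024.0}

def size_in_g(text):
    """Convert a size such as '834G', '22.3G' or '1.02T' to a float in G."""
    number, unit = text[:-1], text[-1]
    if unit not in SCALE_TO_G:
        # No recognised suffix: treat the whole string as a plain number.
        return float(text)
    return float(number) * SCALE_TO_G[unit]

def parse_disk_line(line):
    """Turn '  c5t...d0   834G   118G ...' into ('c5t...d0', 834.0, 118.0)."""
    name, alloc, free = line.split()[:3]
    return (name, size_in_g(alloc), size_in_g(free))

print(parse_disk_line("  c5t600507680C00352d0   216G  22.3G      7      5   434K   206K"))
```

Scaling rather than just stripping the letter matters only if a disk line can ever show a T or M value; with the sample data all disk sizes are in G.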

Any ideas?

You just need to iterate over the file line by line and keep track of what you have already seen (here: the pool header that the following disk lines belong to).
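If only the flat tuple from the question is needed, the pool header lines can simply be skipped while iterating. A minimal sketch, assuming that disk lines are the only indented lines and that the unit letter is just dropped (as in the question's example); `read_disks` and `strip_unit` are hypothetical names, and `io.StringIO` stands in for the real file.

```python
import io

def strip_unit(value):
    # Assumption: a single trailing unit letter ('G', 'T', ...) is dropped
    # and the number is kept as printed, matching the question's example.
    if value and value[-1].isalpha():
        value = value[:-1]
    return float(value)

def read_disks(lines):
    """Return a tuple of (disk, alloc, free) from the indented device lines."""
    records = []
    for line in lines:
        # Pool header lines and empty lines start in column 0; skip them.
        if not line.startswith(" "):
            continue
        fields = line.split()
        if len(fields) < 3:
            # Whitespace-only line: nothing to collect.
            continue
        records.append((fields[0], strip_unit(fields[1]), strip_unit(fields[2])))
    return tuple(records)

# io.StringIO stands in for a real file so the sketch runs on its own.
sample = io.StringIO(
    "HHSDMSDN1-pool   1.02T   141G     39     22  2.62M   940K\n"
    "  c5t600507680C800000001CBd0   834G   118G     32     16  2.19M   734K\n"
    "  c5t600507680C00352d0   216G  22.3G      7      5   434K   206K\n"
    "\n"
)
disks = read_disks(sample)
print(disks)
```

With the real file, pass the open file object instead, e.g. `with open("pools.txt") as f: disks = read_disks(f)`, where `pools.txt` is a placeholder name. Iterating over a file object yields one line at a time, so the whole file never has to be held in memory.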

Edit: new solution as requested

import pprint

data = """HHSDMSDN1-pool                           1.02T   141G     39     22  2.62M   940K
  c5t600507680C800000001CBd0   834G   118G     32     16  2.19M   734K
  c5t600507680C00352d0   216G  22.3G      7      5   434K   206K


HHSDMSDN2-pool                           1.09T   308G     12      6   744K  83.8K
  c5t600507680C800001CDd0   790G   162G     10      1   617K  12.5K
  c5t600507680C8000000037Dd0   203G  34.8G      1      0   123K  10.2K
  c5t600507680C800000387d0   126G   112G      0      5  5.36K  80.5K

HHSDMSDN3-pool                           1.13T  33.4G     24     19  1.39M   623K
  c5t600507680C80002E6000001CFd0   921G  30.8G     18     11  1.10M   465K
  c5t600507680C80002E600000203d0   235G  2.63G      5      8   293K   158K"""

# collect all records by key
d = {}

# current key "HHSDM..."
k = None

# current records
r = []

for line in data.splitlines():
    if line.startswith("  c"):
        # this is a record, append it to the current collection of records
        fields = line.split()
        r.append((fields[0], fields[1], fields[2]))
    elif line.startswith("H"):
        # this is a pool header: remember its name as the key for the records that follow
        k = line.split("-")[0]
    elif k:
        # this is an empty line and we have a key, store the records
        # and reset current records and current key
        d[k] = r
        r = []
        k = None

# the input does not end with an empty line, so store the last group here
if k:
    d[k] = r

pprint.pprint(d)

Output:

{'HHSDMSDN1': [('c5t600507680C800000001CBd0', '834G', '118G'),
               ('c5t600507680C00352d0', '216G', '22.3G')],
 'HHSDMSDN2': [('c5t600507680C800001CDd0', '790G', '162G'),
               ('c5t600507680C8000000037Dd0', '203G', '34.8G'),
               ('c5t600507680C800000387d0', '126G', '112G')],
 'HHSDMSDN3': [('c5t600507680C80002E6000001CFd0', '921G', '30.8G'),
               ('c5t600507680C80002E600000203d0', '235G', '2.63G')]}
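This output is grouped by pool and keeps the sizes as strings with their unit. To get the flat tuple of (string, float, float) that the question asks for, the dictionary can be post-processed. A minimal sketch, assuming the trailing unit letter is simply stripped (all disk sizes in the sample are in G); `to_float` is a hypothetical helper.

```python
# The grouped result printed above.
d = {
    'HHSDMSDN1': [('c5t600507680C800000001CBd0', '834G', '118G'),
                  ('c5t600507680C00352d0', '216G', '22.3G')],
    'HHSDMSDN2': [('c5t600507680C800001CDd0', '790G', '162G'),
                  ('c5t600507680C8000000037Dd0', '203G', '34.8G'),
                  ('c5t600507680C800000387d0', '126G', '112G')],
    'HHSDMSDN3': [('c5t600507680C80002E6000001CFd0', '921G', '30.8G'),
                  ('c5t600507680C80002E600000203d0', '235G', '2.63G')],
}

def to_float(value):
    # Assumption: strip trailing unit letters ('K', 'M', 'G', 'T') and keep
    # the number as printed, without converting between units.
    return float(value.rstrip("KMGT"))

# Dicts keep insertion order (Python 3.7+), so pools stay in file order.
flat = tuple(
    (name, to_float(alloc), to_float(free))
    for records in d.values()
    for name, alloc, free in records
)
print(flat)
```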
