Grouping the contents of a text file in Python

Question

I have an input file in the following format:

CC   -----------------------------------------------------------------------
CC
CC   hgfsdh kjhsdt kjshdk
CC
CC   -----------------------------------------------------------------------
CC   Release of 18-Sep-2019
CC   -----------------------------------------------------------------------
CC
CC   Alex 
CC   -----------------------------------------------------------------------
CC   Copyrighted vvvncbm License
CC   -----------------------------------------------------------------------
//
ID   1.1
ED   text1
AN   text2.
CA   text3
CF   text4.
CC   -!- Some members 
CC       also on .
CC   -!- May be 
CC   -!- Re
PR   PRTSF; C000AS61;
DQ   Q6, AZW2_DANRE;  Q7, AZW2_DANRE;  Q97, AZW2_DONT;
DQ   Q8, AZW2_DONT;  Q9, AZW2_AZW2_DONT;  Q10, AZW2_CAFT;
//
ID   1.2
ED   text1
AN   text2.
CA   text3
CF   text4.
CC   -!- Some members 
CC       also on .
CC   -!- May be 
CC       second line
PR   PRTSF; DOC00;
DQ   Q6, AZW2_DANRE;  Q7, AZW2_DANRE;  Q97, AZW2_DONT;
DQ   Q8, AZW2_DONT;  Q9, AZW2_AZW2_DONT;  Q10, AZW2_CAFT;
DQ   Q15, AZW2_DANRE;  Q43, AZW2_DANRE;  Q049, AZW2_DONT;
//

I would like to group the data in this text file and store it as JSON.

I've tried the following:

import os
import json
from pprint import pprint

def text_to_json(f_input):
    location_data = []
    if os.path.exists(f_input):
        with open(f_input, 'r') as f:
            for line in f.readlines()[12:]:
                if line.strip() != '//' and line.strip() != '//' and line.strip():
                    print(line[:-1])

                pass
        # return json.dumps(data)


if __name__ == '__main__':
    f_input = 'input.txt'
    text_to_json(f_input)

I have skipped the first few lines, which contain comments. In the check if line.strip() != 'DELIMITER' and line.strip(): the delimiter is // . However, I am not sure how to use // to group the data corresponding to each id.

I would like to group the data using the delimiter and store the data of each id in JSON format:

{
'1.1':
{'DQ': {'Q6': 'AZW2_DANRE', 'Q7': 'AZW2_DANRE', 'Q97': 'AZW2_DONT',
'Q8': 'AZW2_DONT', 'Q9': 'AZW2_AZW2_DONT', 'Q10': 'AZW2_CAFT'},
'ED': 'text1',
'AN': 'text2.',
'CA': 'text3',
'CF': 'text4.',
'PR': 'PRTSF; C000AS61;',
'CC': ['Some members also on .', 'May be', 'Re']
},
'1.2':
{'DQ': {'Q6': 'AZW2_DANRE', 'Q7': 'AZW2_DANRE', 'Q97': 'AZW2_DONT',
'Q8': 'AZW2_DONT', 'Q9': 'AZW2_AZW2_DONT', 'Q10': 'AZW2_CAFT',
'Q15': 'AZW2_DANRE', 'Q43': 'AZW2_DANRE', 'Q049': 'AZW2_DONT'},
'ED': 'text1',
'AN': 'text2.',
'CA': 'text3',
'CF': 'text4.',
'PR': 'PRTSF; DOC00;',
'CC': ['Some members also on .', 'May be second line']
}
}

I could create the above JSON by storing values based on line numbers. However, the number of lines in each dataset varies. For instance, the data stored against DQ spans 2 lines in the first dataset and 3 in the second. Any suggestions on how to proceed?

Answer 1

I would suggest building everything in memory, in dictionaries and arrays, and then dumping the result as a JSON object. In the code below everything is accumulated into the d dictionary. It looks like you want to treat different types of lines differently ('CC' lines become an array, 'DQ' lines become a dictionary, and all other lines are stored as plain strings). So, here's how I would approach the code:

import os
import json
from pprint import pprint

def text_to_json(f_input):
    location_data = []
    if os.path.exists(f_input):
        with open(f_input, 'r') as f:

            # Accumulate all of the line data in this dictionary
            d = {}

            # This keeps track of the current ID, like 1.1 or 1.2
            current_line_id = ''

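            # Skip the 12 header lines that precede the first '//'
            for line in f.readlines()[12:]:
                # Ignore delimiter lines and blank lines
                if line.strip() != '//' and line.strip():
                    line_type = line[0:2]
                    # Drop only the newline, so a last line without one stays intact
                    line_data = line[5:].rstrip('\n')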
                    if line_type == 'ID':
                        d[line_data] = dict()
                        current_line_id = line_data
                    elif line_type == 'CC':
                        if line_type not in d[current_line_id]:
                            d[current_line_id][line_type] = []
                        d[current_line_id][line_type].append(line_data)
                    elif line_type == 'DQ':
                        if line_type not in d[current_line_id]:
                            d[current_line_id][line_type] = {}
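                        for dq in line_data.split(';'):
                            # Each entry looks like 'Q97, AZW2_DONT'; split on the
                            # comma so keys of any length (Q6, Q97, Q049) stay intact
                            dq_key, sep, dq_val = dq.strip().partition(',')
                            if sep:
                                d[current_line_id][line_type][dq_key.strip()] = dq_val.strip()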
                    else:
                        d[current_line_id][line_type] = line_data

                pass
            print(json.dumps(d, indent=2))
        # return json.dumps(data)


if __name__ == '__main__':
    f_input = 'input.txt'
    text_to_json(f_input)
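
As an alternative to hard-coding the header length with [12:], here is a minimal sketch that relies on the // lines themselves: anything before the first ID line is ignored, and each // closes the current record. It also merges CC continuation lines into the preceding '-!-' comment, which is what the desired output in the question appears to want. The function name parse_records and the SAMPLE text are illustrative assumptions, not part of the original code, and the sketch assumes every line has a two-letter code followed by three spaces.

```python
import json

# Hypothetical sample, shortened from the input shown in the question
SAMPLE = """CC   ------
CC   Release of 18-Sep-2019
//
ID   1.1
ED   text1
CC   -!- Some members 
CC       also on .
CC   -!- May be 
CC   -!- Re
PR   PRTSF; C000AS61;
DQ   Q6, AZW2_DANRE;  Q97, AZW2_DONT;
DQ   Q9, AZW2_AZW2_DONT;
//
"""

def parse_records(lines):
    """Group lines into one dict per ID, using '//' lines as record separators."""
    records = {}
    current = None  # dict of the record being filled; None outside a record
    for raw in lines:
        line = raw.rstrip('\n')
        if line.strip() == '//':
            current = None  # delimiter: the current record is finished
            continue
        code, text = line[:2], line[5:].strip()
        if code == 'ID':
            current = records.setdefault(text, {})
        elif current is None or not text:
            continue  # header lines before the first ID, or empty lines
        elif code == 'CC':
            comments = current.setdefault('CC', [])
            if text.startswith('-!-'):
                comments.append(text[3:].strip())  # '-!-' opens a new comment
            elif comments:
                comments[-1] += ' ' + text  # continuation of the previous comment
            else:
                comments.append(text)
        elif code == 'DQ':
            pairs = current.setdefault('DQ', {})
            for item in text.split(';'):
                key, sep, value = item.partition(',')
                if sep:  # skips the empty piece after the trailing ';'
                    pairs[key.strip()] = value.strip()
        else:
            current[code] = text
    return records

print(json.dumps(parse_records(SAMPLE.splitlines()), indent=2))
```

To run this on a real file, pass the open file object instead of the sample, since parse_records only needs an iterable of lines. Because nothing depends on line positions, records with 2 or 3 DQ lines are handled the same way.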

Solution 1
0 Accepted 2019-09-22 03:38:39
