
Merge the json objects with same key value pair in a file using python

I have a file that contains objects like the ones shown below.

For example: Input.txt

1. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "Q2", "Cs": "K11HE-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K11HE-D", "Pi": "CHAF2", "Gi": "RV1688668060"}

2. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "Q2", "Cs": "K08JV-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K08JV-D", "Pi": "CHAF2", "Gi": "RV1714277379"}

3. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "ABCD", "Cs": "K20LT-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K20LT-D", "Pi": "CHAF2", "Gi": "RV1714278093"}

4. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "Q2", "Cs": "K08OW-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K08OW-D", "Pi": "CHAF2", "Gi": "RV1714277380"}

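The file contains thousands of lines.

I want to group all the JSON objects in the file that have the same value for the key "Ti".

Here is an example that explains my requirement in more detail.

As you can see from the sample file above, three lines have the same value for the key "Ti": lines 1, 2 and 4, which all have "Ti" set to "Q2".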
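To check which lines share a "Ti" value, a minimal sketch along the following lines groups the line numbers by "Ti". The helper name `line_numbers_by_ti` and the inline sample (trimmed to the "Ti" and "Cs" keys) are assumptions made purely for illustration.

```python
import json
import re
from collections import defaultdict


def line_numbers_by_ti(lines):
    """Map each Ti value to the list of line numbers that carry it."""
    groups = defaultdict(list)
    for line in lines:
        # each line is expected to look like: 1. {"Ti": "Q2", ...}
        match = re.match(r'\s*(\d+)\.\s*(.*)', line)
        if not match:
            continue  # skip blank or unnumbered lines
        number, payload = match.groups()
        groups[json.loads(payload)['Ti']].append(int(number))
    return dict(groups)


# hypothetical sample, shortened from the Input.txt lines above
sample = [
    '1. {"Ti": "Q2", "Cs": "K11HE-D"}',
    '2. {"Ti": "Q2", "Cs": "K08JV-D"}',
    '3. {"Ti": "ABCD", "Cs": "K20LT-D"}',
    '4. {"Ti": "Q2", "Cs": "K08OW-D"}',
]
print(line_numbers_by_ti(sample))
```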

I need a way to combine these JSON objects, and I want to create an output file like the one below.

For example: Output.txt

1. {"Cp": "[1000, 1000, 1000]", "Af": "['CBS', 'CBS', 'CBS']", "Bp": "[150, 150, 150]", "Vt": "['channel', 'channel', 'channel']", "Ti": "['Q2', 'Q2', 'Q2']", "Cs": "['K11HE-D', 'K08JV-D', 'K08OW-D']", "Tg": "['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD']", "Fd": "['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D']", "Pi": "['CHAF2','CHAF2','CHAF2']", "Gi": "['RV1688668060', 'RV1714277379', 'RV1714277380']"}

2. {"Cp": "[1000, 1000, 1000]", "Af": "['CBS', 'CBS', 'CBS']", "Bp": "[150, 150, 150]", "Vt": "['channel', 'channel', 'channel']", "Ti": "['Q2', 'Q2', 'Q2']", "Cs": "['K11HE-D', 'K08JV-D', 'K08OW-D']", "Tg": "['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD']", "Fd": "['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D']", "Pi": "['CHAF2','CHAF2','CHAF2']", "Gi": "['RV1688668060', 'RV1714277379', 'RV1714277380']"}

3. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "ABCD", "Cs": "K20LT-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K20LT-D", "Pi": "CHAF2", "Gi": "RV1714278093"}

4. {"Cp": "[1000, 1000, 1000]", "Af": "['CBS', 'CBS', 'CBS']", "Bp": "[150, 150, 150]", "Vt": "['channel', 'channel', 'channel']", "Ti": "['Q2', 'Q2', 'Q2']", "Cs": "['K11HE-D', 'K08JV-D', 'K08OW-D']", "Tg": "['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD']", "Fd": "['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D']", "Pi": "['CHAF2','CHAF2','CHAF2']", "Gi": "['RV1688668060', 'RV1714277379', 'RV1714277380']"}

Please let me know how I can achieve this.

You need to:

  1. Convert each string to a dictionary
  2. Collect the Ti values
  3. Loop over the dictionaries and collect the data based on Ti
import json
import re
from collections import defaultdict

# read the non-empty lines of the input file (test.txt holds the sample rows)
with open('test.txt', 'r') as raw_data:
    data_list = [line for line in raw_data.read().splitlines() if line.strip()]

# strip the leading "N. " prefix and parse each row as JSON (no eval needed)
rows = []
for item in data_list:
    row = re.sub(r'^\s*\d+\.\s*', '', item)
    rows.append(json.loads(row))

# collect the values of every key, grouped by the row's Ti value
groups = defaultdict(dict)
for row_dictionary in rows:
    merged = groups[row_dictionary.get('Ti')]
    for key, value in row_dictionary.items():
        merged.setdefault(key, []).append(value)

# build one merged entry per input row, numbered like the input
data = {}
for i, row_dictionary in enumerate(rows, start=1):
    data[str(i) + '.'] = groups[row_dictionary.get('Ti')]

for number, merged in data.items():
    print(number, merged)


Output:

1. {'Cp': ['1000', '1000', '1000'], 'Af': ['CBS', 'CBS', 'CBS'], 'Bp': ['150', '150', '150'], 'Vt': ['channel', 'channel', 'channel'], 'Ti': ['Q2', 'Q2', 'Q2'], 'Cs': ['K11HE-D', 'K08JV-D', 'K08OW-D'], 'Tg': ['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD'], 'Fd': ['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D'], 'Pi': ['CHAF2', 'CHAF2', 'CHAF2'], 'Gi': ['RV1688668060', 'RV1714277379', 'RV1714277380']}
2. {'Cp': ['1000', '1000', '1000'], 'Af': ['CBS', 'CBS', 'CBS'], 'Bp': ['150', '150', '150'], 'Vt': ['channel', 'channel', 'channel'], 'Ti': ['Q2', 'Q2', 'Q2'], 'Cs': ['K11HE-D', 'K08JV-D', 'K08OW-D'], 'Tg': ['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD'], 'Fd': ['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D'], 'Pi': ['CHAF2', 'CHAF2', 'CHAF2'], 'Gi': ['RV1688668060', 'RV1714277379', 'RV1714277380']}
3. {'Cp': ['1000'], 'Af': ['CBS'], 'Bp': ['150'], 'Vt': ['channel'], 'Ti': ['ABCD'], 'Cs': ['K20LT-D'], 'Tg': ['BROADCAST<>LOCAL<>HD'], 'Fd': ['dish#K20LT-D'], 'Pi': ['CHAF2'], 'Gi': ['RV1714278093']}
4. {'Cp': ['1000', '1000', '1000'], 'Af': ['CBS', 'CBS', 'CBS'], 'Bp': ['150', '150', '150'], 'Vt': ['channel', 'channel', 'channel'], 'Ti': ['Q2', 'Q2', 'Q2'], 'Cs': ['K11HE-D', 'K08JV-D', 'K08OW-D'], 'Tg': ['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD'], 'Fd': ['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D'], 'Pi': ['CHAF2', 'CHAF2', 'CHAF2'], 'Gi': ['RV1688668060', 'RV1714277379', 'RV1714277380']}
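If the merged rows should be written to a file such as Output.txt, one possible sketch is shown below. It stores real JSON lists rather than the stringified lists in the question's example, which is an assumption about what is wanted. The function names `merge_by_key` and `write_merged` and the `key` parameter are hypothetical and not part of the original answer.

```python
import json
import re
from collections import defaultdict


def merge_by_key(lines, key="Ti"):
    """Return one merged dict per input row, grouping values by `key`."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between rows
        # drop an optional leading "N. " prefix before decoding the JSON
        rows.append(json.loads(re.sub(r'^\d+\.\s*', '', line)))

    # groups[value of key][field] -> list of that field's values, in input order
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for field, value in row.items():
            groups[row.get(key)][field].append(value)

    return [dict(groups[row.get(key)]) for row in rows]


def write_merged(in_path, out_path, key="Ti"):
    """Read numbered JSON rows from in_path and write merged rows to out_path."""
    with open(in_path, encoding="utf-8") as src:
        merged = merge_by_key(src, key)
    with open(out_path, "w", encoding="utf-8") as dst:
        for number, row in enumerate(merged, start=1):
            dst.write(f"{number}. {json.dumps(row)}\n")
```

Calling `write_merged('Input.txt', 'Output.txt')` would then produce one numbered line per input row. Each row is read only twice, so the work grows linearly with the number of lines, which matters for a file with thousands of rows.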
