
Formatting a string in required format in Python

I have data in the following format:

id1 id2 value

Something like:

1   234  0.2
1   235  0.1

and so on. I want to convert it to JSON format:

{
  "nodes": [ {"name":"1"},  #first element
             {"name":"234"}, #second element
             {"name":"235"} #third element
             ] ,
   "links":[{"source":1,"target":2,"value":0.2},
             {"source":1,"target":3,"value":0.1}
           ]
}

So, going from the original data to the above format: the nodes contain the set of all distinct names present in the original data, and the links refer to the source and target by their position (line number) in the list of values held by "nodes". For example:

   1 234 0.2

1 is the first element in the list of values held by the key "nodes", and 234 is the second element in that list.

Hence the link dictionary is {"source":1,"target":2,"value":0.2}.

How do I do this efficiently in Python? I am sure there should be a better way than what I am doing, which is so messy :( Here is what I am doing:

from collections import defaultdict

def open_file(filename,output=None):
    f = open(filename,"r")
    offset = 3429
    data_dict = {}
    node_list = []
    node_dict = {}
    link_list = []
    num_lines = 0
    line_ids = []
    for line in f:
        line = line.strip()
        tokens = line.split()
        mod_wid  = int(tokens[1]) + offset


        if not node_dict.has_key(tokens[0]):
            d = {"name": tokens[0],"group":1}
            node_list.append(d)
            node_dict[tokens[0]] = True
            line_ids.append(tokens[0])
        if not node_dict.has_key(mod_wid):
            d = {"name": str(mod_wid),"group":1}
            node_list.append(d)
            node_dict[mod_wid] = True
            line_ids.append(mod_wid)


        link_d = {"source": line_ids.index(tokens[0]),"target":line_ids.index(mod_wid),"value":tokens[2]}
        link_list.append(link_d)
        if num_lines > 10000:
            break
        num_lines +=1


    data_dict = {"nodes":node_list, "links":link_list}

    print "{\n"
    for k,v in data_dict.items():
        print  '"'+k +'"' +":\n [ \n " 
        for each_v in v:
            print each_v ,","
        print "\n],"
    print "}"

open_file("lda_input.tsv")

I'm assuming that by "efficiently" you mean programmer efficiency (how easy the logic is to read, maintain, and code) rather than runtime speed. If you're worried about the latter, you're probably worried for no reason. (But the code below will probably be faster anyway.)

The key to coming up with a better solution is to think more abstractly. Think about rows in a CSV file, not lines in a text file; create a dict that can be rendered in JSON rather than trying to generate JSON via string processing; wrap things up in functions if you want to do them repeatedly; etc. Something like this:

import csv
import json
import sys

def parse(inpath, namedict):
    lastname = [0]
    def lookup_name(name):
        try:
            print('Looking up {} in {}'.format(name, namedict))
            return namedict[name]
        except KeyError:
            lastname[0] += 1
            print('Adding {} as {}'.format(name, lastname[0]))
            namedict[name] = lastname[0]
            return lastname[0]
    with open(inpath) as f:
        reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
        for id1, id2, value in reader:
            yield {'source': lookup_name(id1),
                   'target': lookup_name(id2),
                   'value': float(value)}  # numeric, so JSON emits 0.2 rather than "0.2"

for inpath in sys.argv[1:]:
    names = {}
    links = list(parse(inpath, names))
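    # Order nodes by assigned index so each list position matches its source/target number.
    nodes = [{'name': name} for name in sorted(names, key=names.get)]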
    outpath = inpath + '.json'
    with open(outpath, 'w') as f:
        json.dump({'nodes': nodes, 'links': links}, f, indent=4)
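
The same index-assignment idea can also be sketched without the closure or the csv module. This is only an illustrative variant, not part of the code above: the function name build_graph is made up here, it assumes every non-blank line has exactly three whitespace-separated fields, and it uses the 1-based positions shown in the question.

```python
import json

def build_graph(lines):
    """Turn 'id1 id2 value' lines into the nodes/links dict from the question."""
    index = {}   # name -> 1-based position in the nodes list
    nodes = []
    links = []
    for line in lines:
        fields = line.split()
        if not fields:              # ignore blank lines
            continue
        id1, id2, value = fields    # assumes exactly three fields per line
        for name in (id1, id2):
            if name not in index:   # first time this name is seen
                nodes.append({'name': name})
                index[name] = len(nodes)
        links.append({'source': index[id1],
                      'target': index[id2],
                      'value': float(value)})
    return {'nodes': nodes, 'links': links}

graph = build_graph(['1   234  0.2', '1   235  0.1'])
print(json.dumps(graph, indent=2))
```

Because nodes is a list that is appended to in first-seen order, the position stored in index always matches the node's place in the output, whatever the dict ordering of the Python version in use.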

Don't construct the JSON manually. Build it from an existing Python object with the json module:

def parse(data):
    nodes = set()
    links = set()

    for line in data.splitlines():
        fields = line.split()
        if not fields:  # skip blank lines, e.g. after a trailing newline
            continue

        id1, id2 = map(int, fields[:2])
        value = float(fields[2])

        nodes.update((id1, id2))
        links.add((id1, id2, value))

    return {
        'nodes': [{
            'name': node
        } for node in nodes],
        'links': [{
            'source': link[0],
            'target': link[1],
            'value': link[2]
        } for link in links]
    }

Now you can use json.dumps to get a string:

>>> import json
>>> data = '1   234  0.2\n1   235  0.1'
>>> parsed = parse(data)
>>> parsed
{'links': [{'source': 1, 'target': 235, 'value': 0.1},
           {'source': 1, 'target': 234, 'value': 0.2}],
 'nodes': [{'name': 1}, {'name': 234}, {'name': 235}]}
>>> json.dumps(parsed)
'{"nodes": [{"name": 1}, {"name": 234}, {"name": 235}], "links": [{"source": 1, "target": 235, "value": 0.1}, {"source": 1, "target": 234, "value": 0.2}]}'
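
Note that the links above carry the raw ids, whereas the question asks for positions in the nodes list. One possible way to bridge that gap is sketched below. The helper name reindex is hypothetical, and sorting the names is an assumption made here to get a stable order, since sets do not preserve input order.

```python
def reindex(parsed):
    """Rewrite raw ids in links as 1-based positions in a sorted nodes list."""
    names = sorted(node['name'] for node in parsed['nodes'])
    # Map each name to its 1-based position, as in the question's example.
    position = {name: i for i, name in enumerate(names, start=1)}
    return {
        'nodes': [{'name': str(name)} for name in names],
        'links': [{'source': position[link['source']],
                   'target': position[link['target']],
                   'value': link['value']}
                  for link in parsed['links']],
    }

# Sample input shaped like the output of parse() above.
parsed = {'nodes': [{'name': 235}, {'name': 1}, {'name': 234}],
          'links': [{'source': 1, 'target': 235, 'value': 0.1},
                    {'source': 1, 'target': 234, 'value': 0.2}]}
result = reindex(parsed)
```

The result can be passed to json.dumps in the same way as before.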
