[英]Creating a tree/deeply nested dict from an indented text file in python
Basically, I want to iterate through a file and put the contents of each line into a deeply nested dict, the structure of which is defined by the amount of whitespace at the start of each line. 基本上,我想迭代一个文件并将每行的内容放入一个深度嵌套的dict中,其结构由每行开头的空白量定义。
Essentially the aim is to take something like this: 基本上我们的目标是采取这样的方式:
a
b
c
d
e
And turn it into something like this: 把它变成这样的东西:
{"a":{"b":"c","d":"e"}}
Or this: 或这个:
apple
colours
red
yellow
green
type
granny smith
price
0.10
into this: 进入这个:
{"apple":{"colours":["red","yellow","green"],"type":"granny smith","price":0.10}
So that I can send it to Python's JSON module and make some JSON. 这样我就可以将它发送到Python的JSON模块并制作一些JSON。
At the moment I'm trying to make a dict and a list in steps like such: 目前我正试图按照这样的步骤制作一个字典和一个列表:
{"a":""} ["a"]
{"a":"b"} ["a"]
{"a":{"b":"c"}} ["a","b"]
{"a":{"b":{"c":"d"}}}} ["a","b","c"]
{"a":{"b":{"c":"d"},"e":""}} ["a","e"]
{"a":{"b":{"c":"d"},"e":"f"}} ["a","e"]
{"a":{"b":{"c":"d"},"e":{"f":"g"}}} ["a","e","f"]
etc. 等等
The list acts like 'breadcrumbs' showing where I last put in a dict. 该列表的行为类似于“breadcrumbs”,显示了我最后输入dict的位置。
To do this I need a way to iterate through the list and generate something like dict["a"]["e"]["f"]
to get at that last dict. 要做到这一点,我需要一种方法来遍历列表并生成像
dict["a"]["e"]["f"]
来获得最后一个字典。 I've had a look at the AutoVivification class that someone has made which looks very useful however I'm really unsure of: 我已经看过有人制作的AutoVivification类看起来非常有用但是我真的不确定:
I came up with the following function but it doesn't work: 我提出了以下功能,但它不起作用:
def get_nested(dict,array,i):
if i != None:
i += 1
if array[i] in dict:
return get_nested(dict[array[i]],array)
else:
return dict
else:
i = 0
return get_nested(dict[array[i]],array)
Would appreciate help! 非常感谢帮助!
(The rest of my extremely incomplete code is here:) (其余的非常不完整的代码在这里:)
#Import relevant libraries
import codecs
import sys
#Functions
def stripped(str):
if tab_spaced:
return str.lstrip('\t').rstrip('\n\r')
else:
return str.lstrip().rstrip('\n\r')
def current_ws():
if whitespacing == 0 or not tab_spaced:
return len(line) - len(line.lstrip())
if tab_spaced:
return len(line) - len(line.lstrip('\t\n\r'))
def get_nested(adict,anarray,i):
if i != None:
i += 1
if anarray[i] in adict:
return get_nested(adict[anarray[i]],anarray)
else:
return adict
else:
i = 0
return get_nested(adict[anarray[i]],anarray)
#initialise variables
jsondict = {}
unclosed_tags = []
debug = []
vividfilename = 'simple.vivid'
# vividfilename = sys.argv[1]
if len(sys.argv)>2:
jsfilename = sys.argv[2]
else:
jsfilename = vividfilename.split('.')[0] + '.json'
whitespacing = 0
whitespace_array = [0,0]
tab_spaced = False
#open the file
with codecs.open(vividfilename,'rU', "utf-8-sig") as vividfile:
for line in vividfile:
#work out how many whitespaces at start
whitespace_array.append(current_ws())
#For first line with whitespace, work out the whitespacing (eg tab vs 4-space)
if whitespacing == 0 and whitespace_array[-1] > 0:
whitespacing = whitespace_array[-1]
if line[0] == '\t':
tab_spaced = True
#strip out whitespace at start and end
stripped_line = stripped(line)
if whitespace_array[-1] == 0:
jsondict[stripped_line] = ""
unclosed_tags.append(stripped_line)
if whitespace_array[-2] < whitespace_array[-1]:
oldnested = get_nested(jsondict,whitespace_array,None)
print oldnested
# jsondict.pop(unclosed_tags[-1])
# jsondict[unclosed_tags[-1]]={stripped_line:""}
# unclosed_tags.append(stripped_line)
print jsondict
print unclosed_tags
print jsondict
print unclosed_tags
Here is a recursive solution. 这是一个递归解决方案。 First, transform the input in the following way.
首先,按以下方式转换输入。
Input: 输入:
person:
address:
street1: 123 Bar St
street2:
city: Madison
state: WI
zip: 55555
web:
email: boo@baz.com
First-step output: 第一步输出:
[{'name':'person','value':'','level':0},
{'name':'address','value':'','level':1},
{'name':'street1','value':'123 Bar St','level':2},
{'name':'street2','value':'','level':2},
{'name':'city','value':'Madison','level':2},
{'name':'state','value':'WI','level':2},
{'name':'zip','value':55555,'level':2},
{'name':'web','value':'','level':1},
{'name':'email','value':'boo@baz.com','level':2}]
This is easy to accomplish with split(':')
and by counting the number of leading tabs: 使用
split(':')
和计算前导标签的数量很容易实现:
def tab_level(astr):
"""Count number of leading tabs in a string
"""
return len(astr)- len(astr.lstrip('\t'))
Then feed the first-step output into the following function: 然后将第一步输出提供给以下函数:
def ttree_to_json(ttree,level=0):
result = {}
for i in range(0,len(ttree)):
cn = ttree[i]
try:
nn = ttree[i+1]
except:
nn = {'level':-1}
# Edge cases
if cn['level']>level:
continue
if cn['level']<level:
return result
# Recursion
if nn['level']==level:
dict_insert_or_append(result,cn['name'],cn['value'])
elif nn['level']>level:
rr = ttree_to_json(ttree[i+1:], level=nn['level'])
dict_insert_or_append(result,cn['name'],rr)
else:
dict_insert_or_append(result,cn['name'],cn['value'])
return result
return result
where: 哪里:
def dict_insert_or_append(adict,key,val):
"""Insert a value in dict at key if one does not exist
Otherwise, convert value to list and append
"""
if key in adict:
if type(adict[key]) != list:
adict[key] = [adict[key]]
adict[key].append(val)
else:
adict[key] = val
The following code will take a block-indented file and convert into an XML tree; 以下代码将采用块缩进文件并转换为XML树; this:
这个:
foo
bar
baz
ban
bal
...becomes: ...变为:
<cmd>foo</cmd>
<cmd>bar</cmd>
<block>
<name>baz</name>
<cmd>ban</cmd>
<cmd>bal</cmd>
</block>
The basic technique is: 基本技术是:
So: 所以:
from lxml import builder
C = builder.ElementMaker()
def indent(line):
strip = line.lstrip()
return len(line) - len(strip), strip
def parse_blockcfg(data):
top = current_block = C.config()
stack = []
current_indent = 0
lines = data.split('\n')
while lines:
line = lines.pop(0)
i, line = indent(line)
if i==current_indent:
pass
elif i > current_indent:
# we've gone down a level, convert the <cmd> to a block
# and then save the current ident and block to the stack
prev.tag = 'block'
prev.append(C.name(prev.text))
prev.text = None
stack.insert(0, (current_indent, current_block))
current_indent = i
current_block = prev
elif i < current_indent:
# we've gone up one or more levels, pop the stack
# until we find out which level and return to it
found = False
while stack:
parent_indent, parent_block = stack.pop(0)
if parent_indent==i:
found = True
break
if not found:
raise Exception('indent not found in parent stack')
current_indent = i
current_block = parent_block
prev = C.cmd(line)
current_block.append(prev)
return top
Here is an object oriented approach based on a composite structure of nested Node
objects. 这是一种基于嵌套
Node
对象的复合结构的面向对象方法。
Input: 输入:
indented_text = \
"""
apple
colours
red
yellow
green
type
granny smith
price
0.10
"""
a Node class 一个Node类
class Node:
def __init__(self, indented_line):
self.children = []
self.level = len(indented_line) - len(indented_line.lstrip())
self.text = indented_line.strip()
def add_children(self, nodes):
childlevel = nodes[0].level
while nodes:
node = nodes.pop(0)
if node.level == childlevel: # add node as a child
self.children.append(node)
elif node.level > childlevel: # add nodes as grandchildren of the last child
nodes.insert(0,node)
self.children[-1].add_children(nodes)
elif node.level <= self.level: # this node is a sibling, no more children
nodes.insert(0,node)
return
def as_dict(self):
if len(self.children) > 1:
return {self.text: [node.as_dict() for node in self.children]}
elif len(self.children) == 1:
return {self.text: self.children[0].as_dict()}
else:
return self.text
To parse the text, first create a root node. 要解析文本,请首先创建根节点。 Then, remove empty lines from the text, and create a
Node
instance for every line, pass this to the add_children
method of the root node. 然后,从文本中删除空行,并为每一行创建一个
Node
实例,将其传递给add_children
方法。
root = Node('root')
root.add_children([Node(line) for line in indented_text.splitlines() if line.strip()])
d = root.as_dict()['root']
print(d)
result: 结果:
{'apple': [
{'colours': ['red', 'yellow', 'green']},
{'type': 'granny smith'},
{'price': '0.10'}]
}
I think that it should be possible to do it in one step, where you simply call the constructor of Node
once, with the indented text as an argument. 我认为应该可以在一步中完成它,你只需要调用
Node
的构造函数一次,并将缩进的文本作为参数。
First of all, don't use array
and dict
as variable names because they're reserved words in Python and reusing them may end up in all sorts of chaos. 首先,不要使用
array
和dict
作为变量名,因为它们是Python中的保留字并且重用它们可能最终会出现各种混乱。
OK so if I get you correctly, you have a tree given in a text file, with parenthood indicated by indentations, and you want to recover the actual tree structure. 好吧,如果我找到你的话,你会在一个文本文件中给出一个树,并用缩进表示父母,并且你想要恢复实际的树结构。 Right?
对?
Does the following look like a valid outline? 以下内容是否有效? Because I have trouble putting your current code into context.
因为我无法将当前代码放入上下文中。
result = {}
last_indentation = 0
for l in f.xreadlines():
(c, i) = parse(l) # create parse to return character and indentation
if i==last_indentation:
# sibling to last
elif i>last_indentation:
# child to last
else:
# end of children, back to a higher level
OK then your list are the current parents, that's in fact right - but I'd keep them pointed to the dictionary you've created, not the literal letter 那么你的列表是当前的父母,这实际上是正确的 - 但我会让他们指向你创建的字典,而不是文字字母
just starting some stuff here 刚开始做一些东西
result = {}
parents = {}
last_indentation = 1 # start with 1 so 0 is the root of tree
parents[0] = result
for l in f.xreadlines():
(c, i) = parse(l) # create parse to return character and indentation
if i==last_indentation:
new_el = {}
parents[i-1][c] = new_el
parents[i] = new_el
elif i>last_indentation:
# child to last
else:
# end of children, back to a higher level
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.