In Python, how to present pairs of dicts in human-readable form?

Basically, I'm looking for an efficient way (in terms of coding effort) to present a list of pairs of dicts in a human-readable form, in Python 2.7.

I have two lists of OrderedDict. Each dict is a record of book data (title, author, etc.). One list has messy data (typos etc.), the other has tidy data. I'm using difflib.SequenceMatcher to find the closest match between each untidy title and the tidy ones. That works nicely.
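The matching code itself isn't shown here, so the following is only a minimal sketch of how such a closest-match lookup could be done with difflib.SequenceMatcher. The function name closest_match, the sample titles, and the lowercasing step are assumptions made for illustration.

```python
from __future__ import print_function
from difflib import SequenceMatcher


def closest_match(untidy_title, tidy_titles):
    """Return (ratio, title) for the tidy title most similar to untidy_title."""
    # Lowercasing before comparing is an assumption, not part of the original.
    scored = [
        (SequenceMatcher(None, untidy_title.lower(), tidy.lower()).ratio(), tidy)
        for tidy in tidy_titles
    ]
    # max() compares the ratio first, so the highest-scoring title wins.
    return max(scored)


# Made-up sample data.
tidy_titles = ["Moby Dick", "Middlemarch", "Bleak House"]
score, best = closest_match("Mobby Dik", tidy_titles)
print(round(score, 2), best)
```

For long lists this compares every untidy title against every tidy one, which is probably fine for a once-off job.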

It gives me a list of pairs of dicts, namely each untidy dict paired with its closest-matching tidy one. Those pairs need to be reviewed, pair by pair, by humans. So I want to output each pair to the screen, showing the untidy and the tidy dict side by side, each in its own panel. Each dict may have a varying number of additional fields, e.g. co-author, publisher, date, etc.
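To make the goal concrete, here is a rough plain-text sketch of the kind of side-by-side view described above. The function side_by_side, the column width, and the sample records are hypothetical. Fields missing from one dict are simply left blank in that column, and long values are truncated rather than wrapped.

```python
from __future__ import print_function, unicode_literals
from collections import OrderedDict


def side_by_side(left, right, width=30):
    """Return a two-column text view of two dicts, one row per field."""
    # Union of keys, keeping the order in which they first appear.
    keys = list(left)
    keys += [k for k in right if k not in left]
    rows = []
    for key in keys:
        cells = []
        for record in (left, right):
            text = "{0}: {1}".format(key, record[key]) if key in record else ""
            # Truncate to the column width so each field stays on one row.
            cells.append(text[:width].ljust(width))
        rows.append(" | ".join(cells).rstrip())
    return "\n".join(rows)


# Made-up sample records with a different number of fields.
untidy = OrderedDict([("title", "Mobby Dick"), ("author", "H. Melville")])
tidy = OrderedDict([("title", "Moby Dick"), ("author", "Herman Melville"),
                    ("publisher", "Harper")])
print(side_by_side(untidy, tidy))
```

Printing one such block per pair, with a blank line in between, would give reviewers something to scroll through without any GUI or HTML.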

difflib.HtmlDiff doesn't really do what I want.

Exporting to Excel (via CSV) is not ideal, because the data isn't flat (one line in Excel would have a different number of fields than another). Likewise for Google Refine; I think that's more oriented towards tabular data.

Call me lazy, but Tkinter or XML/HTML seem to be overkill. It's just a once-off exercise.

I'm not at all familiar with JSON or YAML; maybe I should look there?

Any better suggestions?

I have a hunch that I haven't found the right search terms yet.

What I had to output was a list of 3-item lists, each containing a match score and two ordered dicts, one with the title to correct and one with the best-matching title, both with additional info such as author, shelfmark, etc.

I went for output in YAML, because it's advertised as human-readable and human-editable. I have no user feedback on that yet, but creating the output file was really easy (if you take the time to read the PyYAML documentation).

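import codecs

import yaml

# ... (code that builds `match` omitted)

with codecs.open('Lit_titles_match.yml', 'w', 'utf-8-sig') as m:
    # match is a list of lists, each holding one float and two dicts.
    # dump_all writes each inner list as its own YAML document.
    m.write(yaml.dump_all(match, default_flow_style=False))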
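For comparison only: the question also mentions JSON, and a similar dump can be sketched with the standard library's json module. This is a swapped-in format, not what was used above, and the sample match list is made up. json.dumps keeps the field order of an OrderedDict, and indent=2 puts each field on its own line.

```python
from __future__ import print_function
import json
from collections import OrderedDict

# Made-up stand-in for the real data: [score, untidy record, tidy record].
match = [
    [0.89,
     OrderedDict([("title", "Mobby Dik"), ("author", "H. Melville")]),
     OrderedDict([("title", "Moby Dick"), ("author", "Herman Melville"),
                  ("shelfmark", "PS2384")])],
]

# One indented JSON block per match, separated by a blank line.
text = "\n\n".join(
    json.dumps(item, indent=2, ensure_ascii=False) for item in match
)
print(text)
```

Whether reviewers find this easier to read and edit than the YAML output is an open question; YAML has less punctuation to trip over.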

