Splitting a CSV file with OrderedDict elements in Python

Question

I have a CSV file in which some columns hold arrays of OrderedDicts. For example, here is one column:

[OrderedDict([('@href', 'https://api.elsevier.com/content/abstract/scopus_id/0017048125'), ('@rel', 'self')]), OrderedDict([('@href', 'https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=0017048125&origin=inward'), ('@rel', 'scopus')]), OrderedDict([('@href', 'https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=0017048125&origin=inward'), ('@rel', 'scopus-citedby')])]

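When I use a csv reader and split on commas, this element gets split as well. Because the data does not enclose these elements in quotes, I cannot split the row correctly. I am considering writing my own function that splits on commas and then stitches the OrderedDict items back together, but that would probably be inefficient and tedious. Is there a better way? Perhaps with a regular expression?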

Answer 1

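Assuming your file contains multiple lines in the given format, then rather than trying to extract the information with a regular expression, you could (very carefully) use Python's exec() function to load each line into a Python variable: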

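from collections import OrderedDict

with open('input.txt') as f_in:
    for line in f_in:
        if not line.strip():    # skip blank lines; exec("row = ") would be a SyntaxError
            continue

        # Run the line as Python source, binding the resulting list to `row`.
        # This works at module level; inside a function, pass exec() a namespace dict.
        exec("row = " + line)

        for od in row:    # loop over each OrderedDict in the row
            print(f"{od['@rel']:20}  {od['@href']} ")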

So if input.txt contained just the one line shown above, this would be the output:

self                  https://api.elsevier.com/content/abstract/scopus_id/0017048125 
scopus                https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=0017048125&origin=inward 
scopus-citedby        https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=0017048125&origin=inward

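Note: exec() should be used with care. It runs whatever Python code the line contains, so you should make sure your source data does not contain any potentially malicious entries.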
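If the source data cannot be fully trusted, one possible alternative is to parse each line with the standard ast module instead of executing it. The following is only a sketch, assuming every line is a list of OrderedDict([...]) calls whose arguments are plain literals, as in the sample; parse_row is a hypothetical helper name, not part of any library.

```python
import ast
from collections import OrderedDict


def parse_row(line):
    """Parse a '[OrderedDict([...]), ...]' string without executing it.

    Hypothetical helper: assumes each element is an OrderedDict(...) call
    whose single argument is a literal list of (key, value) tuples.
    """
    tree = ast.parse(line.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list of OrderedDict(...) calls")

    row = []
    for node in tree.body.elts:
        is_od_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "OrderedDict"
            and len(node.args) <= 1
            and not node.keywords
        )
        if not is_od_call:
            raise ValueError("unexpected element in row")
        # literal_eval accepts only literals, so no code from the file is run
        pairs = ast.literal_eval(node.args[0]) if node.args else []
        row.append(OrderedDict(pairs))
    return row
```

In the loop above, row = parse_row(line) could then stand in for the exec() call. Anything that is not a literal-only OrderedDict(...) call raises ValueError rather than being evaluated.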

It might also help if you posted a link to a copy of the actual file (or at least a file with several sample lines); you could use a service such as 0bin.

Answer 1
0 2020-08-20 17:04:29
