Parsing a JSON file downloaded from Azure Data Lake

Question

I downloaded a file from Azure Data Lake. It contains one JSON object per line, in the following format:

{"PartitionKey":"2020-10-05","value":"Resolved"...}
{"PartitionKey":"2020-10-06","value":"Resolved"...}

I just want to read and parse this in Python.

import json

def read_ods_file():
    file_path = 'temp.json'
    data = []
    with open(file_path) as f:
        for line in f:
            data.append(json.loads(line))
    return data

This raises an exception:

          data.append(json.loads(line))
        File "C:\python3.6\lib\json\__init__.py", line 354, in loads
          return _default_decoder.decode(s)
        File "C:\python3.6\lib\json\decoder.py", line 339, in decode
          obj, end = self.raw_decode(s, idx=_w(s, 0).end())
        File "C:\python3.6\lib\json\decoder.py", line 357, in raw_decode
          raise JSONDecodeError("Expecting value", s, err.value) from None
      json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Printing each line shows these extra characters at the start of the first line. What are they?

ï»¿{"PartitionKey":"2020-10-05","value":"Resolved"...}

{"PartitionKey":"2020-10-06","value":"Resolved"...}

Answer 1

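Microsoft tools use all sorts of odd characters. You could try using string.printable to keep only the normal ASCII characters, as described in this question:

How can I remove non-ASCII characters but leave periods and spaces using Python?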
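
A minimal sketch of the string.printable idea, assuming the stray characters are never part of the real data. If the JSON values can contain legitimate non-ASCII text, this filter would silently drop it. The helper name strip_non_printable and the complete sample line are made up for illustration.

```python
import json
import string

# Set of the characters Python considers printable ASCII.
PRINTABLE = set(string.printable)

def strip_non_printable(text):
    """Return text with every character outside string.printable removed."""
    return "".join(ch for ch in text if ch in PRINTABLE)

# Illustrative line: the three stray characters from the question
# ("\xef\xbb\xbf") followed by a complete JSON object.
line = '\xef\xbb\xbf{"PartitionKey":"2020-10-05","value":"Resolved"}'
record = json.loads(strip_non_printable(line))
print(record["PartitionKey"])
```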

Answer 2

The variable f that you set in

with open(file_path) as f:

is a Python file object (of type _io.TextIOWrapper). If you want to read each line as a JSON object, you could try something like this:

with open(file_path) as f:
    # read the file contents into a string
    # strip off leading and trailing whitespace
    # split the string into a list of lines
    for line in f.read().strip().splitlines():
        data.append(json.loads(line))
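
The characters ï»¿ shown in the question are what the three bytes EF BB BF, the UTF-8 byte order mark (BOM), look like when they are decoded with a single-byte Windows code page such as cp1252. A hedged alternative, assuming the file really is UTF-8 with a BOM, is to open it with Python's utf-8-sig codec, which drops a leading BOM if one is present. The function name read_json_lines and the decision to skip blank lines are assumptions of this sketch, not part of the original answers.

```python
import json

def read_json_lines(file_path):
    """Read a file holding one JSON object per line into a list of dicts."""
    records = []
    # "utf-8-sig" decodes UTF-8 and discards a leading BOM if there is one.
    with open(file_path, encoding="utf-8-sig") as f:
        for line in f:
            line = line.strip()
            if line:  # skip empty lines instead of passing them to json.loads
                records.append(json.loads(line))
    return records
```

Calling read_json_lines('temp.json') would then return one dictionary per line of the file.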

2 answers

Answer 1
1 2020-10-07 15:47:45

Answer 2
0 2020-10-07 15:50:00
