
Python convert txt file to JSON stripping and element ordering

I have a txt file that contains space-separated data, such as:

2017-05-16 00:44:36.151724381 +43.8187 -104.7669 -004.4 00.6 00.2 00.2 090 C
2017-05-16 00:44:36.246672534 +41.6321 -104.7834 +004.3 00.6 00.3 00.2 130 C
2017-05-16 00:44:36.356132768 +46.4559 -104.5989 -004.2 01.1 00.4 00.2 034 C

and I would like to convert it into JSON data like:

{"dataset": "Lightning","observation_date": "20170516004436151", "location": { "type": "point", "coordinates": [43.8187, -104.7669]}}
{"dataset": "Lightning","observation_date": "20170516004436246", "location": { "type": "point", "coordinates": [41.6321, -104.7834]}}
{"dataset": "Lightning","observation_date": "20170516004436356", "location": { "type": "point", "coordinates": [46.4559, -104.5989]}}

where I have to add a "dataset": "Lightning" key/value pair, combine and strip the date and time, and combine the lat/lng into a dict before doing any JSON conversion.

But right now the date and time elements still contain the "-" and ":" characters, like:

{"observation_date": "2017-05-1600:44:36.151724381", "location": {"type": "point", "coordinates": ["+43.8187", "-104.7669"]}, "dataset": "Lightning"}
{"observation_date": "2017-05-1600:44:36.246672534", "location": {"type": "point", "coordinates": ["+41.6321", "-104.7834"]}, "dataset": "Lightning"}
{"observation_date": "2017-05-1600:44:36.356132768", "location": {"type": "point", "coordinates": ["+46.4559", "-104.5989"]}, "dataset": "Lightning"}

What I have coded so far:

import json
import sys
def convert(filename):
    dataDict = {}
    txtFile = filename[0]
    print "Opening TXT file: ",txtFile
    infile = open(txtFile, "r")
    for line in infile:
        lineStrip = line.strip()
        parts = [p.strip() for p in lineStrip.split()]
        date = parts[0].strip("-") #trying to get rid of "-" but not working
        time = parts[1].strip(":") #trying to get rid of ":" and "." but not working
        dataDict.update({"dataset":"Lightning"})
        dataDict.update({"observation_date": date + time})
        dataDict.update({"location": {"type":"point", "coordinates": [parts[2], parts[3]]}})
        json_filename = txtFile.split(".")[0]+".json"
        jsonf = open(json_filename,'a')
        data = json.dumps(dataDict)
        jsonf.write(data + "\n")
        print dataDict
    infile.close()
    jsonf.close()   
if __name__=="__main__":
    convert(sys.argv[1:])

But I'm not sure how to strip the "-", ".", and ":" characters, or how to place the "dataset": "Lightning" element at the front.

This should work:

date = parts[0].replace("-", "")  # removes every "-" from the date
time = parts[1].replace(":", "")  # removes every ":"; chain .replace(".", "") to drop the "." as well
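The original attempt fails because str.strip only removes characters from the two ends of a string, never from the middle, whereas str.replace substitutes every occurrence. A minimal sketch of the difference, using one date and time value from the sample data (Python 3 syntax; the variable names are illustrative, and trimming to milliseconds is inferred from the desired output in the question):

```python
date_raw = "2017-05-16"
time_raw = "00:44:36.151724381"

# strip() only trims leading/trailing characters, so the inner "-" survive
stripped = date_raw.strip("-")

# replace() removes every occurrence, wherever it appears
date = date_raw.replace("-", "")
time = time_raw.replace(":", "").replace(".", "")

# Keep HHMMSS plus the first three fractional digits (milliseconds)
observation_date = date + time[:9]

print(stripped)
print(observation_date)
```

Slicing with time[:9] relies on the time always having the fixed-width HH:MM:SS.fraction layout shown in the sample lines.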

You should do:

date = parts[0].replace('-', '')
time = parts[1].replace(':', '')

To get the dataset up front in JSON, the simplest option is to sort the keys, since "dataset" comes first alphabetically:

data = json.dumps(dataDict, sort_keys=True)

You should also consider doing

dataDict["dataset"] = "Lightning"

instead of the .update calls.
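Putting these suggestions together, one line of the input could be converted roughly as follows. This is a sketch in Python 3 syntax, not the asker's final code: line_to_record is a hypothetical helper name, and the float() conversion is an addition so that the coordinates are emitted as numbers, as in the desired output.

```python
import json


def line_to_record(line):
    """Convert one whitespace-separated line into a dict (hypothetical helper)."""
    parts = line.split()
    date = parts[0].replace("-", "")
    time = parts[1].replace(":", "").replace(".", "")

    record = {}
    record["dataset"] = "Lightning"
    # Date plus HHMMSS and the first three fractional digits (milliseconds)
    record["observation_date"] = date + time[:9]
    # float() turns "+43.8187" into 43.8187 so JSON emits numbers, not strings
    record["location"] = {
        "type": "point",
        "coordinates": [float(parts[2]), float(parts[3])],
    }
    return record


sample = "2017-05-16 00:44:36.151724381 +43.8187 -104.7669 -004.4 00.6 00.2 00.2 090 C"
output = json.dumps(line_to_record(sample), sort_keys=True)
print(output)
```

Note that sort_keys=True orders every level alphabetically, so "location" comes before "observation_date" and "coordinates" before "type"; if the exact order from the question matters, sorting is not enough.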

In Python 2, and in Python 3 before 3.7, dictionaries do not guarantee any key order, so you can't rely on the "dataset": "Lightning" element coming first. For that I would use an OrderedDict instead, or sort the keys in the JSON output as others have mentioned.
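A minimal sketch of the OrderedDict approach, with the values hard-coded from the first sample line for illustration. json.dumps writes keys in the order the mapping yields them, so insertion order is preserved:

```python
import json
from collections import OrderedDict

# Keys are written out in the order they are inserted
record = OrderedDict()
record["dataset"] = "Lightning"
record["observation_date"] = "20170516004436151"
record["location"] = OrderedDict(
    [("type", "point"), ("coordinates", [43.8187, -104.7669])]
)

output = json.dumps(record)
print(output)
```

Unlike sort_keys=True, this keeps "observation_date" ahead of "location" and "type" ahead of "coordinates", which matches the layout requested in the question.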

To format the time correctly, I'd use a datetime object like this:

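from datetime import datetime

# %f accepts at most six fractional digits, so trim the nanosecond
# time string ("00:44:36.151724381") to microseconds first
date_string = parts[0] + parts[1][:15]
fmt = "%Y-%m-%d%H:%M:%S.%f"
dt = datetime.strptime(date_string, fmt)
# %f prints six digits; drop the last three to keep milliseconds
new_date_string = dt.strftime("%Y%m%d%H%M%S%f")[:-3]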

Using a datetime object is helpful because it plays nicely with pandas and numpy if you continue to work on the data beyond writing out the JSON. It also supports arithmetic and time zone localization if you need them.
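As a rough illustration of those last two points, the sketch below subtracts two timestamps from the sample data and attaches a time zone. It uses Python 3 (datetime.timezone is not available in Python 2), the timestamps are already trimmed to microseconds, and UTC is purely an assumption since the question does not say which zone the data uses:

```python
from datetime import datetime, timedelta, timezone

fmt = "%Y-%m-%d %H:%M:%S.%f"
t1 = datetime.strptime("2017-05-16 00:44:36.151724", fmt)
t2 = datetime.strptime("2017-05-16 00:44:36.246672", fmt)

# Subtracting two datetimes yields a timedelta
gap = t2 - t1
print(gap.total_seconds())

# Attach a time zone (UTC here is an assumption, not stated in the source)
t1_utc = t1.replace(tzinfo=timezone.utc)
print(t1_utc.isoformat())

# Adding a timedelta shifts the timestamp by a fixed interval
shifted = (t1 + timedelta(minutes=5)).strftime("%Y%m%d%H%M%S")
print(shifted)
```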
