
Read and process a text file and save to csv

The files I have seem to be in a "dict" format...

The file header is as follows: time,open,high,low,close,volume

The next line is as follows: {"t":[1494257340],"o":[206.7],"h":[209.3],"l":[204.50002],"c":[204.90001],"v":[49700650]}

    import csv
    with open ('test_data.txt', 'rb') as f:

    for line in f:
        dict_file = eval(f.read())
        time = (dict_file['t'])    # print (time) result [1494257340]
        open_price = (dict_file['o'])    # print (open_price) result [206.7]
        high = (dict_file['h'])    # print (high) result [209.3]
        low = (dict_file['l'])    # print (low) result [204.50002]
        close = (dict_file['c'])    # print (close) result [204.90001]
        volume = (dict_file['v'])    # print (volume) result [49700650]

        print (time, open_price, high, low, close, value)

# print result [1494257340] [206.7] [209.3] [204.50002] [204.90001] [49700650]

# I need to remove the [] from the output.

# expected result 

# 1494257340, 206.7, 209.3, 204.50002, 204.90001, 49700650

The result I need is the following (with the time changed from epoch format to dd,mm,yy):

5/8/17, 206.7, 209.3, 204.50002, 204.90001, 49700650

So I know I need the csv.writer function.

I see a number of problems in the code you submitted. I recommend breaking your task into small pieces and checking that each one works individually. So what you are trying to do is:

  1. open a file
  2. read the file line by line
  3. eval each line to get a dict object
  4. get values from that object
  5. write those values in a (separate?) csv file

Right?

Now do each one, one small step at a time.

  1. opening a file.

You're pretty much on point there:

with open('test_data.txt', 'rb') as f:
    print(f.read())

# b'{"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}\n'

You can open the file in r mode instead; it will give you strings instead of bytes objects:

with open('test_data.txt', 'r') as f:
    print(f.read())

# {"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}

It might cause some problems, but it should work, since eval can handle it just fine (at least in Python 3).

  2. read the file line by line

with open('test_data.txt', 'rb') as f:
    for line in f:
        print(line)

# b'{"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}\n'

Here is another problem in your code: you're not using the line variable and are calling f.read() instead. That reads the rest of the file in one go (starting from the second line, since the first one has already been read). Try swapping one for the other and see what happens.
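To make the effect visible without touching a real file, here is a small sketch that uses io.StringIO as a stand-in for the file (an assumption made only so the example is self-contained; the two short sample lines are made up for illustration). A real file object in Python 3 should behave similarly.

```python
import io

# Stand-in for the real file: two short lines of sample data (hypothetical content).
fake_file = io.StringIO('{"t":[1494257340]}\n{"t":[1490123123]}\n')

seen = []
for line in fake_file:
    # `line` already holds the current line...
    rest = fake_file.read()  # ...and read() swallows everything that follows it.
    seen.append((line, rest))

print(len(seen))  # the loop body ran only once
print(seen[0])    # (first line, everything after the first line)
```

Because read() consumes the remainder of the stream, the loop has nothing left to iterate over after its first pass.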

  3. eval each line to get a dict object

Again, this works fine, but I would add some protection here. What if you get an empty line in the file, or a malformed one? Also, if this file comes from an untrusted source, you may become a victim of code injection here, for example if a line in your file were changed to:

print("You've been hacked") or {"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}

with open('test_data.txt', 'rb') as f:
    for line in f:
        dict_file = eval(line)
        print(dict_file)

# You've been hacked
# {'t': [1494257340], 'o': [207.75], 'h': [209.8], 'l': [205.75], 'c': [206.35], 'v': [61035956]}

I don't know your exact specifications, but you should be safer with json.loads instead.

...
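As one possible way to add that protection, the sketch below wraps json.loads in a helper that skips blank lines and rejects anything that is not valid JSON. The name parse_line is hypothetical, and returning None for bad input is just one design choice among several (raising or logging would work too).

```python
import json

def parse_line(line):
    """Return the parsed JSON value, or None if the line is blank or not valid JSON."""
    if isinstance(line, bytes):
        line = line.decode('utf-8')  # assumption: the file is UTF-8 encoded
    line = line.strip()
    if not line:
        return None
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None

good = parse_line(b'{"t":[1494257340],"o":[207.75]}\n')
bad = parse_line('print("You\'ve been hacked") or {"t":[1494257340]}')
print(good)
print(bad)
```

Unlike eval, json.loads never executes the text it is given, so the injected print call is simply rejected as invalid JSON.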


Can you continue on your own from there?

  4. get values from the object

I think dict_file['t'] doesn't give you the value you expect.

What does it give you?

Why?

How can you fix it?
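For readers who want to check their answers to these questions, here is a minimal sketch using the sample line from the question. The variable names record and row are illustrative only.

```python
import json

record = json.loads(
    '{"t":[1494257340],"o":[206.7],"h":[209.3],'
    '"l":[204.50002],"c":[204.90001],"v":[49700650]}'
)

print(record['t'])     # a list that happens to hold a single element
print(record['t'][0])  # the number itself, without the brackets

# Pull the single value out of each list, in column order t, o, h, l, c, v.
row = [record[key][0] for key in 'tohlcv']
print(', '.join(str(value) for value in row))
```

Each value in the parsed object is a one-element list, which is why printing it shows square brackets; indexing with [0] gives the bare number.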

  5. write those values in a csv file

Can you write some random string to a file?

What does the csv format look like? Can you format your values to match it?

Check the docs for the csv module. Can it be of help to you?

And so on and so forth...
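As a starting point for that last step, here is a minimal sketch of csv.writer. It writes into an in-memory io.StringIO buffer rather than a real file, purely so the result can be inspected; the rows are sample values, not output from your data.

```python
import csv
import io

rows = [
    ['time', 'open', 'high', 'low', 'close', 'volume'],
    ['8/5/17', 207.75, 209.8, 205.75, 206.35, 61035956],
]

buffer = io.StringIO()  # in-memory stand-in for an output file
writer = csv.writer(buffer, lineterminator='\n')
writer.writerows(rows)  # one CSV line per inner list

csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file instead, the csv module documentation recommends opening it with newline='' so that line endings are not translated twice.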


EDIT: Solution

# you can save the print output in a file by running:
# $ python convert_to_csv.py > output.csv
import datetime, decimal, json, os


CSV_HEADER = 'time,open,high,low,close,volume'


with open('test_data.txt', 'rb') as f:

    print(CSV_HEADER)

    for line in f:
        data = json.loads(line, parse_float=decimal.Decimal)
        data['t'][0] = datetime.datetime.fromtimestamp(data['t'][0]) \
            .strftime('%#d/%#m/%y' if os.name == 'nt' else '%-d/%-m/%y')
        print(','.join(str(data[k][0]) for k in 'tohlcv'))

Running:

$ cat test_data.txt
{"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}
{"t":[1490123123],"o":[107.75],"h":[109.8],"l":[105.75],"c":[106.35],"v":[11035956]}
{"t":[1491234234],"o":[307.75],"h":[309.8],"l":[305.75],"c":[306.35],"v":[31035956]}

$ python convert_to_csv.py
time,open,high,low,close,volume
8/5/17,207.75,209.8,205.75,206.35,61035956
21/3/17,107.75,109.8,105.75,106.35,11035956
3/4/17,307.75,309.8,305.75,306.35,31035956
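If you would rather use csv.writer, as the question mentions, than redirect print output, a variant could look like the sketch below. It is not the solution above, only an alternative under stated assumptions: the function name convert is hypothetical, timestamps are interpreted as UTC (the solution above uses the machine's local time zone, so dates can differ near midnight), and the date is built by hand to avoid the platform-specific %-d / %#d format flags.

```python
import csv
import datetime
import io
import json

def convert(lines, out):
    """Read one JSON object per line from `lines` and write CSV rows to `out`."""
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['time', 'open', 'high', 'low', 'close', 'volume'])
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        data = json.loads(line)
        # ASSUMPTION: epoch seconds are interpreted as UTC.
        moment = datetime.datetime.fromtimestamp(
            data['t'][0], tz=datetime.timezone.utc)
        # d/m/yy without zero padding on day and month, on any platform.
        date_text = '{}/{}/{:02d}'.format(
            moment.day, moment.month, moment.year % 100)
        writer.writerow([date_text] + [data[key][0] for key in 'ohlcv'])

# In-memory stand-ins for the input and output files.
sample = io.StringIO(
    '{"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}\n'
    '{"t":[1490123123],"o":[107.75],"h":[109.8],"l":[105.75],"c":[106.35],"v":[11035956]}\n'
)
result = io.StringIO()
convert(sample, result)
print(result.getvalue())
```

With real files, the same function can be called as convert(src, dst), where src is test_data.txt opened in 'r' mode and dst is an output file (for example a hypothetical output.csv) opened with 'w' and newline=''.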
