Reading key-value pairs into Pandas
Pandas makes it really easy to read a CSV file:
pd.read_table('data.txt', sep=',')
Does Pandas have something similar for a file with key-value pairs? I came up with this:
pd.DataFrame([dict([p.split('=') for p in l.split(',')]) for l in open('data.txt')])
If it isn't built in, is there perhaps something more idiomatic?
The file of interest looks like this:
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525697183,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525714498,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525734967,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735567,price=1548.00,quantity=555
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735585,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525736116,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525740757,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525748502,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525748952,price=1548.00,quantity=557
It has exactly the same keys on every line, in the same order, and there are no null values. The table to be generated is:
exchange price quantity symbol timestamp
0 GLOBEX 1548.00 551\n ESM3 1365428525690751
1 GLOBEX 1548.00 551\n ESM3 1365428525697183
2 GLOBEX 1548.00 551\n ESM3 1365428525714498
3 GLOBEX 1548.00 551\n ESM3 1365428525734967
4 GLOBEX 1548.00 555\n ESM3 1365428525735567
5 GLOBEX 1548.00 556\n ESM3 1365428525735585
6 GLOBEX 1548.00 556\n ESM3 1365428525736116
7 GLOBEX 1548.00 556\n ESM3 1365428525740757
8 GLOBEX 1548.00 556\n ESM3 1365428525748502
9 GLOBEX 1548.00 557\n ESM3 1365428525748952
(I can remove the \n from quantity with rstrip() after I've brought it in.)
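A minimal sketch of the one-liner above with the newline handled during parsing rather than afterwards, assuming every non-blank line is a comma-separated list of key=value pairs. read_kv is a hypothetical helper name, and an in-memory io.StringIO sample stands in for data.txt:

```python
import io
import pandas as pd

def read_kv(lines):
    """Build a DataFrame from lines of comma-separated key=value pairs."""
    records = []
    for line in lines:
        line = line.strip()      # drop the trailing '\n' before splitting
        if not line:             # skip blank lines (e.g. at end of file)
            continue
        # split('=', 1) splits on the first '=' only, so a value that
        # itself contains '=' is kept intact
        records.append(dict(pair.split('=', 1) for pair in line.split(',')))
    return pd.DataFrame(records)

# Hypothetical in-memory sample in place of data.txt
sample = io.StringIO(
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551\n"
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735567,price=1548.00,quantity=555\n"
)
df = read_kv(sample)
print(df)
```

With a real file, pass the open file object instead (with open('data.txt') as f: df = read_kv(f)), which also makes sure the file gets closed. All values come back as strings, so numeric columns still need converting.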
If you know the key names beforehand and they always appear in the same order, you could use a converter to chop off the key names and then use the names parameter to name the columns:
import pandas as pd

def value(item):
    # Keep only the text after the first '=' (drops the "key=" prefix)
    return item[item.find('=')+1:]

df = pd.read_table('data.txt', header=None, delimiter=',',
                   converters={i: value for i in range(5)},
                   names='symbol exchange timestamp price quantity'.split())
print(df)
Running this on your posted data yields:
symbol exchange timestamp price quantity
0 ESM3 GLOBEX 1365428525690751 1548.00 551
1 ESM3 GLOBEX 1365428525697183 1548.00 551
2 ESM3 GLOBEX 1365428525714498 1548.00 551
3 ESM3 GLOBEX 1365428525734967 1548.00 551
4 ESM3 GLOBEX 1365428525735567 1548.00 555
5 ESM3 GLOBEX 1365428525735585 1548.00 556
6 ESM3 GLOBEX 1365428525736116 1548.00 556
7 ESM3 GLOBEX 1365428525740757 1548.00 556
8 ESM3 GLOBEX 1365428525748502 1548.00 556
9 ESM3 GLOBEX 1365428525748952 1548.00 557
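One thing to watch with the converter approach: each converter returns a string, so timestamp, price and quantity come back as text rather than numbers (which is why price prints as 1548.00). A sketch of converting them afterwards with pd.to_numeric, assuming those three columns are the numeric ones; it keys the converters by column name and uses an io.StringIO sample in place of data.txt:

```python
import io
import pandas as pd

def value(item):
    # Keep only the text after the first '=' (drops the "key=" prefix)
    return item[item.find('=') + 1:]

names = ['symbol', 'exchange', 'timestamp', 'price', 'quantity']
# Hypothetical in-memory sample in place of data.txt
sample = io.StringIO(
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551\n"
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735567,price=1548.00,quantity=555\n"
)
df = pd.read_csv(sample, header=None, names=names,
                 converters={name: value for name in names})

# The converters return strings, so convert the numeric columns explicitly
for col in ['timestamp', 'price', 'quantity']:
    df[col] = pd.to_numeric(df[col])
print(df.dtypes)
```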
I'm not sure what the best way to do this is, but assuming that the delimiters aren't found in the values (it hurts my brain to think of the corner cases), something like this isn't super-elegant but is straightforward:
>>> import pandas as pd
>>> df = pd.read_csv("esm.csv", sep=",|=", header=None, engine="python")
>>> df2 = df.iloc[:, 1::2].copy()
>>> df2.columns = list(df.iloc[0, 0::2])
>>> df2
symbol exchange timestamp price quantity
0 ESM3 GLOBEX 1365428525690751 1548 551
1 ESM3 GLOBEX 1365428525697183 1548 551
2 ESM3 GLOBEX 1365428525714498 1548 551
3 ESM3 GLOBEX 1365428525734967 1548 551
4 ESM3 GLOBEX 1365428525735567 1548 555
5 ESM3 GLOBEX 1365428525735585 1548 556
6 ESM3 GLOBEX 1365428525736116 1548 556
7 ESM3 GLOBEX 1365428525740757 1548 556
8 ESM3 GLOBEX 1365428525748502 1548 556
9 ESM3 GLOBEX 1365428525748952 1548 557
Basically, read it in, then do the reshaping yourself: keep every other column (the values) and take the column names from the keys in the first row.
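To guard against some of those corner cases, a sketch that first checks that every row repeats the first row's keys in the same order before dropping the key columns. read_kv_regex is a hypothetical name, and the approach still assumes that no value contains ',' or '=':

```python
import io
import pandas as pd

def read_kv_regex(buf):
    # Split on both ',' and '='; a regex separator needs the python engine
    df = pd.read_csv(buf, sep=",|=", header=None, engine="python")
    keys = df.iloc[:, 0::2]            # even columns hold the key names
    values = df.iloc[:, 1::2].copy()   # odd columns hold the values
    # Every row must repeat the first row's keys, in the same order
    if not (keys == keys.iloc[0]).all().all():
        raise ValueError("rows do not share the same keys in the same order")
    values.columns = list(keys.iloc[0])
    return values

# Hypothetical in-memory sample in place of esm.csv
sample = io.StringIO(
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551\n"
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735567,price=1548.00,quantity=555\n"
)
df2 = read_kv_regex(sample)
print(df2)
```

Because the values are parsed by read_csv itself, numeric columns get numeric dtypes without an extra conversion step.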