Reading key-value pairs into Pandas

Question

Pandas makes it really easy to read a CSV file:

pd.read_table('data.txt', sep=',')

Does Pandas have something similar for a file with key-value pairs? I came up with this:

pd.DataFrame([dict([p.split('=') for p in l.split(',')]) for l in open('data.txt')])

If nothing is built in, is there perhaps something more idiomatic?

The file of interest looks like this:

symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525697183,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525714498,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525734967,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735567,price=1548.00,quantity=555
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735585,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525736116,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525740757,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525748502,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525748952,price=1548.00,quantity=557

It has the exact same keys on every line, and in the same order. There are no null values. The table to be generated is:

  exchange    price quantity symbol         timestamp
0   GLOBEX  1548.00    551\n   ESM3  1365428525690751
1   GLOBEX  1548.00    551\n   ESM3  1365428525697183
2   GLOBEX  1548.00    551\n   ESM3  1365428525714498
3   GLOBEX  1548.00    551\n   ESM3  1365428525734967
4   GLOBEX  1548.00    555\n   ESM3  1365428525735567
5   GLOBEX  1548.00    556\n   ESM3  1365428525735585
6   GLOBEX  1548.00    556\n   ESM3  1365428525736116
7   GLOBEX  1548.00    556\n   ESM3  1365428525740757
8   GLOBEX  1548.00    556\n   ESM3  1365428525748502
9   GLOBEX  1548.00    557\n   ESM3  1365428525748952

(I can remove the \n from quantity with an rstrip() after I've brought it in.)
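As a minimal sketch of the list-of-dicts approach above, the trailing newline can be stripped from each line before splitting, so quantity never picks up the \n. The helper name parse_line and the in-memory io.StringIO sample (standing in for open('data.txt')) are assumptions made for illustration, not part of the original post.

```python
import io

import pandas as pd

# Two sample rows from the question; io.StringIO stands in for open('data.txt').
sample = io.StringIO(
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551\n"
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735567,price=1548.00,quantity=555\n"
)


def parse_line(line):
    """Turn 'k1=v1,k2=v2' into a dict, dropping the trailing newline first."""
    fields = line.rstrip("\n").split(",")
    # Split each field on the first '=' only, so values may contain '='.
    return dict(field.split("=", 1) for field in fields)


# Skip blank lines so a trailing empty line does not raise.
df = pd.DataFrame([parse_line(line) for line in sample if line.strip()])
print(df)
```

All values are still strings at this point; numeric columns would need an explicit conversion afterwards.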

Answer 1

If you know the key names beforehand and the names always appear in the same order, then you could use a converter to chop off the key names, and then use the names parameter to name the columns:

import pandas as pd

def value(item):
    return item[item.find('=')+1:]

df = pd.read_table('data.txt', header=None, delimiter=',',
                   converters={i:value for i in range(5)},
                   names='symbol exchange timestamp price quantity'.split())
print(df)

On your posted data, this yields:

  symbol exchange         timestamp    price quantity
0   ESM3   GLOBEX  1365428525690751  1548.00      551
1   ESM3   GLOBEX  1365428525697183  1548.00      551
2   ESM3   GLOBEX  1365428525714498  1548.00      551
3   ESM3   GLOBEX  1365428525734967  1548.00      551
4   ESM3   GLOBEX  1365428525735567  1548.00      555
5   ESM3   GLOBEX  1365428525735585  1548.00      556
6   ESM3   GLOBEX  1365428525736116  1548.00      556
7   ESM3   GLOBEX  1365428525740757  1548.00      556
8   ESM3   GLOBEX  1365428525748502  1548.00      556
9   ESM3   GLOBEX  1365428525748952  1548.00      557
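One thing to keep in mind with the converter approach: each converter returns a string, so the resulting columns hold text rather than numbers (which is why price prints as 1548.00 above). The sketch below, run on an in-memory sample rather than data.txt, adds a pd.to_numeric step; the choice of which columns to convert is an assumption based on the posted data.

```python
import io

import pandas as pd

# Two sample rows from the question; io.StringIO stands in for 'data.txt'.
sample = io.StringIO(
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551\n"
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735567,price=1548.00,quantity=555\n"
)

names = ["symbol", "exchange", "timestamp", "price", "quantity"]


def value(item):
    """Return the text after the first '=' in a 'key=value' field."""
    return item.partition("=")[2]


df = pd.read_csv(
    sample,
    header=None,
    names=names,
    converters={i: value for i in range(len(names))},
)

# Converters hand back strings, so convert the numeric columns explicitly.
for col in ["timestamp", "price", "quantity"]:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```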

Answer 2

I'm not sure what the best way to do this is, but assuming that the delimiters aren't found in the values -- it hurts my brain to think of the corner cases -- then something like this isn't super-elegant but is straightforward:

>>> df = pd.read_csv("esm.csv", sep=",|=", header=None, engine="python")
>>> df2 = df.iloc[:, 1::2]
>>> df2.columns = list(df.iloc[0, 0::2])
>>> df2
  symbol exchange         timestamp  price  quantity
0   ESM3   GLOBEX  1365428525690751   1548       551
1   ESM3   GLOBEX  1365428525697183   1548       551
2   ESM3   GLOBEX  1365428525714498   1548       551
3   ESM3   GLOBEX  1365428525734967   1548       551
4   ESM3   GLOBEX  1365428525735567   1548       555
5   ESM3   GLOBEX  1365428525735585   1548       556
6   ESM3   GLOBEX  1365428525736116   1548       556
7   ESM3   GLOBEX  1365428525740757   1548       556
8   ESM3   GLOBEX  1365428525748502   1548       556
9   ESM3   GLOBEX  1365428525748952   1548       557

Basically, read it in, and then do the pivot yourself, keeping every other column and then fixing the column names.
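Here is a self-contained sketch of the same split-then-pivot idea, using positional .iloc indexing and an in-memory sample in place of esm.csv. The check that every row carries the same keys is an addition of mine, not part of the approach as posted; it guards the assumption that keys are identical and in the same order on every line.

```python
import io

import pandas as pd

# Two sample rows from the question; io.StringIO stands in for 'esm.csv'.
sample = io.StringIO(
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551\n"
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735567,price=1548.00,quantity=555\n"
)

# A regex separator requires the Python parsing engine.
raw = pd.read_csv(sample, sep=",|=", header=None, engine="python")

keys = raw.iloc[:, 0::2]           # even positions hold the key names
values = raw.iloc[:, 1::2].copy()  # odd positions hold the values

# Added safeguard: every key column must contain exactly one distinct name.
if not keys.nunique().eq(1).all():
    raise ValueError("keys differ between rows")

values.columns = keys.iloc[0].tolist()
print(values)
```

Because the values go through the parser's normal type inference here, price comes back as a float and quantity as an integer, unlike the converter-based approach where they stay strings.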

Answer 1: score 4, accepted, posted 2013-04-09 17:14:39

Answer 2: score 2, posted 2013-04-09 17:23:42
