Pythonic way to convert a dictionary into a numpy array

Question

This is more of a question about programming style. I scrape webpages for fields such as "Temperature: 51 - 62", "Height: 1000-1500", etc. The results are saved in a dictionary:

{"temperature": "51-62", "height":"1000-1500" ...... }

All keys and values are strings. Every key can map to one of many possible values. Now I want to convert this dictionary to a numpy array/vector. I have the following concerns:

Each key corresponds to one index position in the array.
Each possible string value is mapped to one integer.
Some dictionaries are missing some keys. For example, I also have a dictionary that has no "temperature" key, because that webpage doesn't contain such a field.

I am wondering what the clearest and most efficient way to write such a conversion in Python is. I am thinking of building another dictionary that maps each key to its index in the vector, plus one dictionary per key that maps its values to integers.

Another problem is that I am not sure about the range of values for some keys. I want to keep track of the mapping between string values and integers dynamically. For example, I may find in the future that key1 can also map to a new value, val1_8.

Thanks

Answer 1

Try a pandas Series; it was built for this.

import pandas as pd
s = pd.Series({'a':1, 'b':2, 'c':3})
s.values # a numpy array
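
The same idea extends to several scraped dictionaries at once. The following is only a sketch under assumptions: the records and the column list are made up for illustration, and it relies on pandas filling absent keys with a missing-value marker when a DataFrame is built from a list of dicts.

```python
import pandas as pd

# Hypothetical scraped records; the second page has no "temperature" field.
records = [
    {"temperature": "51-62", "height": "1000-1500"},
    {"height": "1500-2000"},
]

# Fix the column order explicitly so each key always lands in the same position.
columns = ["temperature", "height"]
frame = pd.DataFrame(records, columns=columns)

# One row per dictionary; keys a dictionary lacks become missing values (NaN).
arr = frame.to_numpy()
print(arr)
```

The values are still strings at this point, so mapping them to integers remains a separate step.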

Answer 2

>>> from pprint import pprint
>>> import numpy as NP

>>> # a sequence of dictionaries in an iterable called 'data'
>>> # assuming that not all dicts have the same keys
>>> pprint(data)
  [{'x': 7.0, 'y1': 2.773, 'y2': 4.5, 'y3': 2.0},
   {'x': 0.081, 'y1': 1.171, 'y2': 4.44, 'y3': 2.576},
   {'y1': 0.671, 'y3': 3.173},
   {'x': 0.242, 'y2': 3.978, 'y3': 3.791},
   {'x': 0.323, 'y1': 2.088, 'y2': 3.602, 'y3': 4.43}]

>>> # get the unique keys across entire dataset
>>> keys = [list(dx.keys()) for dx in data]

>>> # flatten and coerce to 'set'
>>> keys = {itm for inner_list in keys for itm in inner_list}

>>> # create a map (look-up table) from each column index
>>> # of the NumPy array to the key stored in that column
>>> # (set order is arbitrary; the order below is from one run)

>>> LuT = dict(enumerate(keys))
>>> LuT
  {0: 'y2', 1: 'y3', 2: 'y1', 3: 'x'}

>>> idx = list(LuT.values())

>>> # pre-allocate NumPy array (100 rows is arbitrary)
>>> # number of columns is len(LuT.keys())

>>> D = NP.empty((100, len(LuT.keys())))

>>> keys = list(LuT.keys())
>>> keys
  [0, 1, 2, 3]

>>> # now populate the array from the original data using LuT
>>> for i, row in enumerate(data):
        D[i,:] = [ row.get(LuT[k], 0) for k in keys ]

>>> D[:5,:]
  array([[ 4.5  ,  2.   ,  2.773,  7.   ],
         [ 4.44 ,  2.576,  1.171,  0.081],
         [ 0.   ,  3.173,  0.671,  0.   ],
         [ 3.978,  3.791,  0.   ,  0.242],
         [ 3.602,  4.43 ,  2.088,  0.323]])

Compare the last result (the first 5 rows of D) with data, above.

Note that the column order is preserved for every row (a single dictionary), even one with an incomplete set of keys. In other words, the first column of D always holds the values keyed to y2, and so on, even if the given row in data has no value stored for that key. For example, the third row in data has only two key/value pairs; in the third row of D, the first and last columns are both 0. These columns correspond to keys y2 and x, which are in fact the two missing keys.
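
The question also asks about string values and about values that only turn up later. A minimal sketch of one way to handle that, not taken from either answer: keep one value-to-integer dictionary per field and let it grow as new values are seen. The names FIELDS, MISSING and encode are hypothetical, and -1 as the marker for an absent key is an assumption.

```python
import numpy as np

FIELDS = ["temperature", "height"]  # hypothetical fixed column order
MISSING = -1                        # assumed sentinel for an absent key

def encode(record, codes):
    """Turn one scraped dict into an integer vector, growing `codes` as needed."""
    row = np.full(len(FIELDS), MISSING, dtype=np.int64)
    for col, field in enumerate(FIELDS):
        if field in record:
            table = codes[field]
            # A value seen for the first time gets the next free integer.
            row[col] = table.setdefault(record[field], len(table))
    return row

# One value -> integer table per field, initially empty.
codes = {field: {} for field in FIELDS}

pages = [
    {"temperature": "51-62", "height": "1000-1500"},
    {"height": "1500-2000"},
    {"temperature": "51-62", "height": "1500-2000"},
]
matrix = np.vstack([encode(page, codes) for page in pages])
print(matrix)
print(codes)
```

Because the tables are ordinary dictionaries, they can be saved and reused across scraping runs so that a given string keeps the same integer.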

Answer 1: score 7, accepted, 2014-05-14 23:57:34

Answer 2: score 1, 2014-05-15 03:16:32
