
Building a dictionary of words from multiple lists in Python

I have a list of dictionaries for 100 data points, as follows:

datapoint1 a:1 b:2 c:6
datapoint2 a:2 d:8 p:10
.....
datapoint100 c:9 d:1 z:12

I want to print a table to a file, as follows:

           a b c d ...... z
datapoint1 1 2 6 0 ...... 0
datapoint2 2 0 0 8 ...... 0
.........
.........
datapoint100 0 0 9 1 ...... 12

Note that a, b, c, ..., z are only placeholders. The real number of words is not known beforehand, so the total is not 26; it could be 1,000 or 10,000, and a, b, ... will be replaced with real words such as 'my', 'hi', 'tote', etc.

I have been thinking of doing it as follows:

  1. build a dictionary of words; let's call it the global dictionary
  2. then build a list of dictionaries where each dictionary represents a data point
  3. then try to map the list of dictionaries to the global dictionary

But this method seems complicated in Python. Is there a better way of doing it in Python?
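For reference, the three steps listed above might be sketched roughly as follows. This is only a sketch under assumptions: `build_table`, `words`, and `column` are hypothetical names chosen for illustration, and the three sample data points stand in for the real 100.

```python
def build_table(datapoints):
    """Return (words, rows): the sorted vocabulary and one dense row per data point."""
    # Step 1: the "global dictionary" maps every word seen to a column index.
    words = sorted({word for dp in datapoints for word in dp})
    column = {word: i for i, word in enumerate(words)}

    # Steps 2 and 3: start each row as all zeros, then fill in the counts
    # that the data point actually contains.
    rows = []
    for dp in datapoints:
        row = [0] * len(words)
        for word, count in dp.items():
            row[column[word]] = count
        rows.append(row)
    return words, rows


# Sample data (assumed for illustration)
datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]
words, rows = build_table(datapoints)
```

Here `words` becomes the header line and each entry of `rows` becomes one output line, so writing the file is a matter of joining each list with spaces.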

If you don't care much about the fiddly bits of column alignment, this isn't too bad:

datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]

# get all the keys ever seen
keys = sorted(set.union(*(set(dp) for dp in datapoints)))

# open in text mode; on Python 3, str cannot be written to a file opened with "wb"
with open("outfile.txt", "w") as fp:
    # write the header
    fp.write("{}\n".format(' '.join([" "] + keys)))
    # loop over each point, getting the values in order (or 0 if they're absent)
    for i, datapoint in enumerate(datapoints):
        out = '{} {}\n'.format(i, ' '.join(str(datapoint.get(k, 0)) for k in keys))
        fp.write(out)

produces

  a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12
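A similar result can probably be had from the standard library's `csv.DictWriter`, whose `restval` argument supplies the value for keys missing from a row. The sketch below is only an illustration: `write_table` is a hypothetical helper, the `"datapoint"` label for the first column is an assumption (it must not collide with a real word), and an in-memory buffer is used in place of a file.

```python
import csv
import io


def write_table(datapoints, fp):
    """Write a space-separated table to fp, using 0 for absent words."""
    # dicts iterate over their keys, so this collects every word ever seen
    keys = sorted(set().union(*datapoints))
    writer = csv.DictWriter(fp, fieldnames=["datapoint"] + keys,
                            restval=0, delimiter=" ", lineterminator="\n")
    writer.writeheader()
    for i, dp in enumerate(datapoints, start=1):
        # "datapoint" is an assumed label for the first column
        writer.writerow({"datapoint": "datapoint{}".format(i), **dp})


# Sample data (assumed for illustration)
datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]

buf = io.StringIO()
write_table(datapoints, buf)
print(buf.getvalue(), end="")
```

To write to disk instead, pass a file object opened with `open("outfile.txt", "w", newline="")` in place of the buffer.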

As mentioned in the comments, the pandas solution is pretty nice too:

>>> import pandas as pd
>>> df = pd.DataFrame(datapoints).fillna(0).astype(int)
>>> df
   a  b  c  d   p   z
0  1  2  6  0   0   0
1  2  0  0  8  10   0
2  0  0  9  1   0  12
>>> df.to_csv("outfile_pd.csv", sep=" ")
>>> !cat outfile_pd.csv
 a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12

If you really need the columns nicely aligned, there are ways to do that too, but I never need them, so I don't know much about them.
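One possible approach to alignment, sketched with only the standard library, is to measure the widest cell in each column and pad every cell to that width. `format_aligned` is a hypothetical helper name, and the `datapointN` row labels are an assumption taken from the question's desired output.

```python
def format_aligned(datapoints):
    """Return the table as a string with every column padded to a common width."""
    keys = sorted(set().union(*datapoints))
    # Build the whole table as strings first: a header row, then one row per data point.
    table = [[""] + keys]
    for i, dp in enumerate(datapoints, start=1):
        table.append(["datapoint{}".format(i)] + [str(dp.get(k, 0)) for k in keys])

    # The width of each column is the length of its longest cell.
    widths = [max(len(row[c]) for row in table) for c in range(len(keys) + 1)]

    lines = []
    for row in table:
        # Left-align the label column, right-align the numeric columns.
        cells = [row[0].ljust(widths[0])]
        cells += [cell.rjust(w) for cell, w in zip(row[1:], widths[1:])]
        lines.append(" ".join(cells))
    return "\n".join(lines)


# Sample data (assumed for illustration)
datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]
print(format_aligned(datapoints))
```

With pandas, `df.to_string()` also returns a padded, aligned text rendering of the frame, which may be enough if pandas is already in use.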

Program:

data_points = [
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 2, 'd': 8, 'p': 10},
    {'c': 9, 'd': 1, 'z': 12},
    {'e': 3, 'f': 6, 'g': 3}
]

merged_data_points = {
}

for data_point in data_points:
    for k, v in data_point.items():
        if k not in merged_data_points:
            merged_data_points[k] = []
        merged_data_points[k].append(v)

# print the merged data points
print('{')
for k in merged_data_points:
    print('  {0}: {1},'.format(k, merged_data_points[k]))
print('}')

Output (on Python 3.7+, keys appear in insertion order):

{
  a: [1, 2],
  b: [2],
  c: [6, 9],
  d: [8, 1],
  p: [10],
  z: [12],
  e: [3],
  f: [6],
  g: [3],
}
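The same merge can be written a little more compactly with `collections.defaultdict`, which creates the empty list the first time a key is seen. This is a sketch only; `merge_data_points` is a hypothetical function name.

```python
from collections import defaultdict


def merge_data_points(data_points):
    """Collect, for every key, the list of values it takes across all data points."""
    merged = defaultdict(list)
    for data_point in data_points:
        for key, value in data_point.items():
            # A missing key is created with an empty list automatically.
            merged[key].append(value)
    # Convert back to a plain dict so later lookups of unknown keys raise KeyError.
    return dict(merged)


merged = merge_data_points([
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 2, 'd': 8, 'p': 10},
    {'c': 9, 'd': 1, 'z': 12},
    {'e': 3, 'f': 6, 'g': 3},
])
```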
