Building a dictionary of words from multiple lists in Python
I have a list of 100 datapoints, each one a dictionary of word counts, like this:
datapoint1 a:1 b:2 c:6
datapoint2 a:2 d:8 p:10
.....
datapoint100 c:9 d:1 z:12
I want to print the list to a file like this:
a b c d ...... z
datapoint1 1 2 6 0 ...... 0
datapoint2 2 0 0 8 ...... 0
.........
.........
datapoint100 0 0 9 1 ...... 12
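The a, b, c ... z here are only placeholders. The actual number of words is not known in advance, so the total is not 26; it could be 1,000 or 10,000, and a, b, ... will be replaced by real words such as "my", "hi", "tote", and so on.
I have been thinking of trying to do the following:
But this approach seems complicated in Python. Is there a better way to do it in Python?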
If you don't care much about getting the columns aligned, this isn't too bad:
datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]
# get all the keys ever seen
keys = sorted(set.union(*(set(dp) for dp in datapoints)))
# open in text mode: on Python 3, writing a str to a file opened with "wb" raises TypeError
with open("outfile.txt", "w") as fp:
    # write the header
    fp.write("{}\n".format(' '.join([" "] + keys)))
    # loop over each point, getting the values in order (or 0 if they're absent)
    for i, datapoint in enumerate(datapoints):
        out = '{} {}\n'.format(i, ' '.join(str(datapoint.get(k, 0)) for k in keys))
        fp.write(out)
which produces
  a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12
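The standard library's csv.DictWriter can do the "0 if absent" filling itself through its restval argument. The following is only a sketch of that variant, not part of the original answer: the function name write_table and the row-label column "id" are made up for illustration, and it assumes no word is itself called "id" and that no word contains a space.

```python
import csv
import io

datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]


def write_table(datapoints, fp):
    """Write one space-separated row per datapoint; absent words become 0."""
    # Collect every word seen in any datapoint, in sorted order.
    keys = sorted(set().union(*datapoints))
    # "id" is a hypothetical name for the row-label column.
    writer = csv.DictWriter(fp, fieldnames=["id"] + keys,
                            restval=0, delimiter=" ")
    writer.writeheader()
    for i, datapoint in enumerate(datapoints):
        # restval fills in every word missing from this datapoint.
        writer.writerow({"id": i, **datapoint})


# Write to an in-memory buffer here; a real file opened with
# open("outfile.txt", "w", newline="") works the same way.
buffer = io.StringIO()
write_table(datapoints, buffer)
table_text = buffer.getvalue()
```

Because the writer handles the missing values, the loop body no longer needs datapoint.get(k, 0) for each key.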
As noted in the comments, the pandas solution is also very nice:
>>> import pandas as pd
>>> df = pd.DataFrame(datapoints).fillna(0).astype(int)
>>> df
   a  b  c  d   p   z
0  1  2  6  0   0   0
1  2  0  0  8  10   0
2  0  0  9  1   0  12
>>> df.to_csv("outfile_pd.csv", sep=" ")
>>> !cat outfile_pd.csv
a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12
If you really do need the columns to line up nicely, there are ways to do that too, but I have never needed them, so I don't know much about them.
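One possible way to align the columns, sketched here with plain string methods, is to build every row as a list of strings first, measure the widest cell in each column, and right-justify each cell to that width. The function name format_aligned is made up for illustration and is not from the original answer.

```python
datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]


def format_aligned(datapoints):
    """Return the table as text with every column right-aligned."""
    keys = sorted(set().union(*datapoints))
    # Header row first; its row-label cell is empty.
    rows = [[""] + keys]
    for i, datapoint in enumerate(datapoints):
        rows.append([str(i)] + [str(datapoint.get(k, 0)) for k in keys])
    # zip(*rows) transposes the rows into columns so each can be measured.
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    return "\n".join(
        " ".join(cell.rjust(width) for cell, width in zip(row, widths))
        for row in rows
    )


aligned_text = format_aligned(datapoints)
print(aligned_text)
```

This keeps the whole table in memory before writing, which should be acceptable for 100 datapoints and a few thousand words; for much larger inputs a two-pass approach (one pass to measure, one to write) would avoid that.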
Program:
data_points = [
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 2, 'd': 8, 'p': 10},
    {'c': 9, 'd': 1, 'z': 12},
    {'e': 3, 'f': 6, 'g': 3}
]
merged_data_points = {}
# collect, for every key, the list of values seen across all datapoints
for data_point in data_points:
    for k, v in data_point.items():
        if k not in merged_data_points:
            merged_data_points[k] = []
        merged_data_points[k].append(v)
# print the merged datapoints
print('{')
for k in merged_data_points:
    print('    {0}: {1},'.format(k, merged_data_points[k]))
print('}')
Output (on Python 3.7+, where dictionaries keep insertion order):
{
    a: [1, 2],
    b: [2],
    c: [6, 9],
    d: [8, 1],
    p: [10],
    z: [12],
    e: [3],
    f: [6],
    g: [3],
}
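The same merging loop can be written a little more compactly with collections.defaultdict, which creates the empty list the first time a key is seen. This is only a sketch of that variant; the function name merge_data_points is made up for illustration.

```python
from collections import defaultdict

data_points = [
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 2, 'd': 8, 'p': 10},
    {'c': 9, 'd': 1, 'z': 12},
    {'e': 3, 'f': 6, 'g': 3},
]


def merge_data_points(data_points):
    """Map each key to the list of values it takes across all datapoints."""
    merged = defaultdict(list)
    for data_point in data_points:
        for key, value in data_point.items():
            # A missing key starts out as an empty list automatically.
            merged[key].append(value)
    # Convert back to a plain dict so later lookups of unknown keys raise KeyError.
    return dict(merged)


merged_data_points = merge_data_points(data_points)
print(merged_data_points)
```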