Building a dictionary of words from multiple lists in Python
I have a list of dictionaries for 100 data points, like this:
datapoint1 a:1 b:2 c:6
datapoint2 a:2 d:8 p:10
.....
datapoint100 c:9 d:1 z:12
I want to print the list to a file, like this:
a b c d ...... z
datapoint1 1 2 6 0 ...... 0
datapoint2 2 0 0 8 ...... 0
.........
.........
datapoint100 0 0 9 1 ...... 12
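The a, b, c ... z shown here are only examples. The actual number of words is not known in advance, so the total is not 26; it could be 1,000 or 10,000, and a, b, ... will be replaced by real words such as "my", "hi", "tote", and so on.
I have been thinking about trying the following:
But this approach seems complicated in Python. Is there a better way to do it in Python?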
If you don't care too much about column-alignment niceties, it isn't too bad:
datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]
# get all the keys ever seen
keys = sorted(set.union(*(set(dp) for dp in datapoints)))
# open in text mode ("w"); binary mode ("wb") rejects str in Python 3
with open("outfile.txt", "w") as fp:
    # write the header
    fp.write("{}\n".format(' '.join([" "] + keys)))
    # loop over each point, getting the values in order (or 0 if they're absent)
    for i, datapoint in enumerate(datapoints):
        out = '{} {}\n'.format(i, ' '.join(str(datapoint.get(k, 0)) for k in keys))
        fp.write(out)
which produces
a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12
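The same fill-with-zero idea can also be sketched with the standard library's csv.DictWriter, whose restval argument supplies the value for keys missing from a row. The function name write_table, the "name" column, and the "datapoint1", "datapoint2", ... row labels are assumptions made for illustration (the labels follow the question's layout), and the sketch assumes no word is itself called "name":

```python
import csv
import io


def write_table(datapoints, fp, missing=0):
    """Write one row per datapoint and one column per word to the text file fp."""
    # Collect every word seen in any datapoint; sort for a stable column order.
    keys = sorted(set().union(*datapoints))
    writer = csv.DictWriter(
        fp,
        fieldnames=["name"] + keys,  # "name" is a hypothetical label column
        restval=missing,             # value written for words a datapoint lacks
        delimiter=" ",
        lineterminator="\n",
    )
    writer.writeheader()
    for i, dp in enumerate(datapoints, start=1):
        row = {"name": "datapoint{}".format(i)}
        row.update(dp)
        writer.writerow(row)


datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]

# An in-memory buffer stands in for a real file opened with open(..., "w", newline="").
buf = io.StringIO()
write_table(datapoints, buf)
print(buf.getvalue())
```

Because the writer quotes any field that contains the delimiter, words with embedded spaces would still round-trip through csv.reader.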
As mentioned in the comments, the pandas solution is also quite nice:
>>> import pandas as pd
>>> df = pd.DataFrame(datapoints).fillna(0).astype(int)
>>> df
a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12
>>> df.to_csv("outfile_pd.csv", sep=" ")
>>> !cat outfile_pd.csv
a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12
If you really do need the columns to line up nicely, there are ways to do that too, but I've never needed them, so I don't know much about them.
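One possible way to align the columns, offered only as a sketch: pad every cell to the width of the widest entry in its column using str.ljust and str.rjust. The function name format_aligned and the "datapoint1", "datapoint2", ... row labels are assumptions for illustration, not something from the original answer:

```python
def format_aligned(datapoints, missing=0):
    """Return the table as a string with every column padded to a common width."""
    keys = sorted(set().union(*datapoints))
    # Hypothetical row labels, following the layout in the question.
    labels = ["datapoint{}".format(i) for i in range(1, len(datapoints) + 1)]
    rows = [[str(dp.get(k, missing)) for k in keys] for dp in datapoints]

    label_width = max([len(label) for label in labels] + [0])
    # Each column is as wide as its widest cell, header included.
    widths = [max([len(k)] + [len(row[j]) for row in rows])
              for j, k in enumerate(keys)]

    header = [" " * label_width] + [k.rjust(w) for k, w in zip(keys, widths)]
    lines = [" ".join(header)]
    for label, row in zip(labels, rows):
        cells = [label.ljust(label_width)]
        cells += [cell.rjust(w) for cell, w in zip(row, widths)]
        lines.append(" ".join(cells))
    return "\n".join(lines)


datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]
print(format_aligned(datapoints))
```

This needs two passes over the data (one to measure, one to print), so for very large vocabularies a delimiter-separated file is likely the more practical output.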
Program:
data_points = [
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 2, 'd': 8, 'p': 10},
    {'c': 9, 'd': 1, 'z': 12},
    {'e': 3, 'f': 6, 'g': 3}
]
merged_data_points = {}
for data_point in data_points:
    for k, v in data_point.items():
        # start a new list the first time a key is seen
        if k not in merged_data_points:
            merged_data_points[k] = []
        merged_data_points[k].append(v)
# print the merged datapoints
print('{')
for k in merged_data_points:
    print(' {0}: {1},'.format(k, merged_data_points[k]))
print('}')
Output (on Python 3.7+, where dicts keep insertion order):
{
 a: [1, 2],
 b: [2],
 c: [6, 9],
 d: [8, 1],
 p: [10],
 z: [12],
 e: [3],
 f: [6],
 g: [3],
}