Building a dictionary of words from multiple lists in Python

Question

I have a list of 100 dictionaries, one per data point, as follows:

datapoint1 a:1 b:2 c:6
datapoint2 a:2 d:8 p:10
.....
datapoint100 c:9 d:1 z:12

I want to write this to a file as a table, with 0 for any word a data point lacks:

           a b c d ...... z
datapoint1 1 2 6 0 ...... 0
datapoint2 2 0 0 8 ...... 0
.........
.........
datapoint100 0 0 9 1 ...... 12

Note that a, b, c, ..., z are only placeholders. The real set of words is not known beforehand, so there will not be exactly 26 of them; there could be 1,000 or 10,000, and a, b, ... will be real words such as 'my', 'hi', 'tote', etc.

I have been thinking of doing it as follows:

Build a dictionary of all the words; call it the global dictionary.
Build a list of dictionaries, where each dictionary represents a data point.
Map each dictionary in the list onto the global dictionary.

But this approach seems complicated. Is there a better way to do it in Python?

Answer 1

If you don't care much about the fiddly bits of column alignment, this isn't too bad:

datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]

# get all the keys ever seen
keys = sorted(set.union(*(set(dp) for dp in datapoints)))

# text mode ("w"): we write str, which fails with "wb" on Python 3
with open("outfile.txt", "w") as fp:
    # write the header
    fp.write("{}\n".format(' '.join([" "] + keys)))
    # loop over each point, getting the values in order (or 0 if they're absent)
    for i, datapoint in enumerate(datapoints):
        out = '{} {}\n'.format(i, ' '.join(str(datapoint.get(k, 0)) for k in keys))
        fp.write(out)

produces

  a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12

As mentioned in the comments, the pandas solution is pretty nice too:

>>> import pandas as pd
>>> df = pd.DataFrame(datapoints).fillna(0).astype(int)
>>> df
   a  b  c  d   p   z
0  1  2  6  0   0   0
1  2  0  0  8  10   0
2  0  0  9  1   0  12
>>> df.to_csv("outfile_pd.csv", sep=" ")
>>> !cat outfile_pd.csv
 a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12

If you really need the columns nicely aligned, then there are ways to do that too, but I never need them so I don't know much about them.
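For what it's worth, one way to get aligned columns with only the standard library is to pad every cell to the width of the widest entry in its column. The following is a minimal sketch, not a definitive implementation: the function name `format_table` and the `label` parameter are made up for illustration, and the 1-based `datapointN` row labels are an assumption based on the layout in the question.

```python
datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]


def format_table(datapoints, label="datapoint"):
    """Return the table as a list of lines with right-aligned columns.

    `format_table` and `label` are hypothetical names used for this sketch.
    """
    # All keys seen in any data point, in sorted order.
    keys = sorted(set().union(*datapoints))
    # Row labels are 1-based to mirror the question: datapoint1, datapoint2, ...
    labels = ["{}{}".format(label, i) for i in range(1, len(datapoints) + 1)]
    # Missing keys become 0, as in the original answer.
    rows = [[str(dp.get(k, 0)) for k in keys] for dp in datapoints]
    # Each column is as wide as its widest cell, header included.
    widths = [max([len(k)] + [len(row[j]) for row in rows])
              for j, k in enumerate(keys)]
    label_width = max((len(lab) for lab in labels), default=0)

    header_cells = [k.rjust(w) for k, w in zip(keys, widths)]
    lines = [" ".join([" " * label_width] + header_cells)]
    for lab, row in zip(labels, rows):
        cells = [cell.rjust(w) for cell, w in zip(row, widths)]
        lines.append(" ".join([lab.ljust(label_width)] + cells))
    return lines


lines = format_table(datapoints)
print("\n".join(lines))
```

With the three sample data points this prints:

```
           a b c d  p  z
datapoint1 1 2 6 0  0  0
datapoint2 2 0 0 8 10  0
datapoint3 0 0 9 1  0 12
```

To write the table to a file instead, something like `fp.write("\n".join(lines) + "\n")` inside a `with open("outfile.txt", "w") as fp:` block should do.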

Answer 2

Program:

data_points = [
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 2, 'd': 8, 'p': 10},
    {'c': 9, 'd': 1, 'z': 12},
    {'e': 3, 'f': 6, 'g': 3}
]

merged_data_points = {}

for data_point in data_points:
    for k, v in data_point.items():
        if k not in merged_data_points:
            merged_data_points[k] = []
        merged_data_points[k].append(v)

# print the merged datapoints (dicts keep insertion order on Python 3.7+)
print('{')
for k in merged_data_points:
    print('  {0}: {1},'.format(k, merged_data_points[k]))
print('}')

Output:

{
  a: [1, 2],
  b: [2],
  c: [6, 9],
  d: [8, 1],
  p: [10],
  z: [12],
  e: [3],
  f: [6],
  g: [3],
}
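The same merge can probably be written a little more compactly with `collections.defaultdict`, which removes the explicit "is the key already there?" check. This is only a sketch of that variation; the function name `merge_data_points` is made up for illustration.

```python
from collections import defaultdict

data_points = [
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 2, 'd': 8, 'p': 10},
    {'c': 9, 'd': 1, 'z': 12},
    {'e': 3, 'f': 6, 'g': 3},
]


def merge_data_points(data_points):
    """Collect, for each key, the values from every data point that has it.

    `merge_data_points` is a hypothetical name used for this sketch.
    """
    merged = defaultdict(list)  # missing keys start out as an empty list
    for data_point in data_points:
        for key, value in data_point.items():
            merged[key].append(value)
    return dict(merged)  # plain dict, so later lookups don't create keys


merged = merge_data_points(data_points)
print(merged['a'])  # [1, 2]
```

Note that the merged lists do not record which data point each value came from, so on their own they are not enough to rebuild the zero-filled table asked for in the question.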

2 answers

solution1
1 ACCEPTED 2013-03-19 20:26:43

solution2
0 2013-03-19 20:31:18