Efficient Pandas Dataframe insert

Question

I'm trying to add float values like [[(1,0.44),(2,0.5),(3,0.1)],[(2,0.63),(1,0.85),(3,0.11)],[...]] to a Pandas DataFrame that looks like a matrix whose columns are built from the first value of each tuple:

    df =
          1     2     3
    1  0.44   0.5   0.1
    2  0.85  0.63  0.11
    3   ...   ...   ...

I tried this:

    for key, value in enumerate(outer_list):
        for tuplevalue in value:
            # .ix was removed from pandas; .loc sets a single cell by label
            df.loc[key, tuplevalue[0]] = tuplevalue[1]

The problem is that my NxN matrix contains about 10000x10000 elements, so my approach takes a really long time. Is there a faster way to do this?

(Unfortunately, the values in each list are not ordered by the first tuple element.)

Answer 1

Sort each row first, then use a list comprehension to extract the values. Then create your dataframe from the sorted and cleaned data. Note that this assumes every row contains a tuple for every column.

    import pandas as pd

    data = [[(1, 0.44), (2, 0.50), (3, 0.10)],
            [(2, 0.63), (1, 0.85), (3, 0.11)]]

    # First, sort each row in place (tuples sort by their first element).
    for row in data:
        row.sort()

    # Then extract the second element of each tuple.
    new_data = [[t[1] for t in row] for row in data]

    # Now create a dataframe from your data.
    >>> pd.DataFrame(new_data)
          0     1     2
    0  0.44  0.50  0.10
    1  0.85  0.63  0.11
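If some rows might be missing a key, sorting alone would shift values into the wrong columns. Here is a minimal sketch of a different technique: pre-allocating a NumPy array and filling it by column position. It assumes the column labels are the sorted set of first tuple elements and that unfilled cells should be NaN; the names `col_pos` and `matrix` are illustrative only.

```python
import numpy as np
import pandas as pd

data = [[(1, 0.44), (2, 0.50), (3, 0.10)],
        [(2, 0.63), (1, 0.85), (3, 0.11)]]

# Assumption: column labels are the sorted set of first tuple elements.
columns = sorted({key for row in data for key, _ in row})
col_pos = {key: i for i, key in enumerate(columns)}

# Pre-allocate the matrix; NaN marks cells that no tuple fills.
matrix = np.full((len(data), len(columns)), np.nan)

# Fill one row at a time with a single indexed assignment per row.
for i, row in enumerate(data):
    positions = [col_pos[key] for key, _ in row]
    values = [value for _, value in row]
    matrix[i, positions] = values

# Build the dataframe once, after all values are in place.
df = pd.DataFrame(matrix, columns=columns)
```

The point of this layout is that the dataframe is constructed once at the end, rather than being written to cell by cell.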

Answer 2

This works using a dictionary, which is useful if you need to preserve your column order or if the column names are strings. Maybe Alexander will update his answer to account for that; I'm nearly certain he'll have a better solution than mine :)

Here's an example:

    from collections import defaultdict

    import pandas as pd

    a = [[(1,0.44),(2,0.5),(3,0.1)],[(2,0.63),(1,0.85),(3,0.11)]]
    b = [[('A',0.44),('B',0.5),('C',0.1)],[('B',0.63),('A',0.85),('C',0.11)]]

First on a:

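    # Turn each row into a {column name: value} dictionary.
    row_to_dic = [{str(y[0]): y[1] for y in x} for x in a]

    # Collect the values of each column into a list.
    dd = defaultdict(list)
    for d in row_to_dic:
        for key, value in d.items():  # .iteritems() exists only in Python 2
            dd[key].append(value)

    pd.DataFrame.from_dict(dd)

          1     2     3
    0  0.44  0.50  0.10
    1  0.85  0.63  0.11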

and b:

    # Same steps, this time with string column names.
    row_to_dic = [{str(y[0]): y[1] for y in x} for x in b]

    dd = defaultdict(list)
    for d in row_to_dic:
        for key, value in d.items():  # .iteritems() exists only in Python 2
            dd[key].append(value)

    pd.DataFrame.from_dict(dd)

          A     B     C
    0  0.44  0.50  0.10
    1  0.85  0.63  0.11
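As a shorter variant of the dictionary idea, the `pd.DataFrame` constructor also accepts a list of dictionaries directly, one per row, and aligns the values by key. A minimal sketch follows; `rows_to_frame` is a hypothetical helper name, and the columns are sorted explicitly so the result does not depend on the order in which keys first appear.

```python
import pandas as pd

a = [[(1, 0.44), (2, 0.5), (3, 0.1)], [(2, 0.63), (1, 0.85), (3, 0.11)]]
b = [[('A', 0.44), ('B', 0.5), ('C', 0.1)], [('B', 0.63), ('A', 0.85), ('C', 0.11)]]


def rows_to_frame(rows):
    """Hypothetical helper: one dict per row, columns aligned by key."""
    frame = pd.DataFrame([dict(row) for row in rows])
    # Sort the column labels so the layout is deterministic.
    return frame.sort_index(axis=1)


df_a = rows_to_frame(a)
df_b = rows_to_frame(b)
```

Unlike the `str(y[0])` conversion above, this keeps the original key types as column labels, so `df_a` has integer columns.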

2 answers

solution1
2 ACCEPTED 2016-02-16 15:34:19

solution2
1 2016-02-16 16:01:38
