简体   繁体   中英

Efficient Pandas Dataframe insert

I'm trying to add float values like [[(1,0.44),(2,0.5),(3,0.1)],[(2,0.63),(1,0.85),(3,0.11)],[...]] to a Pandas dataframe which looks like a matrix build from the first value of the tuples

df = 1 2 3 1 0.44 0.5 0.1 2 0.85 0.63 0.11 3 ... ... ...

I tried this:

    for key, value in enumerate(outer_list):
      for tuplevalue in value:
        df.ix[key][tuplevalue[0]] = tuplevalue[1]

The Problem is that my NxN-Matrix contains about 10000x10000 elements and hence it takes really long with my approach. Is there another possibility to speed this up?

(Unfortunately the values in the list are not ordered by the first tuple element)

Use list comprehensions to first sort and extract your data. Then create your dataframe from the sorted and cleaned data.

data = [[(1, 0.44), (2, 0.50), (3, 0.10)],
        [(2, 0.63), (1, 0.85), (3, 0.11)]]

# First, sort each row.
_ = [row.sort() for row in data]

# Then extract the second element of each tuple.
new_data = [[t[1] for t in row] for row in data]

# Now create a dataframe from your data.
>>> pd.DataFrame(new_data)
      0     1     2
0  0.44  0.50  0.10
1  0.85  0.63  0.11

This works using a dictionary (if you need to preserve your column order, or if the column names were a string). Maybe Alexander will update his answer to account for that, I'm nearly certain he'll have a better solution than my proposed one :)

Here's an example:

from collections import defaultdict

a = [[(1,0.44),(2,0.5),(3,0.1)],[(2,0.63),(1,0.85),(3,0.11)]]
b = [[('A',0.44),('B',0.5),('C',0.1)],[('B',0.63),('A',0.85),('C',0.11)]]

First on a:

row_to_dic = [{str(y[0]): y[1] for y in x} for x in a]

dd = defaultdict(list)
for d in (row_to_dic):
    for key, value in d.iteritems():
        dd[key].append(value)

pd.DataFrame.from_dict(dd)

    1   2   3
0   0.44    0.50    0.10
1   0.85    0.63    0.11

and b:

row_to_dic = [{str(y[0]): y[1] for y in x} for x in b]

dd = defaultdict(list)
for d in (row_to_dic):
    for key, value in d.iteritems():
        dd[key].append(value)

pd.DataFrame.from_dict(dd)
      A     B   C
0   0.44    0.50    0.10
1   0.85    0.63    0.11

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM