
Create pandas dataframe from list of tuple of nested lists

I have the data below: a list with 4 elements. Each element is a tuple whose items are themselves lists:

data = [(['a', 'b', 'c'],
  [1, 2, 3, 4, 5],
  ['aa', 'bb'],
  ['00', '03', '0000', '0006']),
 (['e', 'f', 'g'],
  [2, 1, 4, 4, 6],
  ['qq', 'er'],
  ['10', '04', '3340', '9009']),
 (['w', 'd', 'c'],
  [5, 6, 55, 1, 6],
  ['rr', 'rr'],
  ['55', '11', '6788', '7789']),
 (['l', 'a', 's'],
  [29, 2, 9, 4, 3],
  ['yy', 'uu'],
  ['33', '67', '0000', '0237'])]

I want to convert it to a DataFrame in such a way that each element is broken out into columns of the DataFrame. For example,

df = pd.DataFrame(data)

results in a DataFrame with four columns. What I want is for each of those columns to be split further, as marked with red lines in the screenshot (image not available).

That is to say, each column of the above DataFrame should be subdivided into as many columns as there are items in its cells (3 + 5 + 2 + 4 = 14 columns in total).

You can flatten the nested lists:

import pandas as pd

# For each row tuple l, chain its sublists into one flat list of 14 values
df = pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
print(df)
  0  1  2   3   4   5   6   7   8   9   10  11    12    13
0  a  b  c   1   2   3   4   5  aa  bb  00  03  0000  0006
1  e  f  g   2   1   4   4   6  qq  er  10  04  3340  9009
2  w  d  c   5   6  55   1   6  rr  rr  55  11  6788  7789
3  l  a  s  29   2   9   4   3  yy  uu  33  67  0000  0237
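If you also want the header to record which tuple position each new column came from (the "subdivided" columns from the question), one option, not part of the answer above, is to build one small frame per position and join them with pd.concat using keys, which yields two-level (MultiIndex) columns. A minimal sketch, assuming the question's data (shortened here to two rows so the block is self-contained):

```python
import pandas as pd

# Two rows of the question's data, shortened for a self-contained example.
data = [(['a', 'b', 'c'], [1, 2, 3, 4, 5], ['aa', 'bb'], ['00', '03', '0000', '0006']),
        (['e', 'f', 'g'], [2, 1, 4, 4, 6], ['qq', 'er'], ['10', '04', '3340', '9009'])]

# zip(*data) regroups by tuple position: all 1st lists, all 2nd lists, ...
groups = list(zip(*data))

# One sub-frame per position (each list becomes a row); keys= adds the
# position as the outer column level, so columns look like (0, 0), (0, 1), ...
df = pd.concat([pd.DataFrame(list(g)) for g in groups],
               axis=1, keys=list(range(len(groups))))
print(df)
```

With this layout df[1] returns just the five integer columns, and setting df.columns = range(df.shape[1]) collapses the header back to the flat numbering used in the answer.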

Timings (the alternatives below also need import numpy as np and from itertools import chain):

data = data * 100

In [128]: %timeit pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
100 loops, best of 3: 2.03 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ1 
In [137]: %timeit pd.DataFrame(list(map(lambda d:  list(chain.from_iterable(d)), data)))
1000 loops, best of 3: 1.97 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ2 
In [129]: %timeit pd.DataFrame(np.concatenate(list(zip(*data)), axis=1))
1000 loops, best of 3: 1.46 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ3 
In [130]: %timeit pd.DataFrame([np.concatenate(d) for d in data])
100 loops, best of 3: 5.9 ms per loop


data = data * 10000

In [121]: %timeit pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
10 loops, best of 3: 99.2 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ1 
In [139]: %timeit pd.DataFrame(list(map(lambda d: list(chain.from_iterable(d)), data)))
10 loops, best of 3: 95.8 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ2 
In [122]: %timeit pd.DataFrame(np.concatenate(list(zip(*data)), axis=1))
10 loops, best of 3: 150 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ3 
In [123]: %timeit pd.DataFrame([np.concatenate(d) for d in data])
1 loop, best of 3: 560 ms per loop
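%timeit is an IPython magic; in a plain Python script, the standard-library timeit module can run a similar comparison. A rough sketch, assuming the question's data (two rows shown, repeated to enlarge it); the function names are hypothetical labels for the four approaches timed above, and absolute numbers will vary by machine and by pandas/NumPy version, so no results are shown:

```python
import timeit
from itertools import chain

import numpy as np
import pandas as pd

base = [(['a', 'b', 'c'], [1, 2, 3, 4, 5], ['aa', 'bb'], ['00', '03', '0000', '0006']),
        (['e', 'f', 'g'], [2, 1, 4, 4, 6], ['qq', 'er'], ['10', '04', '3340', '9009'])]
data = base * 500  # 1000 rows

# Hypothetical names for the four approaches timed above.
def nested_listcomp(d):
    return pd.DataFrame([[item for sublist in l for item in sublist] for l in d])

def chain_map(d):
    return pd.DataFrame(list(map(lambda r: list(chain.from_iterable(r)), d)))

def concat_zip(d):
    # NumPy promotes the mixed str/int blocks to a common string dtype here
    return pd.DataFrame(np.concatenate(list(zip(*d)), axis=1))

def concat_rows(d):
    return pd.DataFrame([np.concatenate(r) for r in d])

for fn in (nested_listcomp, chain_map, concat_zip, concat_rows):
    # Best of 3 repeats of 5 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(lambda: fn(data), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1000:.2f} ms per call")
```

Note that the two NumPy-based variants turn the integers into strings, while the pure-Python ones keep integer columns, which matters if you compare the results and not just the speed.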

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
