I have the data below, which is a list with 4 elements. Each element is a tuple whose items are themselves lists:
data = [(['a', 'b', 'c'],
         [1, 2, 3, 4, 5],
         ['aa', 'bb'],
         ['00', '03', '0000', '0006']),
        (['e', 'f', 'g'],
         [2, 1, 4, 4, 6],
         ['qq', 'er'],
         ['10', '04', '3340', '9009']),
        (['w', 'd', 'c'],
         [5, 6, 55, 1, 6],
         ['rr', 'rr'],
         ['55', '11', '6788', '7789']),
        (['l', 'a', 's'],
         [29, 2, 9, 4, 3],
         ['yy', 'uu'],
         ['33', '67', '0000', '0237'])]
I want to convert it to a DataFrame in such a way that each item of every inner list gets its own column. For example, df = pd.DataFrame(data) results in a DataFrame with four columns, where each cell holds a whole list. What I want instead is for each of those four columns to be split into as many columns as there are items in its cells.
You can flatten the nested lists:
df = pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
print(df)
   0  1  2   3  4   5  6  7   8   9  10  11    12    13
0  a  b  c   1  2   3  4  5  aa  bb  00  03  0000  0006
1  e  f  g   2  1   4  4  6  qq  er  10  04  3340  9009
2  w  d  c   5  6  55  1  6  rr  rr  55  11  6788  7789
3  l  a  s  29  2   9  4  3  yy  uu  33  67  0000  0237
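The same flattening can also be written with itertools.chain.from_iterable, which is one of the variants timed in this answer. The sketch below is a minimal, self-contained version using only the first two rows of the data. The column labels (g0_0, g0_1, ...) are a hypothetical naming scheme added for illustration and are not part of the original answer; it assumes every row has groups of the same lengths as the first row.

```python
from itertools import chain

import pandas as pd

# First two rows of the data from the question.
data = [(['a', 'b', 'c'], [1, 2, 3, 4, 5], ['aa', 'bb'], ['00', '03', '0000', '0006']),
        (['e', 'f', 'g'], [2, 1, 4, 4, 6], ['qq', 'er'], ['10', '04', '3340', '9009'])]

# Flatten each tuple of lists into a single flat row.
rows = [list(chain.from_iterable(row)) for row in data]

# Hypothetical labels: group number plus position inside the group, e.g. "g1_0".
# Assumes all rows share the group lengths of the first row.
columns = [f"g{g}_{i}" for g, group in enumerate(data[0]) for i in range(len(group))]

df = pd.DataFrame(rows, columns=columns)
print(df.shape)  # (2, 14)
```

Leaving out the columns argument gives the default integer column labels 0 to 13, as in the output above.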
Timings:

data = data * 100

In [128]: %timeit pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
100 loops, best of 3: 2.03 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ1
In [137]: %timeit pd.DataFrame(list(map(lambda d: list(chain.from_iterable(d)), data)))
1000 loops, best of 3: 1.97 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ2
In [129]: %timeit pd.DataFrame(np.concatenate(list(zip(*data)), axis=1))
1000 loops, best of 3: 1.46 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ3
In [130]: %timeit pd.DataFrame([np.concatenate(d) for d in data])
100 loops, best of 3: 5.9 ms per loop

data = data * 10000

In [121]: %timeit pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
10 loops, best of 3: 99.2 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ1
In [139]: %timeit pd.DataFrame(list(map(lambda d: list(chain.from_iterable(d)), data)))
10 loops, best of 3: 95.8 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ2
In [122]: %timeit pd.DataFrame(np.concatenate(list(zip(*data)), axis=1))
10 loops, best of 3: 150 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ3
In [123]: %timeit pd.DataFrame([np.concatenate(d) for d in data])
1 loop, best of 3: 560 ms per loop
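To repeat a comparison like this outside IPython, the standard-library timeit module can be used. The sketch below is only an approximation of the session above: it times the flattening step alone (no DataFrame construction), covers only the two pure-Python variants, and uses a hypothetical sample built by repeating one row 400 times. The function names are made up for illustration, and the numbers it prints will differ by machine.

```python
import timeit
from itertools import chain

# Hypothetical sample: one row from the question repeated 400 times.
sample = [(['a', 'b', 'c'], [1, 2, 3, 4, 5], ['aa', 'bb'],
           ['00', '03', '0000', '0006'])] * 400


def flatten_comprehension(data):
    # Nested list comprehension, as in the answer.
    return [[item for sublist in row for item in sublist] for row in data]


def flatten_chain(data):
    # itertools.chain.from_iterable, as in the first alternative timed above.
    return [list(chain.from_iterable(row)) for row in data]


# Both variants should yield identical rows before their speed is compared.
assert flatten_comprehension(sample) == flatten_chain(sample)

for func in (flatten_comprehension, flatten_chain):
    # Best of 3 repeats, 100 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(lambda: func(sample), number=100, repeat=3))
    print(f"{func.__name__}: {best / 100 * 1e3:.3f} ms per loop")
```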