Is it possible to create a DataFrame from a list of lists, where each item in index_list is used as the index label for every value in the corresponding inner list of lst?
index_list = ['phase1', 'phase2', 'phase3']
lst = [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]
Thank you for any help!!
Edit: the inner lists are not necessarily the same size.
You can use pd.Series.explode here.
pd.Series(lst,index=index_list).explode()
phase1 a
phase1 b
phase1 c
phase2 d
phase2 e
phase2 f
phase2 g
phase3 h
phase3 i
phase3 j
dtype: object
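Since the question asks for a DataFrame rather than a Series, here is a minimal, self-contained sketch of the explode approach taken one step further. The column names 'phase' and 'value' are my own illustrative choices, not something from the question, and Series.explode needs pandas 0.25 or newer.

```python
import pandas as pd

index_list = ['phase1', 'phase2', 'phase3']
lst = [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]

# One entry per phase, each holding a list; explode() then emits
# one row per list element and repeats the index label for it.
s = pd.Series(lst, index=index_list).explode()

# 'value' and 'phase' are hypothetical names chosen for this sketch.
df = s.rename('value').rename_axis('phase').reset_index()
print(df)
```

If the phase labels should stay in the index instead, s.to_frame('value') would likely be enough.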
Another solution using np.repeat and np.concatenate:
r_len = [len(r) for r in lst]
pd.Series(np.concatenate(lst), index=np.repeat(index_list,r_len))
phase1 a
phase1 b
phase1 c
phase2 d
phase2 e
phase2 f
phase2 g
phase3 h
phase3 i
phase3 j
dtype: object
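For anyone who wants to run the NumPy variant on its own, here is the same idea as a self-contained sketch with the imports included. The intermediate names values and labels are only there for readability and are not part of the original answer.

```python
import numpy as np
import pandas as pd

index_list = ['phase1', 'phase2', 'phase3']
lst = [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]

# Length of each inner list; the lists may differ in size.
r_len = [len(r) for r in lst]

# Flatten the inner lists into a single 1-D array of values.
values = np.concatenate(lst)

# Repeat each label once per element of its inner list.
labels = np.repeat(index_list, r_len)

s2 = pd.Series(values, index=labels)
print(s2)
```

np.repeat accepts a per-element repeat count, which is what keeps the labels aligned with ragged inner lists.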
Timeit results:
In [501]: %%timeit
...: pd.Series(lst,index=index_list).explode()
...:
...:
363 µs ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [503]: %%timeit
...: r_len = [len(r) for r in lst]
...: pd.Series(np.concatenate(lst), index=np.repeat(index_list,r_len))
...:
...:
236 µs ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
This problem looks similar to R's expand.grid() function, an equivalent of which is listed in the pandas cookbook (bottom of the page). This function lets you create a DataFrame with all combinations of the given input values.
First define a function:
import itertools
import pandas as pd

def expand_grid(data_dict):
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())
Then you can use it like so:
df = expand_grid({'index': ['phase1', 'phase2', 'phase3'],
                  'Col1': [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]})
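As a quick check of what this call returns, here is a self-contained sketch. It passes list(data_dict) for the columns instead of data_dict.keys(), which is a small change of mine. Because the function builds all combinations, every phase is paired with every inner list, so the result has 3 x 3 = 9 rows with whole lists in Col1. It is not the one-row-per-element layout shown in the other answers.

```python
import itertools
import pandas as pd

def expand_grid(data_dict):
    # Cartesian product of the value lists: one row per combination.
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=list(data_dict))

df = expand_grid({'index': ['phase1', 'phase2', 'phase3'],
                  'Col1': [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]})

print(df.shape)  # (9, 2)
print(df.head())
```

So if the goal is the phase-to-element mapping from the question, this output would still need filtering and exploding.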