How to convert horizontal dataframe structure to vertical with Pandas

Hello, I have a problem similar to this one, but in reverse. I need an idea for how to reshape the dataframe vertically, using the first column, id, as the key.

So to start, an example of my input dataframe looks like this:

>>> df = pd.DataFrame({'id':[1,2,3,4,5], 'tag': ['a','b','c','d','e'], 'tag2': ['f','g','h','i','j'], 'tag3': ['k','l','m','','']})
>>> df
   id tag tag2 tag3
0   1   a    f    k
1   2   b    g    l
2   3   c    h    m
3   4   d    i
4   5   e    j

My desired output should be like this:

>>> df
    id tag
0    1   a
1    1   f
2    1   k
3    2   b
4    2   g
5    2   l
6    3   c
7    3   h
8    3   m
9    4   d
10   4   i
11   5   e
12   5   j

It looks like I have to use the entries of the id column as the keys of a dictionary, right? Something like a defaultdict(list):

{1:['a','f','k'], 2:['b','g','l'], 3:['c','h','m'], 4:['d','i'], 5:['e','j']}

I just have trouble placing all the column values of each row into the dictionary as a list. I already know how to make a dictionary from only two columns, like this:

some_dict = dict(zip(df['col1'],df['col2']))

But not with lists as values, as above.

Also, a pandas solution would be ideal.

If I figure out how to create the dictionary with lists as values, I plan to loop over it to change the format and create the desired DataFrame, but looping is not always advisable, especially with large DataFrames.

Any help would be appreciated. Cheers!

Edit

I just figured out how to create a dictionary with lists as values:

>>> x = df.set_index('id').T.to_dict('list')
>>> x
{1: ['a', 'f', 'k'], 2: ['b', 'g', 'l'], 3: ['c', 'h', 'm'], 4: ['d', 'i', ''], 5: ['e', 'j', '']}

So my problem now is how to use this dictionary to create a new dataframe that matches the desired output.

Thanks.
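For the dictionary route described in the edit, a minimal sketch of the last step might look like the following. It assumes the ids are unique (a dictionary keeps only one entry per key) and that an empty string marks a missing tag; the names rows and long_df are only illustrative. It still loops in Python, so it is mainly useful for small frames.

```python
import pandas as pd

# Dictionary as produced in the edit above: id -> list of tags,
# where '' marks a missing tag.
x = {1: ['a', 'f', 'k'], 2: ['b', 'g', 'l'], 3: ['c', 'h', 'm'],
     4: ['d', 'i', ''], 5: ['e', 'j', '']}

# One (id, tag) pair per non-empty tag, in dictionary order.
rows = [(key, tag) for key, tags in x.items() for tag in tags if tag != '']

# Build the long-format frame from the list of pairs.
long_df = pd.DataFrame(rows, columns=['id', 'tag'])
print(long_df)
```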

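import numpy as np

# Recent pandas versions reject a value_name that matches an existing column,
# so melt into the default 'value' column and rename it afterwards.
(df.melt(id_vars='id', value_vars=['tag', 'tag2', 'tag3'])
   .drop(columns='variable')             # the original column names are not needed
   .rename(columns={'value': 'tag'})
   .replace('', np.nan)                  # treat empty strings as missing
   .dropna()
   .sort_values('id', kind='stable')     # stable sort keeps tag, tag2, tag3 order per id
   .reset_index(drop=True)
)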

Try this:

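# needs: import numpy as np; the extra dropna() guards against stack() keeping NaN rows
df.replace('', np.nan).set_index('id').stack().dropna().reset_index(name='tag').drop(columns='level_1')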

Out[100]:
    id tag
0    1   a
1    1   f
2    1   k
3    2   b
4    2   g
5    2   l
6    3   c
7    3   h
8    3   m
9    4   d
10   4   i
11   5   e
12   5   j
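For anyone who wants to run the stack approach end to end, here is a self-contained sketch. The helper name to_long and its parameters are hypothetical, not part of pandas, and it assumes empty strings stand for missing tags. The explicit dropna() is there because stack() does not drop missing values under every pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'tag': ['a', 'b', 'c', 'd', 'e'],
                   'tag2': ['f', 'g', 'h', 'i', 'j'],
                   'tag3': ['k', 'l', 'm', '', '']})

def to_long(frame, id_col='id', value_name='tag'):
    """Hypothetical helper: one row per (id, non-empty value) pair."""
    stacked = (
        frame.replace('', np.nan)   # empty strings become missing values
             .set_index(id_col)     # id becomes the row key
             .stack()               # columns move into an inner index level
             .dropna()              # drop missing tags explicitly
    )
    # The inner index level holds the old column names; discard it,
    # name the values, and turn the id index back into a column.
    return stacked.droplevel(-1).rename(value_name).reset_index()

out = to_long(df)
print(out)
```

Because stack() walks each row across its columns, the tags of one id stay together in column order, so no sort is needed afterwards.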
