
Manipulating a pandas DataFrame column containing a list

I have used the following code with the unique() function in pandas to create a column which then contains a list of unique values:

import pandas as pd
from collections import OrderedDict
dct = OrderedDict([
('referencenum',['10','10','20','20','20','30','30','40']),
('Month',['Jan','Jan','Jan','Feb','Feb','Feb','Feb','Mar']),
('Category',['good','bad','bad','bad','bad','good','bad','bad'])
                 ])
df = pd.DataFrame.from_dict(dct)

This gives the following sample dataset:

  referencenum Month Category
0           10   Jan     good
1           10   Jan      bad
2           20   Jan      bad
3           20   Feb      bad
4           20   Feb      bad
5           30   Feb     good
6           30   Feb      bad
7           40   Mar      bad

Then I summarise as follows:

dfsummary = pd.DataFrame(df.groupby(['referencenum', 'Month'])['Category'].unique())
dfsummary = dfsummary.reset_index()  # reset_index() returns a new frame, so assign it back

This gives the summary dataframe, with a "Category" column containing a list:

  referencenum Month     Category
0           10   Jan  [good, bad]
1           20   Feb        [bad]
2           20   Jan        [bad]
3           30   Feb  [good, bad]
4           40   Mar        [bad]

My question is how do I obtain another column containing the len() or number of items in the Category "list" column?

Also - how do I extract the first/second item in the list to another column?

Can I do these manipulations within pandas or do I somehow need to drop out to list manipulations and then come back to pandas?

Many thanks!

You should check out the Series accessors (.str, .dt and so on).

Basically, they're ways to handle the values contained in a Series that are specific to their type (datetime, string, etc.).

In this case, you would use dfsummary['Category'].str.len().

If you wanted the first element, you would use dfsummary['Category'].str[0].

To generalise: you can treat the elements of a Series as a collection of objects by referring to its .str property.
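As a minimal sketch of the above, assuming the summary frame is built as in the question: the new column names (Category_len, Category_first, Category_second) are just illustrative choices, not anything from the original post. Indexing past the end of a shorter entry with .str[n] gives a missing value rather than raising.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'referencenum': ['10', '10', '20', '20', '20', '30', '30', '40'],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb', 'Feb', 'Mar'],
    'Category': ['good', 'bad', 'bad', 'bad', 'bad', 'good', 'bad', 'bad'],
})

# One row per (referencenum, Month), with the unique categories collected
dfsummary = (
    df.groupby(['referencenum', 'Month'])['Category']
      .unique()
      .reset_index()
)

# Number of items in each entry (illustrative column name)
dfsummary['Category_len'] = dfsummary['Category'].str.len()

# First and second item of each entry; entries with only one item
# end up with a missing value in the "second" column
dfsummary['Category_first'] = dfsummary['Category'].str[0]
dfsummary['Category_second'] = dfsummary['Category'].str[1]

print(dfsummary)
```

So there is no need to drop out to plain list manipulation; the whole thing stays inside pandas.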

If you want the number of elements in each entry of the Category column, use the built-in len() function together with apply():

dfsummary['Category_len'] = dfsummary['Category'].apply(len)
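The same apply() approach can also pull out an item by position. Here is a rough sketch, assuming the summary frame from the question; nth_or_none is a hypothetical helper written for this example (not a pandas function), and Category_second is an illustrative column name.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'referencenum': ['10', '10', '20', '20', '20', '30', '30', '40'],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb', 'Feb', 'Mar'],
    'Category': ['good', 'bad', 'bad', 'bad', 'bad', 'good', 'bad', 'bad'],
})
dfsummary = (
    df.groupby(['referencenum', 'Month'])['Category']
      .unique()
      .reset_index()
)


def nth_or_none(items, n):
    """Hypothetical helper: return items[n], or None if the entry is too short."""
    return items[n] if len(items) > n else None


# Number of items in each entry
dfsummary['Category_len'] = dfsummary['Category'].apply(len)

# Second item of each entry, or a missing value where there is only one
dfsummary['Category_second'] = dfsummary['Category'].apply(
    lambda items: nth_or_none(items, 1)
)

print(dfsummary)
```

The explicit length check is what avoids an IndexError on the rows that hold a single category.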
