
Python Add Column to Pandas Dataframe That is a Count of List Elements in Another Column

I am extracting data from a MongoDB database using the pymongo API and inserting it into a pandas dataframe. Some fields in the database contain lists of diagnosis codes. Most of these have an accompanying "count" field, but one does not. This count will be very important for the analytics I plan to perform regularly on this data. The dataframe "DF" looks like this:

                                        dxCodes   memberID  newDx
0          [4280, 4293, 4241, 4240, 4242, 4243]  856589080      0
1                                       [V7612]  906903383      0
2                           [4550, 4553, V1582]  837210554      0
3       [78791, 28860, V1582, 496, 25000, 4019]  935634391      0
4  [30500, 42731, 4280, 496, 59972, 4019, 3051]  929185103      0

I need to create a new column in the dataframe that contains a count of the diagnosis codes in the dxCodes field. I have searched all over the internet, but none of the solutions I have tried have worked. The closest I have gotten is this:

DF['dxCount'] = len(DF['dxCodes'])

However, I get this as a result:

                                        dxCodes   memberID  newDx  dxCount
0          [4280, 4293, 4241, 4240, 4242, 4243]  856589080      0   139360
1                                       [V7612]  906903383      0   139360
2                           [4550, 4553, V1582]  837210554      0   139360
3       [78791, 28860, V1582, 496, 25000, 4019]  935634391      0   139360
4  [30500, 42731, 4280, 496, 59972, 4019, 3051]  929185103      0   139360

The number that shows up in the dxCount column is the number of rows in the dataframe, but I want it to show the number of dx codes in the dxCodes field, so the desired result would be this:

                                        dxCodes   memberID  newDx  dxCount
0          [4280, 4293, 4241, 4240, 4242, 4243]  856589080      0        6
1                                       [V7612]  906903383      0        1
2                           [4550, 4553, V1582]  837210554      0        3
3       [78791, 28860, V1582, 496, 25000, 4019]  935634391      0        6
4  [30500, 42731, 4280, 496, 59972, 4019, 3051]  929185103      0        7

I have come a long way on my Python journey, but this one has had me banging my head against the wall for several hours across multiple days. Thanks in advance for your assistance!

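A list comprehension should work here, since it calls len on each row's list rather than on the whole column (df below is the question's DF):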

>>> df['dxCount'] = [len(c) for c in df['dxCodes']]
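
To see the difference, here is a small self-contained sketch. The three-row frame is made up for illustration and only mirrors the question's column names, and I am assuming the codes are stored as strings. It also shows two alternatives that should give the same counts, Series.apply(len) and Series.str.len():

```python
import pandas as pd

# Illustrative data only; column names follow the question, values are a small made-up subset.
df = pd.DataFrame({
    "dxCodes": [["4280", "4293"], ["V7612"], ["4550", "4553", "V1582"]],
    "memberID": [856589080, 906903383, 837210554],
})

# len() on the column itself is the number of rows, which pandas then
# broadcasts to every row when assigned as a scalar.
n_rows = len(df["dxCodes"])

# Per-row lengths: call len() on each list.
df["dxCount"] = [len(c) for c in df["dxCodes"]]

# Alternatives that compute the same per-row counts.
count_apply = df["dxCodes"].apply(len)
count_str = df["dxCodes"].str.len()
```

If some rows might hold NaN instead of a list, the .str.len() form returns NaN for those rows, whereas len() would raise a TypeError.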

Though perhaps a better design would be to keep the dxCodes in a separate dataframe indexed by memberID, with one code per row, so that they could be stored as a homogeneous column of strings.
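
One possible way to build such a frame, as a rough sketch: DataFrame.explode (available from pandas 0.25) turns each list element into its own row. The names dx_long, dxCode and dx_counts are my own, and the data is again a made-up subset with the codes assumed to be strings:

```python
import pandas as pd

# Illustrative data only; column names follow the question.
df = pd.DataFrame({
    "dxCodes": [["4280", "4293"], ["V7612"], ["4550", "4553", "V1582"]],
    "memberID": [856589080, 906903383, 837210554],
})

# One row per (memberID, code) pair, indexed by memberID.
dx_long = (
    df[["memberID", "dxCodes"]]
    .explode("dxCodes")
    .rename(columns={"dxCodes": "dxCode"})
    .set_index("memberID")
)

# Codes per member; count() ignores the NaN row that explode
# produces for an empty list, so such members get 0.
dx_counts = dx_long.groupby(level="memberID")["dxCode"].count()
```

With this layout the count no longer needs to be stored at all; it can be derived from dx_long whenever it is needed.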
