
Pandas python: randomly select values from a list column based on another column's value

I have a DataFrame with one column holding lists of values and another column holding a one-item list. I want to select values from column id based on the value in column canceled, and put the selected values in a new column C.
Column canceled is the number of canceled codes. I need to convert canceled to an int, use that number to slice the id column, and randomly pick that many ids from the row's id list. For example, for code 11AS I would randomly pick 1 id from the array and put it in the new column as the canceled id. For code 22AS, since its count is 0, I would not slice anything, so the new column stays empty for that row. The same logic applies to every row.

code    canceled  id
xxx     [1.0]     [107385, 128281, 133015]
xxS     [0.0]     [108664, 110515, 113556]
ssD     [1.0]     [134798, 133499, 125396, 114298, 133915]
cvS     [0.0]     [107611]
eeS     [5.0]     [113472, 115236, 108586, 128043, 114106, 10796...
544W    [44.0]    [107650, 128014, 127763, 118036, 116247, 12802.
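To experiment with the answers below, here is a minimal sketch that rebuilds px from the four rows shown in full above (the truncated eeS and 544W rows are left out rather than guessed). It assumes, as the printout suggests, that each cell holds a Python list: canceled a one-element list of floats, id a list of ints.

```python
import pandas as pd

# Rebuild the first four rows of the question's frame; every cell is a Python list
px = pd.DataFrame({
    "code": ["xxx", "xxS", "ssD", "cvS"],
    "canceled": [[1.0], [0.0], [1.0], [0.0]],
    "id": [
        [107385, 128281, 133015],
        [108664, 110515, 113556],
        [134798, 133499, 125396, 114298, 133915],
        [107611],
    ],
})

print(px)
```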

I tried to loop through and slice, but I couldn't get what I want. Say px is my DataFrame:

for i in px['canceled']:
    print(px['id'].str.slice(stop=int(i[0])))
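The loop above iterates over canceled alone, so each iteration slices the whole id column to that one count and prints every row. A minimal sketch of the intended per-row slice pairs each row's count with its own id list via zip. It assumes the same list-valued layout as the table above, and it keeps the first n ids (deterministic); the random pick is what the answer below adds. The column name first_n is hypothetical.

```python
import pandas as pd

px = pd.DataFrame({
    "canceled": [[1.0], [0.0], [1.0], [0.0]],
    "id": [[107385, 128281, 133015], [108664, 110515, 113556],
           [134798, 133499, 125396, 114298, 133915], [107611]],
})

# Pair each row's cancel count with that same row's id list,
# then keep the first int(count) ids of that row only
px["first_n"] = [ids[:int(c[0])] for c, ids in zip(px["canceled"], px["id"])]
print(px["first_n"].tolist())  # [[107385], [], [134798], []]
```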

What about using apply in conjunction with random.sample, as follows:

import random

# For each row, draw int(canceled[0]) distinct ids at random from that row's id list
px['C'] = px.apply(
    lambda datum : random.sample(
        datum.id, k=int(datum.canceled[0])
    ),
    axis = 1
)

which may return something like the following (column C is randomly generated, so your values will differ):

code    canceled       id                                         C
xxS     [1.0]          [107385, 128281, 133015]                   [128281]
xxxxS   [0.0]          [108664, 110515, 113556]                   []
ssOD    [1.0]          [134798, 133499, 125396, 114298, 133915]   [114298]
45AS    [0.0]          [107611]                                   []
...     ...            ...                                        ...
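Because column C is random, every rerun gives different picks. If reproducible output is needed, one option (an addition, not part of the original answer) is to draw from a dedicated random.Random instance created with a fixed seed. The helper name sample_column and the seed value 42 are arbitrary choices for illustration.

```python
import random
import pandas as pd

px = pd.DataFrame({
    "canceled": [[1.0], [0.0], [1.0], [0.0]],
    "id": [[107385, 128281, 133015], [108664, 110515, 113556],
           [134798, 133499, 125396, 114298, 133915], [107611]],
})

def sample_column(df, seed):
    # A private generator: the same seed gives the same picks,
    # and the global random module state is left untouched
    rng = random.Random(seed)
    return df.apply(
        lambda datum: rng.sample(datum["id"], k=int(datum["canceled"][0])),
        axis=1,
    )

first = sample_column(px, seed=42)
second = sample_column(px, seed=42)
print(first.tolist() == second.tolist())  # True
```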


If int(datum.canceled[0]) is greater than the length of datum.id, random.sample raises a ValueError. One way to handle this is to return datum.id in full in that case, as follows:

def random_codes_sampler(datum):
    ids = datum.id
    nbc = int(datum.canceled[0])
    if nbc >= len(ids):
        return ids
    return random.sample(ids, k=nbc)

px['C'] = px.apply(
    random_codes_sampler,
    axis = 1
)
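A slightly more compact variant (a judgment call, not the answer's code) caps k with min() instead of branching, and builds the column with a list comprehension over zip, which typically avoids the per-row Series construction that apply(axis=1) performs. One behavioral difference: when the count is at least the list length, random.sample returns all ids in shuffled order, whereas random_codes_sampler returns the original list unchanged. The helper name sample_capped is hypothetical, and the third row below is a made-up case that triggers the cap.

```python
import random
import pandas as pd

def sample_capped(ids, canceled):
    # Never request more ids than the row has; random.sample would raise ValueError
    k = min(int(canceled[0]), len(ids))
    return random.sample(ids, k=k)

px = pd.DataFrame({
    "canceled": [[1.0], [0.0], [5.0]],  # third row: hypothetical count larger than its list
    "id": [[107385, 128281, 133015], [108664, 110515, 113556], [107611]],
})

# Pair each row's count with its own id list, without going through apply
px["C"] = [sample_capped(ids, c) for c, ids in zip(px["canceled"], px["id"])]
print(px)
```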


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM