简体   繁体   中英

Fill one Dataframe Column from specific value in list of another column

My dataframe has a column pairs that contains a key-pair list . Each key is unique in the list. eg:

df = pd.DataFrame({
        'id':  ['1', '2', '3'],
        'abc':None,
        'pairs': [ ['abc/123', 'foo/345', 'xyz/789'],  ['abc/456', 'foo/111', 'xyz/789'],  ['xxx/222', 'foo/555', 'xyz/333'] ]
      })

Dataframe is :

  id | abc  | pairs
  ------------------------------------
  1  |None  | [abc/123, foo/345, xyz/789]
  2  |None  | [abc/456, foo/111, xyz/789]
  3  |None  | [xxx/222, foo/555, xyz/333]

The column abc is filled with the value in column pairs if an element (idx=0) split by \ has the value (key) =='abc'.

Expected df :

  id | abc  | pairs
  ------------------------------------
  1  |123   | [abc/123, foo/345, xyz/789]
  2  |456   | [abc/456, foo/111, xyz/789]
  3  |None  | [xxx/222, foo/555, xyz/333]

I look for something like:

df.loc[df['pairs'].map(lambda x: 'abc' in (l.split('/')[0] for l in x)), 'abc'] = 'FOUND'

my problem is to replace the FOUND by the correct value the l.split('/')[0]

You can use .str repeatedly:

df['abc'] = df['pairs'].str[0].str.split('/').loc[lambda x: x.str[0] == 'abc'].str[1]

Output:

>>> df
  id  abc                        pairs
0  1  123  [abc/123, foo/345, xyz/789]
1  2  456  [abc/456, foo/111, xyz/789]
2  3  NaN  [xxx/222, foo/555, xyz/333]

More readable alternative:

x = df['pairs'].str[0].str.split('/')
df.loc[x.str[0] == 'abc', 'abc'] = x.str[1]

Use str.get as much as you like ;)

s = df['pairs'].str.get(0).str.split('/')
df['abc'] = np.where(s.str.get(0) == 'abc', s.str.get(1), None)

Try, you don't need apply nor lambda functions:

a = df['pairs'].str[0].str
df['abc'] = a.split('/').str[1].where(a.startswith('abc'))

Output:

  id  abc                        pairs
0  1  123  [abc/123, foo/345, xyz/789]
1  2  456  [abc/456, foo/111, xyz/789]
2  3  NaN  [xxx/222, foo/555, xyz/333]

Note: str[0] is equal to using str.get(0).

"Elements in the split lists can be accessed using get or [] notation:"

Try this

# data
df = pd.DataFrame({
        'id':  ['1', '2', '3'],
        'abc':None,
        'pairs': [ ['abc/123', 'foo/345', 'xyz/789'],  ['abc/456', 'foo/111', 'xyz/789'],  ['xxx/222', 'foo/555', 'xyz/333'] ]
      })
# construct a dict in loop and get value of abc key
df['abc'] = df['pairs'].apply(lambda x: dict(e.split('/') for e in x).get('abc'))
df

Upon reading the question again, it seems you're only interested in abc key if it's the first element in the lists, so instead of reading each list, just index the first element and split

df['abc'] = df['pairs'].apply(lambda x: dict([x[0].split('/')]).get('abc'))

在此处输入图像描述

" You can use .str repeatedly " -> Yes, but… it is quite slow !

In this context, it is much better to use a list comprehension:

df['abc'] = [x[1] if (x:=l[0].split('/'))[0].startswith('abc') else float('nan')
            for l in df['pairs']]

Rule of thumb: if you need 3 str or more, better try the list comprehension!

One picture is better than thousand words: test of the performance (all current answers) from 3 to almost 1M rows:

在此处输入图像描述

bonus: matching first "abc" on any position (not only 1st)
df['abc'] = [next((x.split('/')[1] for x in l if x.startswith('abc')), None)
             for l in df['pairs']]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM