Fill one Dataframe Column from specific value in list of another column

Question

My dataframe has a column pairs that contains a key-pair list . Each key is unique in the list. eg:

df = pd.DataFrame({
        'id':  ['1', '2', '3'],
        'abc':None,
        'pairs': [ ['abc/123', 'foo/345', 'xyz/789'],  ['abc/456', 'foo/111', 'xyz/789'],  ['xxx/222', 'foo/555', 'xyz/333'] ]
      })

Dataframe is :

  id | abc  | pairs
  ------------------------------------
  1  |None  | [abc/123, foo/345, xyz/789]
  2  |None  | [abc/456, foo/111, xyz/789]
  3  |None  | [xxx/222, foo/555, xyz/333]

The column abc is filled with the value in column pairs if an element (idx=0) split by \ has the value (key) =='abc'.

Expected df :

  id | abc  | pairs
  ------------------------------------
  1  |123   | [abc/123, foo/345, xyz/789]
  2  |456   | [abc/456, foo/111, xyz/789]
  3  |None  | [xxx/222, foo/555, xyz/333]

I look for something like:

df.loc[df['pairs'].map(lambda x: 'abc' in (l.split('/')[0] for l in x)), 'abc'] = 'FOUND'

my problem is to replace the FOUND by the correct value the l.split('/')[0]

Answer 1

You can use .str repeatedly:

df['abc'] = df['pairs'].str[0].str.split('/').loc[lambda x: x.str[0] == 'abc'].str[1]

Output:

>>> df
  id  abc                        pairs
0  1  123  [abc/123, foo/345, xyz/789]
1  2  456  [abc/456, foo/111, xyz/789]
2  3  NaN  [xxx/222, foo/555, xyz/333]

More readable alternative:

x = df['pairs'].str[0].str.split('/')
df.loc[x.str[0] == 'abc', 'abc'] = x.str[1]

Answer 2

Use str.get as much as you like ;)

s = df['pairs'].str.get(0).str.split('/')
df['abc'] = np.where(s.str.get(0) == 'abc', s.str.get(1), None)

Answer 3

Try, you don't need apply nor lambda functions:

a = df['pairs'].str[0].str
df['abc'] = a.split('/').str[1].where(a.startswith('abc'))

Output:

  id  abc                        pairs
0  1  123  [abc/123, foo/345, xyz/789]
1  2  456  [abc/456, foo/111, xyz/789]
2  3  NaN  [xxx/222, foo/555, xyz/333]

Note: str[0] is equal to using str.get(0).

"Elements in the split lists can be accessed using get or [] notation:"

Answer 4

Try this

# data
df = pd.DataFrame({
        'id':  ['1', '2', '3'],
        'abc':None,
        'pairs': [ ['abc/123', 'foo/345', 'xyz/789'],  ['abc/456', 'foo/111', 'xyz/789'],  ['xxx/222', 'foo/555', 'xyz/333'] ]
      })
# construct a dict in loop and get value of abc key
df['abc'] = df['pairs'].apply(lambda x: dict(e.split('/') for e in x).get('abc'))
df

Upon reading the question again, it seems you're only interested in abc key if it's the first element in the lists, so instead of reading each list, just index the first element and split

df['abc'] = df['pairs'].apply(lambda x: dict([x[0].split('/')]).get('abc'))

Answer 5

" You can use .str repeatedly " -> Yes, but… it is quite slow !

In this context, it is much better to use a list comprehension:

df['abc'] = [x[1] if (x:=l[0].split('/'))[0].startswith('abc') else float('nan')
            for l in df['pairs']]

Rule of thumb: if you need 3 str or more, better try the list comprehension!

One picture is better than thousand words: test of the performance (all current answers) from 3 to almost 1M rows:

bonus: matching first "abc" on any position (not only 1st)

df['abc'] = [next((x.split('/')[1] for x in l if x.startswith('abc')), None)
             for l in df['pairs']]

Fill one Dataframe Column from specific value in list of another column

Question

5 answers

solution1
3

solution2
2 2022-06-05 01:13:31

solution3
2 2022-06-05 01:21:52

solution4
1 ACCPTED 2022-06-05 01:15:49

solution5
0 2022-06-05 05:14:42

bonus: matching first "abc" on any position (not only 1st)

Fill one Dataframe Column from specific value in list of another column

Question

5 answers

solution1 3

solution2 2 2022-06-05 01:13:31

solution3 2 2022-06-05 01:21:52

solution4 1 ACCPTED 2022-06-05 01:15:49

solution5 0 2022-06-05 05:14:42

bonus: matching first "abc" on any position (not only 1st)

solution1
3

solution2
2 2022-06-05 01:13:31

solution3
2 2022-06-05 01:21:52

solution4
1 ACCPTED 2022-06-05 01:15:49

solution5
0 2022-06-05 05:14:42