Fill a dataframe column from a specific value in another column's list

Question

My dataframe has a column pairs that contains a list of key/value pairs. Each key is unique within its list. For example:

import pandas as pd

df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'abc': None,
    'pairs': [['abc/123', 'foo/345', 'xyz/789'],
              ['abc/456', 'foo/111', 'xyz/789'],
              ['xxx/222', 'foo/555', 'xyz/333']],
})

The dataframe is:

  id | abc  | pairs
  ------------------------------------
  1  |None  | [abc/123, foo/345, xyz/789]
  2  |None  | [abc/456, foo/111, xyz/789]
  3  |None  | [xxx/222, foo/555, xyz/333]

The column abc should be filled with the value from column pairs when an element, split by /, has the key (index 0 of the split) equal to 'abc'.

Expected df:

  id | abc  | pairs
  ------------------------------------
  1  |123   | [abc/123, foo/345, xyz/789]
  2  |456   | [abc/456, foo/111, xyz/789]
  3  |None  | [xxx/222, foo/555, xyz/333]

I am looking for something like:

df.loc[df['pairs'].map(lambda x: 'abc' in (l.split('/')[0] for l in x)), 'abc'] = 'FOUND'

My problem is replacing FOUND with the correct value, l.split('/')[1].

Answer 1

You can use .str repeatedly:

df['abc'] = df['pairs'].str[0].str.split('/').loc[lambda x: x.str[0] == 'abc'].str[1]

Output:

>>> df
  id  abc                        pairs
0  1  123  [abc/123, foo/345, xyz/789]
1  2  456  [abc/456, foo/111, xyz/789]
2  3  NaN  [xxx/222, foo/555, xyz/333]

A more readable alternative:

x = df['pairs'].str[0].str.split('/')
df.loc[x.str[0] == 'abc', 'abc'] = x.str[1]
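
To see why the unmatched row ends up as NaN, the chained .str calls can be unpacked one step at a time. This is only a sketch on the question's sample data; the intermediate names first, parts, matches and values are illustrative and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'pairs': [['abc/123', 'foo/345', 'xyz/789'],
              ['abc/456', 'foo/111', 'xyz/789'],
              ['xxx/222', 'foo/555', 'xyz/333']],
})

first = df['pairs'].str[0]              # first list element of each row
parts = first.str.split('/')            # [key, value] for each row
matches = parts[parts.str[0] == 'abc']  # keep only rows whose key is 'abc'
values = matches.str[1]                 # value part; index holds rows 0 and 1 only

print(first.tolist())    # ['abc/123', 'abc/456', 'xxx/222']
print(values.to_dict())  # {0: '123', 1: '456'}

# Assignment aligns on the index, so row 2 (no match) is left missing
df['abc'] = values
```

The filtering step drops row 2 from the intermediate Series, and index alignment during the assignment is what fills that row with a missing value.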

Answer 2

Use str.get as much as you like ;)

import numpy as np
s = df['pairs'].str.get(0).str.split('/')
df['abc'] = np.where(s.str.get(0) == 'abc', s.str.get(1), None)

Answer 3

Try this; you need neither apply nor lambda functions:

a = df['pairs'].str[0].str
df['abc'] = a.split('/').str[1].where(a.startswith('abc'))

Output:

  id  abc                        pairs
0  1  123  [abc/123, foo/345, xyz/789]
1  2  456  [abc/456, foo/111, xyz/789]
2  3  NaN  [xxx/222, foo/555, xyz/333]

Note: str[0] is equivalent to str.get(0).

"Elements in the split lists can be accessed using get or [] notation:"

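A quick check of that equivalence, as a sketch on the question's sample lists (the names pairs, first_by_index and first_by_get are illustrative):

```python
import pandas as pd

pairs = pd.Series([
    ['abc/123', 'foo/345', 'xyz/789'],
    ['abc/456', 'foo/111', 'xyz/789'],
    ['xxx/222', 'foo/555', 'xyz/333'],
])

# Both spellings pick the first element of each list
first_by_index = pairs.str[0]
first_by_get = pairs.str.get(0)

print(first_by_index.tolist())              # ['abc/123', 'abc/456', 'xxx/222']
print(first_by_index.equals(first_by_get))  # True
```
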
Answer 4

Try this:

# data
df = pd.DataFrame({
        'id':  ['1', '2', '3'],
        'abc':None,
        'pairs': [ ['abc/123', 'foo/345', 'xyz/789'],  ['abc/456', 'foo/111', 'xyz/789'],  ['xxx/222', 'foo/555', 'xyz/333'] ]
      })
# construct a dict in loop and get value of abc key
df['abc'] = df['pairs'].apply(lambda x: dict(e.split('/') for e in x).get('abc'))
df

On rereading the question, it seems you are only interested in the abc key when it is the first element of the list. In that case, instead of reading the whole list, just index the first element and split it:

df['abc'] = df['pairs'].apply(lambda x: dict([x[0].split('/')]).get('abc'))
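
The two variants only differ when abc is not the first element. A small sketch, assuming a made-up second row in which abc comes second (this data is not from the question):

```python
import pandas as pd

# Hypothetical data: in the second row, 'abc' is not the first element
df = pd.DataFrame({
    'id': ['1', '2'],
    'pairs': [['abc/123', 'foo/345'], ['foo/111', 'abc/456']],
})

# Variant 1: build a dict from every element, so 'abc' is found at any position
any_position = df['pairs'].apply(lambda x: dict(e.split('/') for e in x).get('abc'))

# Variant 2: build a dict from the first element only
first_only = df['pairs'].apply(lambda x: dict([x[0].split('/')]).get('abc'))

print(any_position)  # second row holds '456'
print(first_only)    # second row is missing, since only 'foo/111' was inspected
```

Which variant is appropriate depends on how the question's "idx=0" is read, so it is worth checking against the real data.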

Answer 5

"You can use .str repeatedly" -> Yes, but… it is quite slow!

In this context, it is much better to use a list comprehension:

df['abc'] = [x[1] if (x:=l[0].split('/'))[0].startswith('abc') else float('nan')
            for l in df['pairs']]

Rule of thumb: if you need three or more .str calls, try a list comprehension instead!

A picture is worth a thousand words: a performance test of all current answers, from 3 to almost 1M rows (plot not included).
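
The comparison can be repeated locally with a rough sketch like the one below. It is not the benchmark behind the answer: the row count, the number of repetitions, and the helper names with_str and with_listcomp are all assumptions chosen for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

base = [['abc/123', 'foo/345', 'xyz/789'],
        ['abc/456', 'foo/111', 'xyz/789'],
        ['xxx/222', 'foo/555', 'xyz/333']]
df = pd.DataFrame({'pairs': base * 1000})  # 3,000 rows; the size is arbitrary


def with_str(df):
    """Chained .str accessors, in the style of the earlier answers."""
    x = df['pairs'].str[0].str.split('/')
    return x.str[1].where(x.str[0] == 'abc')


def with_listcomp(df):
    """List comprehension, as in this answer (needs Python 3.8+ for :=)."""
    return pd.Series(
        [x[1] if (x := l[0].split('/'))[0].startswith('abc') else float('nan')
         for l in df['pairs']],
        index=df.index,
    )


# Check that both approaches agree before comparing their timings
same = with_str(df).fillna('').tolist() == with_listcomp(df).fillna('').tolist()
print('results agree:', same)

for fn in (with_str, with_listcomp):
    seconds = timeit.timeit(lambda: fn(df), number=20)
    print(f'{fn.__name__}: {seconds:.4f} s for 20 runs')
```

Scaling the `base * 1000` factor up or down gives a feel for how the gap changes with the number of rows.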

Bonus: matching the first "abc" key at any position (not only the first):

df['abc'] = [next((x.split('/')[1] for x in l if x.startswith('abc')), None)
             for l in df['pairs']]
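
One caveat: startswith('abc') also matches longer keys such as abcd. If that matters, an exact comparison on the key may be safer. The sketch below assumes made-up data containing an abcd key, and value_for_key is a hypothetical helper, not part of the answer.

```python
import pandas as pd

# Hypothetical data: the first row has an 'abcd' key ahead of the 'abc' key
df = pd.DataFrame({
    'pairs': [['abcd/000', 'abc/123'], ['foo/111', 'xyz/789']],
})


def value_for_key(items, key, sep='/'):
    """Return the value of the first item whose key equals `key`, else None."""
    for item in items:
        k, _, v = item.partition(sep)
        if k == key:
            return v
    return None


# Prefix match, as in the bonus snippet: picks up 'abcd/000' first
prefix = [next((x.split('/')[1] for x in l if x.startswith('abc')), None)
          for l in df['pairs']]

# Exact key match
exact = [value_for_key(l, 'abc') for l in df['pairs']]

print(prefix)  # ['000', None]
print(exact)   # ['123', None]
```

On the question's own sample data both versions give the same result, since no key there merely starts with abc.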

Answer 1: 3
Answer 2: 2, 2022-06-05 01:13:31
Answer 3: 2, 2022-06-05 01:21:52
Answer 4: 1, accepted, 2022-06-05 01:15:49
Answer 5: 0, 2022-06-05 05:14:42
