Pandas - lambda - values in a list and the corresponding value from another column

Question

Consider the dataframe below:

   Name    identifierOne              identifierTwo
0  Name1   ['12032', '444', '555']    ['aaa', 'bbb', 'ccc']
1  Name2   ['666', '51206', '777']    ['ddd', 'eee', 'fff']
2  Name3   ['111', '222', '333']      ['ggg', 'hhh', 'iii']

I can get the rows where 'identifierOne' contains '120' with:

print(df[df['identifierOne'].apply(lambda x: '120' in str(x))][['Name', 'identifierOne', 'identifierTwo']])

which returns:

   Name    identifierOne              identifierTwo
0  Name1   ['12032', '444', '555']    ['aaa', 'bbb', 'ccc']
1  Name2   ['666', '51206', '777']    ['ddd', 'eee', 'fff']

How can I get a) just the item in the list that contains '120' and b) its corresponding value from 'identifierTwo'? Expected output:

   Name    identifierOne    identifierTwo
0  Name1   ['12032']        ['aaa']
1  Name2   ['51206']        ['eee']

or just the strings:

   Name    identifierOne    identifierTwo
0  Name1   '12032'          'aaa'
1  Name2   '51206'          'eee'

Answer 1

Use pandas.Series.explode:

>>> df

    Name      identifierOne    identifierTwo
0  Name1  [12032, 444, 555]  [aaa, bbb, ccc]
1  Name2  [666, 51206, 777]  [ddd, eee, fff]
2  Name3    [111, 222, 333]  [ggg, hhh, iii]

>>> s1 = df['identifierOne'].explode()
>>> s2 = df['identifierTwo'].explode()
>>> cond = s1.str.contains('120')

>>> df.assign(identifierOne=s1[cond], identifierTwo=s2[cond]).dropna()
    Name identifierOne identifierTwo
0  Name1         12032           aaa
1  Name2         51206           eee
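As a self-contained variant of the same idea, the sketch below explodes both list columns in a single call so that each row holds one aligned pair, then filters. It assumes pandas 1.3 or newer (where DataFrame.explode accepts a list of columns) and that the two lists in every row have equal length; the names flat and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "identifierOne": [["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    "identifierTwo": [["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
})

# Explode both list columns together so each row holds one aligned pair
# (assumption: pandas >= 1.3 and equal list lengths within each row).
flat = df.explode(["identifierOne", "identifierTwo"])

# Keep only the pairs whose identifierOne contains the search term.
result = flat[flat["identifierOne"].str.contains("120", regex=False)]
print(result)
```

Because the filter runs on the exploded frame itself, a row with several matching items simply yields several output rows.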

NOTE:

If the identifier columns initially hold the str representation of a list, convert them with ast.literal_eval first:

>>> from ast import literal_eval

>>> df[['identifierOne', 'identifierTwo']] = (
        df.filter(like='identifier').applymap(literal_eval)
    )
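To make the conversion concrete, here is a minimal sketch on a one-row frame whose identifier cells are strings that merely look like lists. The frame raw and the copy parsed are made up for illustration, and the sketch uses Series.map column by column instead of applymap.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical input: the cells are strings, not lists.
raw = pd.DataFrame({
    "Name": ["Name1"],
    "identifierOne": ["['12032', '444', '555']"],
    "identifierTwo": ["['aaa', 'bbb', 'ccc']"],
})

parsed = raw.copy()
for col in ["identifierOne", "identifierTwo"]:
    # literal_eval safely parses the Python literal stored in each string.
    parsed[col] = raw[col].map(literal_eval)

print(type(raw.loc[0, "identifierOne"]))     # str before parsing
print(type(parsed.loc[0, "identifierOne"]))  # list after parsing
```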

Answer 2

You can convert the columns to lists, then combine explode, concat and df.query as shown below.

First, convert the string representation of each list to an actual list (skip this step if the input is already a list):

import ast
df[['identifierOne', 'identifierTwo']] = (df[['identifierOne', 'identifierTwo']]
                                         .applymap(ast.literal_eval))

Explode the columns and concat them, then filter the necessary rows with df.query and join the 'Name' column:

cols = ['identifierOne','identifierTwo']
out = (pd.concat([df[col].explode() for col in cols],axis=1,keys=cols)
      .query("identifierOne.str.contains('120')",engine='python').join(df[['Name']]))

Or Method 2 - Using a callable:

cols = ['identifierOne','identifierTwo']
out = (pd.concat([df[col].explode() for col in cols],axis=1,keys=cols)
       .join(df[['Name']]).loc[lambda x: x['identifierOne'].str.contains('120')])

print(out)

  identifierOne identifierTwo   Name
0         12032           aaa  Name1
1         51206           eee  Name2
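Both methods filter with str.contains, which treats its pattern as a regular expression by default. For a digit-only term such as '120' that makes no difference, but if the search term could contain characters like '.' or '+', passing regex=False gives a plain substring test. The sample values below are invented purely to show the difference.

```python
import pandas as pd

# Hypothetical identifiers chosen to expose the regex behaviour.
ids = pd.Series(["1.20", "1120", "a+b"])

# Default: the pattern is a regular expression, so "." matches any character.
as_regex = ids.str.contains("1.2")

# regex=False: the pattern is matched as a literal substring.
as_literal = ids.str.contains("1.2", regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```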

Answer 3

Here's my entire thought process:

In [314]: df = pd.DataFrame(dict(Name='Name1 Name2 Name3'.split(), id1=[['12032', '444', '555'], ['666', '51206', '777'], ['111', '222', '333']], id2=[['aaa', 'bbb', 'ccc'], ['ddd', 'eee', 'fff'], ['ggg', 'hhh', 'iii']]))                                                 

In [315]: df['id1e'] = df.id1.apply(lambda L:list(enumerate(L)))                                                                                                                                                                                                              

In [316]: df['id2e'] = df.id2.apply(lambda L:list(enumerate(L)))                                                                                                                                                                                                              

In [317]: df                                                                                                                                                                                                                                                                  
Out[317]: 
    Name                id1              id2                              id1e                            id2e
0  Name1  [12032, 444, 555]  [aaa, bbb, ccc]  [(0, 12032), (1, 444), (2, 555)]  [(0, aaa), (1, bbb), (2, ccc)]
1  Name2  [666, 51206, 777]  [ddd, eee, fff]  [(0, 666), (1, 51206), (2, 777)]  [(0, ddd), (1, eee), (2, fff)]
2  Name3    [111, 222, 333]  [ggg, hhh, iii]    [(0, 111), (1, 222), (2, 333)]  [(0, ggg), (1, hhh), (2, iii)]

In [318]: df.drop('id1 id2'.split(), axis=1, inplace=True)                                                                                                                                                                                                                    

In [319]: df                                                                                                                                                                                                                                                                  
Out[319]: 
    Name                              id1e                            id2e
0  Name1  [(0, 12032), (1, 444), (2, 555)]  [(0, aaa), (1, bbb), (2, ccc)]
1  Name2  [(0, 666), (1, 51206), (2, 777)]  [(0, ddd), (1, eee), (2, fff)]
2  Name3    [(0, 111), (1, 222), (2, 333)]  [(0, ggg), (1, hhh), (2, iii)]

In [320]: df.explode('id1e')                                                                                                                                                                                                                                                  
Out[320]: 
    Name        id1e                            id2e
0  Name1  (0, 12032)  [(0, aaa), (1, bbb), (2, ccc)]
0  Name1    (1, 444)  [(0, aaa), (1, bbb), (2, ccc)]
0  Name1    (2, 555)  [(0, aaa), (1, bbb), (2, ccc)]
1  Name2    (0, 666)  [(0, ddd), (1, eee), (2, fff)]
1  Name2  (1, 51206)  [(0, ddd), (1, eee), (2, fff)]
1  Name2    (2, 777)  [(0, ddd), (1, eee), (2, fff)]
2  Name3    (0, 111)  [(0, ggg), (1, hhh), (2, iii)]
2  Name3    (1, 222)  [(0, ggg), (1, hhh), (2, iii)]
2  Name3    (2, 333)  [(0, ggg), (1, hhh), (2, iii)]

In [321]: df = df.explode('id1e')                                                                                                                                                                                                                                             

In [322]: df = df.explode('id2e')                                                                                                                                                                                                                                             

In [323]: df                                                                                                                                                                                                                                                                  
Out[323]: 
    Name        id1e      id2e
0  Name1  (0, 12032)  (0, aaa)
0  Name1  (0, 12032)  (1, bbb)
0  Name1  (0, 12032)  (2, ccc)
0  Name1    (1, 444)  (0, aaa)
0  Name1    (1, 444)  (1, bbb)
0  Name1    (1, 444)  (2, ccc)
0  Name1    (2, 555)  (0, aaa)
0  Name1    (2, 555)  (1, bbb)
0  Name1    (2, 555)  (2, ccc)
1  Name2    (0, 666)  (0, ddd)
1  Name2    (0, 666)  (1, eee)
1  Name2    (0, 666)  (2, fff)
1  Name2  (1, 51206)  (0, ddd)
1  Name2  (1, 51206)  (1, eee)
1  Name2  (1, 51206)  (2, fff)
1  Name2    (2, 777)  (0, ddd)
1  Name2    (2, 777)  (1, eee)
1  Name2    (2, 777)  (2, fff)
2  Name3    (0, 111)  (0, ggg)
2  Name3    (0, 111)  (1, hhh)
2  Name3    (0, 111)  (2, iii)
2  Name3    (1, 222)  (0, ggg)
2  Name3    (1, 222)  (1, hhh)
2  Name3    (1, 222)  (2, iii)
2  Name3    (2, 333)  (0, ggg)
2  Name3    (2, 333)  (1, hhh)
2  Name3    (2, 333)  (2, iii)

In [324]: df['id1i'] = df.id1e.apply(lambda t:t[0])                                                                                                                                                                                                                           

In [325]: df                                                                                                                                                                                                                                                                  
Out[325]: 
    Name        id1e      id2e  id1i
0  Name1  (0, 12032)  (0, aaa)     0
0  Name1  (0, 12032)  (1, bbb)     0
0  Name1  (0, 12032)  (2, ccc)     0
0  Name1    (1, 444)  (0, aaa)     1
0  Name1    (1, 444)  (1, bbb)     1
0  Name1    (1, 444)  (2, ccc)     1
0  Name1    (2, 555)  (0, aaa)     2
0  Name1    (2, 555)  (1, bbb)     2
0  Name1    (2, 555)  (2, ccc)     2
1  Name2    (0, 666)  (0, ddd)     0
1  Name2    (0, 666)  (1, eee)     0
1  Name2    (0, 666)  (2, fff)     0
1  Name2  (1, 51206)  (0, ddd)     1
1  Name2  (1, 51206)  (1, eee)     1
1  Name2  (1, 51206)  (2, fff)     1
1  Name2    (2, 777)  (0, ddd)     2
1  Name2    (2, 777)  (1, eee)     2
1  Name2    (2, 777)  (2, fff)     2
2  Name3    (0, 111)  (0, ggg)     0
2  Name3    (0, 111)  (1, hhh)     0
2  Name3    (0, 111)  (2, iii)     0
2  Name3    (1, 222)  (0, ggg)     1
2  Name3    (1, 222)  (1, hhh)     1
2  Name3    (1, 222)  (2, iii)     1
2  Name3    (2, 333)  (0, ggg)     2
2  Name3    (2, 333)  (1, hhh)     2
2  Name3    (2, 333)  (2, iii)     2

In [326]: df['id2i'] = df.id2e.apply(lambda t:t[0])                                                                                                                                                                                                                           

In [327]: df                                                                                                                                                                                                                                                                  
Out[327]: 
    Name        id1e      id2e  id1i  id2i
0  Name1  (0, 12032)  (0, aaa)     0     0
0  Name1  (0, 12032)  (1, bbb)     0     1
0  Name1  (0, 12032)  (2, ccc)     0     2
0  Name1    (1, 444)  (0, aaa)     1     0
0  Name1    (1, 444)  (1, bbb)     1     1
0  Name1    (1, 444)  (2, ccc)     1     2
0  Name1    (2, 555)  (0, aaa)     2     0
0  Name1    (2, 555)  (1, bbb)     2     1
0  Name1    (2, 555)  (2, ccc)     2     2
1  Name2    (0, 666)  (0, ddd)     0     0
1  Name2    (0, 666)  (1, eee)     0     1
1  Name2    (0, 666)  (2, fff)     0     2
1  Name2  (1, 51206)  (0, ddd)     1     0
1  Name2  (1, 51206)  (1, eee)     1     1
1  Name2  (1, 51206)  (2, fff)     1     2
1  Name2    (2, 777)  (0, ddd)     2     0
1  Name2    (2, 777)  (1, eee)     2     1
1  Name2    (2, 777)  (2, fff)     2     2
2  Name3    (0, 111)  (0, ggg)     0     0
2  Name3    (0, 111)  (1, hhh)     0     1
2  Name3    (0, 111)  (2, iii)     0     2
2  Name3    (1, 222)  (0, ggg)     1     0
2  Name3    (1, 222)  (1, hhh)     1     1
2  Name3    (1, 222)  (2, iii)     1     2
2  Name3    (2, 333)  (0, ggg)     2     0
2  Name3    (2, 333)  (1, hhh)     2     1
2  Name3    (2, 333)  (2, iii)     2     2

In [328]: df['id1'] = df.id1e.apply(lambda t: t[1])                                                                                                                                                                                                                           

In [329]: df['id2'] = df.id2e.apply(lambda t: t[1])                                                                                                                                                                                                                           

In [330]: df                                                                                                                                                                                                                                                                  
Out[330]: 
    Name        id1e      id2e  id1i  id2i    id1  id2
0  Name1  (0, 12032)  (0, aaa)     0     0  12032  aaa
0  Name1  (0, 12032)  (1, bbb)     0     1  12032  bbb
0  Name1  (0, 12032)  (2, ccc)     0     2  12032  ccc
0  Name1    (1, 444)  (0, aaa)     1     0    444  aaa
0  Name1    (1, 444)  (1, bbb)     1     1    444  bbb
0  Name1    (1, 444)  (2, ccc)     1     2    444  ccc
0  Name1    (2, 555)  (0, aaa)     2     0    555  aaa
0  Name1    (2, 555)  (1, bbb)     2     1    555  bbb
0  Name1    (2, 555)  (2, ccc)     2     2    555  ccc
1  Name2    (0, 666)  (0, ddd)     0     0    666  ddd
1  Name2    (0, 666)  (1, eee)     0     1    666  eee
1  Name2    (0, 666)  (2, fff)     0     2    666  fff
1  Name2  (1, 51206)  (0, ddd)     1     0  51206  ddd
1  Name2  (1, 51206)  (1, eee)     1     1  51206  eee
1  Name2  (1, 51206)  (2, fff)     1     2  51206  fff
1  Name2    (2, 777)  (0, ddd)     2     0    777  ddd
1  Name2    (2, 777)  (1, eee)     2     1    777  eee
1  Name2    (2, 777)  (2, fff)     2     2    777  fff
2  Name3    (0, 111)  (0, ggg)     0     0    111  ggg
2  Name3    (0, 111)  (1, hhh)     0     1    111  hhh
2  Name3    (0, 111)  (2, iii)     0     2    111  iii
2  Name3    (1, 222)  (0, ggg)     1     0    222  ggg
2  Name3    (1, 222)  (1, hhh)     1     1    222  hhh
2  Name3    (1, 222)  (2, iii)     1     2    222  iii
2  Name3    (2, 333)  (0, ggg)     2     0    333  ggg
2  Name3    (2, 333)  (1, hhh)     2     1    333  hhh
2  Name3    (2, 333)  (2, iii)     2     2    333  iii

In [331]: df.drop('id1e id2e'.split(), axis=1, inplace=True)                                                                                                                                                                                                                  

In [332]: df                                                                                                                                                                                                                                                                  
Out[332]: 
    Name  id1i  id2i    id1  id2
0  Name1     0     0  12032  aaa
0  Name1     0     1  12032  bbb
0  Name1     0     2  12032  ccc
0  Name1     1     0    444  aaa
0  Name1     1     1    444  bbb
0  Name1     1     2    444  ccc
0  Name1     2     0    555  aaa
0  Name1     2     1    555  bbb
0  Name1     2     2    555  ccc
1  Name2     0     0    666  ddd
1  Name2     0     1    666  eee
1  Name2     0     2    666  fff
1  Name2     1     0  51206  ddd
1  Name2     1     1  51206  eee
1  Name2     1     2  51206  fff
1  Name2     2     0    777  ddd
1  Name2     2     1    777  eee
1  Name2     2     2    777  fff
2  Name3     0     0    111  ggg
2  Name3     0     1    111  hhh
2  Name3     0     2    111  iii
2  Name3     1     0    222  ggg
2  Name3     1     1    222  hhh
2  Name3     1     2    222  iii
2  Name3     2     0    333  ggg
2  Name3     2     1    333  hhh
2  Name3     2     2    333  iii

In [333]: df[df.id1.apply(lambda x: '120' in str(x))]                                                                                                                                                                                                                         
Out[333]: 
    Name  id1i  id2i    id1  id2
0  Name1     0     0  12032  aaa
0  Name1     0     1  12032  bbb
0  Name1     0     2  12032  ccc
1  Name2     1     0  51206  ddd
1  Name2     1     1  51206  eee
1  Name2     1     2  51206  fff

In [334]: df = df[df.id1.apply(lambda x: '120' in str(x))]                                                                                                                                                                                                                    

In [335]: df[df.id1i == df.id2i]                                                                                                                                                                                                                                              
Out[335]: 
    Name  id1i  id2i    id1  id2
0  Name1     0     0  12032  aaa
1  Name2     1     1  51206  eee

In [336]: df[df.id1i == df.id2i]['id1 id2'.split()]                                                                                                                                                                                                                           
Out[336]: 
     id1  id2
0  12032  aaa
1  51206  eee
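The session above can be condensed into a short script. This is only a sketch of the same steps (tag each element with its position, explode both columns into every pairing, keep the pairings whose positions agree); the names tagged, pairs, matches and result are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "id1": [["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    "id2": [["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
})

# Tag every element with its position in the list.
tagged = df.assign(
    id1=df["id1"].map(lambda items: list(enumerate(items))),
    id2=df["id2"].map(lambda items: list(enumerate(items))),
)

# Exploding one column after the other yields every (id1, id2) pairing per row.
pairs = tagged.explode("id1").explode("id2").reset_index(drop=True)

# Keep pairings whose positions agree and whose id1 value contains the term.
same_position = pairs["id1"].map(lambda t: t[0]) == pairs["id2"].map(lambda t: t[0])
has_term = pairs["id1"].map(lambda t: "120" in t[1])
matches = pairs[same_position & has_term]

# Unpack the (position, value) tuples back into plain values.
result = pd.DataFrame({
    "Name": matches["Name"],
    "id1": matches["id1"].map(lambda t: t[1]),
    "id2": matches["id2"].map(lambda t: t[1]),
}).reset_index(drop=True)
print(result)
```

Note that the intermediate frame grows with the product of the two list lengths per row, so this route is mainly useful for seeing how the pairing works.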

Answer 4

Here is a function for DataFrame.apply that iterates over your data and writes to a new DataFrame called output.

# construct an output df
output = pd.DataFrame(index=df.index, columns=df.columns)
output['Name'] = df['Name']

def findvalue(df, value):
    # check the words which contain the value
    inlist = [value in word for word in df['identifierOne']]
    try:
        # this will throw error if True is not found
        index = inlist.index(True)

        # but if there is a True, write the correct things to `output`
        one = df['identifierOne'][index]
        two = df['identifierTwo'][index]
        output.loc[df.name, 'identifierOne'] = one
        output.loc[df.name, 'identifierTwo'] = two

    except ValueError:
        return

With this, you can apply the function like so:

lookfor = '120'
df.apply(findvalue, axis=1, value=lookfor)

Result (i.e., output):

    Name identifierOne identifierTwo
0  Name1         12032           aaa
1  Name2         51206           eee
2  Name3           NaN           NaN

# note that these are strings, all dtypes == object

This is very loop heavy, so I imagine it is not the fastest answer. But I think the logic is a little more basic.

One quick note: inlist.index(True) returns only the index of the first True in the list. If you anticipate multiple occurrences of the value within a cell, you could use the following findvalue instead:

def findvalue(df, value):
    # check the words which contain the value
    inlist = [value in word for word in df['identifierOne']]

    one = []
    two = []

    # now we explicitly check all of the booleans in `inlist`
    for i, boolean in enumerate(inlist):
        if boolean:
            one.append(df['identifierOne'][i])
            two.append(df['identifierTwo'][i])

    # only write to `output` if there is something to write
    if one:
        output.loc[df.name, 'identifierOne'] = one
        output.loc[df.name, 'identifierTwo'] = two

For the same example, the result is now in lists (of strings):

    Name identifierOne identifierTwo
0  Name1       [12032]         [aaa]
1  Name2       [51206]         [eee]
2  Name3           NaN           NaN
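Both versions of findvalue write into the module-level output frame as a side effect. A possible alternative, sketched below under the assumption that only the first match per row is wanted, is to return a Series from the row function and let apply assemble the result. The helper name first_match is hypothetical and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "identifierOne": [["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    "identifierTwo": [["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
})

def first_match(row, value):
    """Return the first identifierOne item containing `value` and its partner."""
    for one, two in zip(row["identifierOne"], row["identifierTwo"]):
        if value in one:
            return pd.Series({"identifierOne": one, "identifierTwo": two})
    # No match in this row: return missing values for both columns.
    return pd.Series({"identifierOne": None, "identifierTwo": None})

# Extra keyword arguments to apply are forwarded to the row function.
matched = df.apply(first_match, axis=1, value="120")
result = df[["Name"]].join(matched)
print(result)
```

Calling result.dropna() afterwards would drop the rows without a match.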

Answer 5

You can do the following with apply and without any imports beyond pandas:

import pandas as pd
df=pd.DataFrame([['Name1' , ['12032', '444', '555'], ['aaa', 'bbb', 'ccc']],
                ['Name2', ['666', '51206', '777'], ['ddd', 'eee', 'fff']],
                ['Name3', ['111', '222', '333'], ['ggg', 'hhh', 'iii']]],columns=['Name','identifierOne','identifierTwo'])

# this loops the items inside the series in the apply function
idx = df['identifierOne'].apply(lambda x: ''.join([str(x.index(y)) if '120' in str(y) else '' for y in x]))

rowindex = df[idx != ''].index
listindex = idx.iloc[rowindex].astype(int)
listindex.name = 'listindex'
subset = df[df.index.isin(rowindex)]
subset.index = subset.index.astype(int)
concat = pd.merge(subset, listindex, left_index=True, right_index=True)
concat['identifierOne'] = concat.apply(lambda x: x['identifierOne'][x['listindex']], axis=1)
concat['identifierTwo'] = concat.apply(lambda x: x['identifierTwo'][x['listindex']], axis=1)

Giving the result:

concat[['Name','identifierOne','identifierTwo']]

    Name identifierOne identifierTwo
0  Name1         12032           aaa
1  Name2         51206           eee
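The position-based pairing used above can also be written without tracking list indices, by walking the two lists in lockstep with zip. This is a sketch rather than a drop-in replacement: the variable names are illustrative, and unlike the code above it emits one output row per matching item when a list contains several matches.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "identifierOne": [["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    "identifierTwo": [["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
})

term = "120"

# Walk the rows, then the paired list items, keeping pairs whose first item matches.
rows = [
    (name, one, two)
    for name, ones, twos in zip(df["Name"], df["identifierOne"], df["identifierTwo"])
    for one, two in zip(ones, twos)
    if term in one
]

result = pd.DataFrame(rows, columns=["Name", "identifierOne", "identifierTwo"])
print(result)
```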

Answer 1
3 2021-02-25 16:23:27

Answer 2
2 Accepted 2021-02-25 16:03:56

Answer 3
2 2021-02-25 16:14:22

Answer 4
2 2021-02-25 16:17:55

Answer 5
1
