Pandas - lambda - values in list and corresponding value from another column where values in list

Consider the dataframe below:

   Name    identifierOne              identifierTwo
0  Name1   ['12032', '444', '555']    ['aaa', 'bbb', 'ccc']
1  Name2   ['666', '51206', '777']    ['ddd', 'eee', 'fff']
2  Name3   ['111', '222', '333']      ['ggg', 'hhh', 'iii']

I can get the rows where an item in 'identifierOne' contains '120' with:

print(df[df['identifierOne'].apply(lambda x: '120' in str(x))][['Name', 'identifierOne', 'identifierTwo']])

which returns:

   Name    identifierOne              identifierTwo
0  Name1   ['12032', '444', '555']    ['aaa', 'bbb', 'ccc']
1  Name2   ['666', '51206', '777']    ['ddd', 'eee', 'fff']

How can I get a) just the item in the list that contains '120' and b) its corresponding value from 'identifierTwo'? Expected output:

   Name    identifierOne    identifierTwo
0  Name1   ['12032']        ['aaa']
1  Name2   ['51206']        ['eee']

or just the strings:

   Name    identifierOne    identifierTwo
0  Name1   '12032'          'aaa'
1  Name2   '51206'          'eee'
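The task boils down to a per-row pairing: walk the two lists side by side and keep the pairs whose identifierOne item contains '120'. A minimal pure-Python sketch of that logic, assuming the lists hold strings as shown; `matching_pairs` is a hypothetical helper name:

```python
def matching_pairs(ids_one, ids_two, needle):
    """Return (id_one, id_two) pairs where id_one contains `needle`."""
    # zip walks both lists position by position, so each match keeps its partner
    return [(a, b) for a, b in zip(ids_one, ids_two) if needle in a]


rows = [
    ("Name1", ["12032", "444", "555"], ["aaa", "bbb", "ccc"]),
    ("Name2", ["666", "51206", "777"], ["ddd", "eee", "fff"]),
    ("Name3", ["111", "222", "333"], ["ggg", "hhh", "iii"]),
]

for name, ids_one, ids_two in rows:
    pairs = matching_pairs(ids_one, ids_two, "120")
    if pairs:
        print(name, pairs)
# Name1 [('12032', 'aaa')]
# Name2 [('51206', 'eee')]
```

The pandas answers below apply this same pairing, just vectorised over the frame.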

Use pandas.Series.explode:

>>> df

    Name      identifierOne    identifierTwo
0  Name1  [12032, 444, 555]  [aaa, bbb, ccc]
1  Name2  [666, 51206, 777]  [ddd, eee, fff]
2  Name3    [111, 222, 333]  [ggg, hhh, iii]

>>> s1 = df['identifierOne'].explode()
>>> s2 = df['identifierTwo'].explode()
>>> cond = s1.str.contains('120')

>>> df.assign(identifierOne=s1[cond], identifierTwo=s2[cond]).dropna()
    Name identifierOne identifierTwo
0  Name1         12032           aaa
1  Name2         51206           eee
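One caveat, assuming a row could contain more than one matching identifier: `s1[cond]` would then carry repeated index labels, and the index alignment inside `assign` is likely to raise an error. A sketch of an alternative, assuming pandas 1.3 or later (where `DataFrame.explode` accepts a list of columns), explodes both columns in lockstep so every match keeps its partner on the same row:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "identifierOne": [["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    "identifierTwo": [["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
})

# Explode both list columns together (needs pandas >= 1.3 and equal-length lists per row)
long = df.explode(["identifierOne", "identifierTwo"])

# Keep rows whose identifierOne contains the substring; regex=False treats '120' literally
out = long[long["identifierOne"].str.contains("120", regex=False)]
print(out)
```

With this sample data the result holds the Name1/12032/aaa and Name2/51206/eee rows under their original index labels 0 and 1, and a row with several matches would simply contribute several rows. Lockstep explode requires the two lists in each row to have the same length; pandas raises a ValueError otherwise.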

NOTE:

If the identifier columns initially hold string representations of lists, convert them first with ast.literal_eval:

>>> from ast import literal_eval

>>> df[['identifierOne', 'identifierTwo']] = (
        df.filter(like='identifier').applymap(literal_eval)
    )
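A sketch of the conversion step on hypothetical string-valued input (for example, columns read back from a CSV). It calls `literal_eval` column by column with `Series.apply`, which behaves the same across pandas versions; `DataFrame.applymap` still works but was deprecated in pandas 2.1 in favour of `DataFrame.map`:

```python
from ast import literal_eval

import pandas as pd

# Hypothetical input: the list columns arrive as their string representations
df = pd.DataFrame({
    "Name": ["Name1", "Name2"],
    "identifierOne": ["['12032', '444', '555']", "['666', '51206', '777']"],
    "identifierTwo": ["['aaa', 'bbb', 'ccc']", "['ddd', 'eee', 'fff']"],
})

# Safely parse each string literal back into a Python list, one column at a time
for col in ["identifierOne", "identifierTwo"]:
    df[col] = df[col].apply(literal_eval)

print(type(df.loc[0, "identifierOne"]))  # <class 'list'>
```

`literal_eval` only accepts Python literals, so unlike `eval` it will not execute arbitrary code found in the data.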

You could convert the strings to lists and then use explode, concat and df.query as below:


First, convert the string representation of each list to an actual list (skip this step if the input is already a list):

import ast
df[['identifierOne', 'identifierTwo']] = (df[['identifierOne', 'identifierTwo']]
                                         .applymap(ast.literal_eval))

Explode the columns, concatenate them, filter the matching rows with df.query, and finally join the 'Name' column:

cols = ['identifierOne','identifierTwo']
out = (pd.concat([df[col].explode() for col in cols],axis=1,keys=cols)
      .query("identifierOne.str.contains('120')",engine='python').join(df[['Name']]))

Or, Method 2 - using a callable:

cols = ['identifierOne','identifierTwo']
out = (pd.concat([df[col].explode() for col in cols],axis=1,keys=cols)
       .join(df[['Name']]).loc[lambda x: x['identifierOne'].str.contains('120')])

print(out)

  identifierOne identifierTwo   Name
0         12032           aaa  Name1
1         51206           eee  Name2
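`Series.str.contains` treats its pattern as a regular expression by default. That is harmless for '120', but a search value such as '1.0' would also match '150'. A self-contained sketch of Method 2 that passes `regex=False` for a plain substring test; the sample frame and the `needle` variable are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "identifierOne": [["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    "identifierTwo": [["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
})

cols = ["identifierOne", "identifierTwo"]
needle = "120"  # literal substring to search for

out = (
    # Each exploded Series keeps the original row label, so the two line up side by side
    pd.concat([df[col].explode() for col in cols], axis=1, keys=cols)
    .join(df[["Name"]])
    # regex=False makes this a plain substring test instead of a regex search
    .loc[lambda x: x["identifierOne"].str.contains(needle, regex=False)]
)
print(out)
```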

Here's my entire thought process:

In [314]: df = pd.DataFrame(dict(Name='Name1 Name2 Name3'.split(), id1=[['12032', '444', '555'], ['666', '51206', '777'], ['111', '222', '333']], id2=[['aaa', 'bbb', 'ccc'], ['ddd', 'eee', 'fff'], ['ggg', 'hhh', 'iii']]))

In [315]: df['id1e'] = df.id1.apply(lambda L:list(enumerate(L)))

In [316]: df['id2e'] = df.id2.apply(lambda L:list(enumerate(L)))

In [317]: df
Out[317]: 
    Name                id1              id2                              id1e                            id2e
0  Name1  [12032, 444, 555]  [aaa, bbb, ccc]  [(0, 12032), (1, 444), (2, 555)]  [(0, aaa), (1, bbb), (2, ccc)]
1  Name2  [666, 51206, 777]  [ddd, eee, fff]  [(0, 666), (1, 51206), (2, 777)]  [(0, ddd), (1, eee), (2, fff)]
2  Name3    [111, 222, 333]  [ggg, hhh, iii]    [(0, 111), (1, 222), (2, 333)]  [(0, ggg), (1, hhh), (2, iii)]

In [318]: df.drop('id1 id2'.split(), axis=1, inplace=True)

In [319]: df
Out[319]: 
    Name                              id1e                            id2e
0  Name1  [(0, 12032), (1, 444), (2, 555)]  [(0, aaa), (1, bbb), (2, ccc)]
1  Name2  [(0, 666), (1, 51206), (2, 777)]  [(0, ddd), (1, eee), (2, fff)]
2  Name3    [(0, 111), (1, 222), (2, 333)]  [(0, ggg), (1, hhh), (2, iii)]

In [320]: df.explode('id1e')
Out[320]: 
    Name        id1e                            id2e
0  Name1  (0, 12032)  [(0, aaa), (1, bbb), (2, ccc)]
0  Name1    (1, 444)  [(0, aaa), (1, bbb), (2, ccc)]
0  Name1    (2, 555)  [(0, aaa), (1, bbb), (2, ccc)]
1  Name2    (0, 666)  [(0, ddd), (1, eee), (2, fff)]
1  Name2  (1, 51206)  [(0, ddd), (1, eee), (2, fff)]
1  Name2    (2, 777)  [(0, ddd), (1, eee), (2, fff)]
2  Name3    (0, 111)  [(0, ggg), (1, hhh), (2, iii)]
2  Name3    (1, 222)  [(0, ggg), (1, hhh), (2, iii)]
2  Name3    (2, 333)  [(0, ggg), (1, hhh), (2, iii)]

In [321]: df = df.explode('id1e')

In [322]: df = df.explode('id2e')

In [323]: df
Out[323]: 
    Name        id1e      id2e
0  Name1  (0, 12032)  (0, aaa)
0  Name1  (0, 12032)  (1, bbb)
0  Name1  (0, 12032)  (2, ccc)
0  Name1    (1, 444)  (0, aaa)
0  Name1    (1, 444)  (1, bbb)
0  Name1    (1, 444)  (2, ccc)
0  Name1    (2, 555)  (0, aaa)
0  Name1    (2, 555)  (1, bbb)
0  Name1    (2, 555)  (2, ccc)
1  Name2    (0, 666)  (0, ddd)
1  Name2    (0, 666)  (1, eee)
1  Name2    (0, 666)  (2, fff)
1  Name2  (1, 51206)  (0, ddd)
1  Name2  (1, 51206)  (1, eee)
1  Name2  (1, 51206)  (2, fff)
1  Name2    (2, 777)  (0, ddd)
1  Name2    (2, 777)  (1, eee)
1  Name2    (2, 777)  (2, fff)
2  Name3    (0, 111)  (0, ggg)
2  Name3    (0, 111)  (1, hhh)
2  Name3    (0, 111)  (2, iii)
2  Name3    (1, 222)  (0, ggg)
2  Name3    (1, 222)  (1, hhh)
2  Name3    (1, 222)  (2, iii)
2  Name3    (2, 333)  (0, ggg)
2  Name3    (2, 333)  (1, hhh)
2  Name3    (2, 333)  (2, iii)

In [324]: df['id1i'] = df.id1e.apply(lambda t:t[0])

In [325]: df
Out[325]: 
    Name        id1e      id2e  id1i
0  Name1  (0, 12032)  (0, aaa)     0
0  Name1  (0, 12032)  (1, bbb)     0
0  Name1  (0, 12032)  (2, ccc)     0
0  Name1    (1, 444)  (0, aaa)     1
0  Name1    (1, 444)  (1, bbb)     1
0  Name1    (1, 444)  (2, ccc)     1
0  Name1    (2, 555)  (0, aaa)     2
0  Name1    (2, 555)  (1, bbb)     2
0  Name1    (2, 555)  (2, ccc)     2
1  Name2    (0, 666)  (0, ddd)     0
1  Name2    (0, 666)  (1, eee)     0
1  Name2    (0, 666)  (2, fff)     0
1  Name2  (1, 51206)  (0, ddd)     1
1  Name2  (1, 51206)  (1, eee)     1
1  Name2  (1, 51206)  (2, fff)     1
1  Name2    (2, 777)  (0, ddd)     2
1  Name2    (2, 777)  (1, eee)     2
1  Name2    (2, 777)  (2, fff)     2
2  Name3    (0, 111)  (0, ggg)     0
2  Name3    (0, 111)  (1, hhh)     0
2  Name3    (0, 111)  (2, iii)     0
2  Name3    (1, 222)  (0, ggg)     1
2  Name3    (1, 222)  (1, hhh)     1
2  Name3    (1, 222)  (2, iii)     1
2  Name3    (2, 333)  (0, ggg)     2
2  Name3    (2, 333)  (1, hhh)     2
2  Name3    (2, 333)  (2, iii)     2

In [326]: df['id2i'] = df.id2e.apply(lambda t:t[0])

In [327]: df
Out[327]: 
    Name        id1e      id2e  id1i  id2i
0  Name1  (0, 12032)  (0, aaa)     0     0
0  Name1  (0, 12032)  (1, bbb)     0     1
0  Name1  (0, 12032)  (2, ccc)     0     2
0  Name1    (1, 444)  (0, aaa)     1     0
0  Name1    (1, 444)  (1, bbb)     1     1
0  Name1    (1, 444)  (2, ccc)     1     2
0  Name1    (2, 555)  (0, aaa)     2     0
0  Name1    (2, 555)  (1, bbb)     2     1
0  Name1    (2, 555)  (2, ccc)     2     2
1  Name2    (0, 666)  (0, ddd)     0     0
1  Name2    (0, 666)  (1, eee)     0     1
1  Name2    (0, 666)  (2, fff)     0     2
1  Name2  (1, 51206)  (0, ddd)     1     0
1  Name2  (1, 51206)  (1, eee)     1     1
1  Name2  (1, 51206)  (2, fff)     1     2
1  Name2    (2, 777)  (0, ddd)     2     0
1  Name2    (2, 777)  (1, eee)     2     1
1  Name2    (2, 777)  (2, fff)     2     2
2  Name3    (0, 111)  (0, ggg)     0     0
2  Name3    (0, 111)  (1, hhh)     0     1
2  Name3    (0, 111)  (2, iii)     0     2
2  Name3    (1, 222)  (0, ggg)     1     0
2  Name3    (1, 222)  (1, hhh)     1     1
2  Name3    (1, 222)  (2, iii)     1     2
2  Name3    (2, 333)  (0, ggg)     2     0
2  Name3    (2, 333)  (1, hhh)     2     1
2  Name3    (2, 333)  (2, iii)     2     2

In [328]: df['id1'] = df.id1e.apply(lambda t: t[1])

In [329]: df['id2'] = df.id2e.apply(lambda t: t[1])

In [330]: df
Out[330]: 
    Name        id1e      id2e  id1i  id2i    id1  id2
0  Name1  (0, 12032)  (0, aaa)     0     0  12032  aaa
0  Name1  (0, 12032)  (1, bbb)     0     1  12032  bbb
0  Name1  (0, 12032)  (2, ccc)     0     2  12032  ccc
0  Name1    (1, 444)  (0, aaa)     1     0    444  aaa
0  Name1    (1, 444)  (1, bbb)     1     1    444  bbb
0  Name1    (1, 444)  (2, ccc)     1     2    444  ccc
0  Name1    (2, 555)  (0, aaa)     2     0    555  aaa
0  Name1    (2, 555)  (1, bbb)     2     1    555  bbb
0  Name1    (2, 555)  (2, ccc)     2     2    555  ccc
1  Name2    (0, 666)  (0, ddd)     0     0    666  ddd
1  Name2    (0, 666)  (1, eee)     0     1    666  eee
1  Name2    (0, 666)  (2, fff)     0     2    666  fff
1  Name2  (1, 51206)  (0, ddd)     1     0  51206  ddd
1  Name2  (1, 51206)  (1, eee)     1     1  51206  eee
1  Name2  (1, 51206)  (2, fff)     1     2  51206  fff
1  Name2    (2, 777)  (0, ddd)     2     0    777  ddd
1  Name2    (2, 777)  (1, eee)     2     1    777  eee
1  Name2    (2, 777)  (2, fff)     2     2    777  fff
2  Name3    (0, 111)  (0, ggg)     0     0    111  ggg
2  Name3    (0, 111)  (1, hhh)     0     1    111  hhh
2  Name3    (0, 111)  (2, iii)     0     2    111  iii
2  Name3    (1, 222)  (0, ggg)     1     0    222  ggg
2  Name3    (1, 222)  (1, hhh)     1     1    222  hhh
2  Name3    (1, 222)  (2, iii)     1     2    222  iii
2  Name3    (2, 333)  (0, ggg)     2     0    333  ggg
2  Name3    (2, 333)  (1, hhh)     2     1    333  hhh
2  Name3    (2, 333)  (2, iii)     2     2    333  iii

In [331]: df.drop('id1e id2e'.split(), axis=1, inplace=True)

In [332]: df
Out[332]: 
    Name  id1i  id2i    id1  id2
0  Name1     0     0  12032  aaa
0  Name1     0     1  12032  bbb
0  Name1     0     2  12032  ccc
0  Name1     1     0    444  aaa
0  Name1     1     1    444  bbb
0  Name1     1     2    444  ccc
0  Name1     2     0    555  aaa
0  Name1     2     1    555  bbb
0  Name1     2     2    555  ccc
1  Name2     0     0    666  ddd
1  Name2     0     1    666  eee
1  Name2     0     2    666  fff
1  Name2     1     0  51206  ddd
1  Name2     1     1  51206  eee
1  Name2     1     2  51206  fff
1  Name2     2     0    777  ddd
1  Name2     2     1    777  eee
1  Name2     2     2    777  fff
2  Name3     0     0    111  ggg
2  Name3     0     1    111  hhh
2  Name3     0     2    111  iii
2  Name3     1     0    222  ggg
2  Name3     1     1    222  hhh
2  Name3     1     2    222  iii
2  Name3     2     0    333  ggg
2  Name3     2     1    333  hhh
2  Name3     2     2    333  iii

In [333]: df[df.id1.apply(lambda x: '120' in str(x))]
Out[333]: 
    Name  id1i  id2i    id1  id2
0  Name1     0     0  12032  aaa
0  Name1     0     1  12032  bbb
0  Name1     0     2  12032  ccc
1  Name2     1     0  51206  ddd
1  Name2     1     1  51206  eee
1  Name2     1     2  51206  fff

In [334]: df = df[df.id1.apply(lambda x: '120' in str(x))]

In [335]: df[df.id1i == df.id2i]
Out[335]: 
    Name  id1i  id2i    id1  id2
0  Name1     0     0  12032  aaa
1  Name2     1     1  51206  eee

In [336]: df[df.id1i == df.id2i]['id1 id2'.split()]
Out[336]: 
     id1  id2
0  12032  aaa
1  51206  eee
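Exploding id1e and then id2e builds every combination of positions (9 rows per name here, n² for lists of length n) before the `id1i == id2i` filter throws most of them away. A sketch under the same assumptions as `In [314]` (id1 and id2 hold equal-length lists of strings) that explodes only id1, numbers each element's position with `groupby(...).cumcount()`, and then looks up the partner in id2 at that position:

```python
import pandas as pd

df = pd.DataFrame(dict(
    Name="Name1 Name2 Name3".split(),
    id1=[["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    id2=[["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
))

# One row per id1 element; reset_index keeps the original row label in an 'index' column
ex = df[["Name", "id1"]].explode("id1").reset_index()

# Position of each element within its original list: 0, 1, 2, ...
ex["pos"] = ex.groupby("index").cumcount()

# Keep only the elements containing '120' (copy to get an independent frame)
ex = ex[ex["id1"].str.contains("120", regex=False)].copy()

# Pick the id2 element at the same position in the same original row
ex["id2"] = [df.at[i, "id2"][p] for i, p in zip(ex["index"], ex["pos"])]

print(ex[["Name", "id1", "id2"]])
```

Only n rows per name are ever created, so the intermediate frame grows linearly with the list length rather than quadratically.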

Here is a function to use with apply that iterates over your data and writes the matches to a new DataFrame called output:

import pandas as pd

# construct an output df with the same index and columns as the input
output = pd.DataFrame(index=df.index, columns=df.columns)
output['Name'] = df['Name']

def findvalue(row, value):
    # `row` is one row of df (apply passes rows when axis=1)
    # check which words contain the value
    inlist = [value in word for word in row['identifierOne']]
    try:
        # this raises ValueError if True is not found
        index = inlist.index(True)

        # but if there is a True, write the matching pair to `output`
        one = row['identifierOne'][index]
        two = row['identifierTwo'][index]
        output.loc[row.name, 'identifierOne'] = one
        output.loc[row.name, 'identifierTwo'] = two

    except ValueError:
        return

With this, you can apply the function like so:

lookfor = '120'
df.apply(findvalue, axis=1, value=lookfor)

Result (i.e., output):

    Name identifierOne identifierTwo
0  Name1         12032           aaa
1  Name2         51206           eee
2  Name3           NaN           NaN

# note that these are strings; all dtypes are object

This is very loop-heavy, so I imagine it is not the fastest answer, but I think the logic is a little simpler.

One quick note: inlist.index(True) returns only the index of the first True in the list. If you expect the value to occur multiple times within each cell, you could use the following findvalue instead:

def findvalue(row, value):
    # `row` is one row of df; check which words contain the value
    inlist = [value in word for word in row['identifierOne']]

    one = []
    two = []

    # now we explicitly check all of the booleans in `inlist`
    for i, boolean in enumerate(inlist):
        if boolean:
            one.append(row['identifierOne'][i])
            two.append(row['identifierTwo'][i])

    # only write to `output` if there is something to write
    if one:
        output.loc[row.name, 'identifierOne'] = one
        output.loc[row.name, 'identifierTwo'] = two

For the same example, the result is now in lists (of strings):

    Name identifierOne identifierTwo
0  Name1       [12032]         [aaa]
1  Name2       [51206]         [eee]
2  Name3           NaN           NaN
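The findvalue functions above write into `output` as a side effect of `apply`. A variant sketch keeps the first-match logic but returns a `pd.Series` from the function, so `df.apply(..., axis=1)` builds the result frame directly; `first_match` is a hypothetical name, and rows without a match get `None`:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "identifierOne": [["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    "identifierTwo": [["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
})


def first_match(row, value):
    # Return the first identifierOne item containing `value` plus its partner
    for one, two in zip(row["identifierOne"], row["identifierTwo"]):
        if value in one:
            return pd.Series({"Name": row["Name"], "identifierOne": one, "identifierTwo": two})
    # No match in this row
    return pd.Series({"Name": row["Name"], "identifierOne": None, "identifierTwo": None})


# Returning a Series from each row makes apply assemble a DataFrame
output = df.apply(first_match, axis=1, value="120")
print(output)
```

Because nothing outside the function is mutated, the same call can be reused on other frames without first building an empty `output`.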

You can do the following with apply and without any extra imports:

import pandas as pd

df = pd.DataFrame([['Name1', ['12032', '444', '555'], ['aaa', 'bbb', 'ccc']],
                   ['Name2', ['666', '51206', '777'], ['ddd', 'eee', 'fff']],
                   ['Name3', ['111', '222', '333'], ['ggg', 'hhh', 'iii']]],
                  columns=['Name', 'identifierOne', 'identifierTwo'])

# for each list, join the position(s) of the items containing '120' into a string ('' if none)
idx = df['identifierOne'].apply(lambda x: ''.join([str(x.index(y)) if '120' in str(y) else '' for y in x]))

rowindex = df[idx != ''].index
listindex = idx.iloc[rowindex].astype(int)
listindex.name = 'listindex'
subset = df[df.index.isin(rowindex)]
subset.index = subset.index.astype(int)
concat = pd.merge(subset, listindex, left_index=True, right_index=True)
concat['identifierOne'] = concat.apply(lambda x: x['identifierOne'][x['listindex']], axis=1)
concat['identifierTwo'] = concat.apply(lambda x: x['identifierTwo'][x['listindex']], axis=1)

Giving the result:

concat[['Name','identifierOne','identifierTwo']]

    Name identifierOne identifierTwo
0  Name1         12032           aaa
1  Name2         51206           eee
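The `''.join(...)` step encodes the matching position as a string, which assumes at most one match per list: matches at positions 1 and 2 would give '12', which `astype(int)` reads as position 12. Also, `x.index(y)` returns the first occurrence of `y`, so duplicated items would report the wrong position. A sketch that records only the first matching position via `enumerate`, with -1 as an assumed "no match" sentinel:

```python
import pandas as pd

df = pd.DataFrame(
    [["Name1", ["12032", "444", "555"], ["aaa", "bbb", "ccc"]],
     ["Name2", ["666", "51206", "777"], ["ddd", "eee", "fff"]],
     ["Name3", ["111", "222", "333"], ["ggg", "hhh", "iii"]]],
    columns=["Name", "identifierOne", "identifierTwo"],
)

# Position of the first element containing '120', or -1 when there is none
pos = df["identifierOne"].apply(
    lambda x: next((i for i, y in enumerate(x) if "120" in y), -1)
)

has_match = pos >= 0
hits = df[has_match]
result = pd.DataFrame({
    "Name": hits["Name"],
    # Index each list at its own matching position
    "identifierOne": [lst[p] for lst, p in zip(hits["identifierOne"], pos[has_match])],
    "identifierTwo": [lst[p] for lst, p in zip(hits["identifierTwo"], pos[has_match])],
})
print(result)
```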


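Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.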

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM