Pandas - lambda - values in list and corresponding value from another column where values in list

Consider the following dataframe:

   Name    identifierOne              identifierTwo
0  Name1   ['12032', '444', '555']    ['aaa', 'bbb', 'ccc']
1  Name2   ['666', '51206', '777']    ['ddd', 'eee', 'fff']
2  Name3   ['111', '222', '333']      ['ggg', 'hhh', 'iii']

I can get the rows where 'identifierOne' has an entry containing '120':

print(df[df['identifierOne'].apply(lambda x: '120' in str(x))][['Name', 'identifierOne', 'identifierTwo']])

This returns:

   Name    identifierOne              identifierTwo
0  Name1   ['12032', '444', '555']    ['aaa', 'bbb', 'ccc']
1  Name2   ['666', '51206', '777']    ['ddd', 'eee', 'fff']

How can I get a) only the item in the list that contains '120' and b) its corresponding value from 'identifierTwo'? Expected output:

   Name    identifierOne    identifierTwo
0  Name1   ['12032']        ['aaa']
1  Name2   ['51206']        ['eee']

Or just as strings:

   Name    identifierOne    identifierTwo
0  Name1   '12032'          'aaa'
1  Name2   '51206'          'eee'

Use pandas.Series.explode:

>>> df

    Name      identifierOne    identifierTwo
0  Name1  [12032, 444, 555]  [aaa, bbb, ccc]
1  Name2  [666, 51206, 777]  [ddd, eee, fff]
2  Name3    [111, 222, 333]  [ggg, hhh, iii]

>>> s1 = df['identifierOne'].explode()
>>> s2 = df['identifierTwo'].explode()
>>> cond = s1.str.contains('120')

>>> df.assign(identifierOne=s1[cond], identifierTwo=s2[cond]).dropna()
    Name identifierOne identifierTwo
0  Name1         12032           aaa
1  Name2         51206           eee
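
For reference, here is a self-contained sketch of the same idea that can be pasted into a script. It builds the example frame, explodes both columns, and joins the matching items back onto 'Name'. The variable names matches and result are illustrative, and regex=False is my own choice to treat '120' as a literal substring. As far as I can tell, the assign call above expects at most one match per row (a duplicated index label would likely make the assignment fail), so this sketch uses a join instead, which keeps every match.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "identifierOne": [["12032", "444", "555"],
                      ["666", "51206", "777"],
                      ["111", "222", "333"]],
    "identifierTwo": [["aaa", "bbb", "ccc"],
                      ["ddd", "eee", "fff"],
                      ["ggg", "hhh", "iii"]],
})

# One row per list item; both Series share the same (repeated) index labels
s1 = df["identifierOne"].explode()
s2 = df["identifierTwo"].explode()

# Literal substring test on each exploded item
cond = s1.str.contains("120", regex=False)

# Keep only matching items, side by side
matches = pd.concat([s1[cond], s2[cond]], axis=1)

# Inner join on the index drops rows of df that have no match
result = df[["Name"]].join(matches, how="inner")
print(result)
```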

Note:

If the identifier columns are initially str representations of lists, convert them first with ast.literal_eval:

>>> from ast import literal_eval

>>> df[['identifierOne', 'identifierTwo']] = (
...     df.filter(like='identifier').applymap(literal_eval)
... )
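
As a small illustration of that conversion, the sketch below starts from a frame whose identifier cells are strings and parses them column by column with Series.map. The frame raw and the copy parsed are made-up names for this example. Mapping each column separately is just one option; it avoids depending on DataFrame.applymap, which newer pandas releases deprecate in favour of DataFrame.map.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical input: the lists arrive as their string representation
raw = pd.DataFrame({
    "Name": ["Name1"],
    "identifierOne": ["['12032', '444', '555']"],
    "identifierTwo": ["['aaa', 'bbb', 'ccc']"],
})

parsed = raw.copy()
for col in ["identifierOne", "identifierTwo"]:
    # literal_eval safely turns "['a', 'b']" into the list ['a', 'b']
    parsed[col] = raw[col].map(literal_eval)

print(type(parsed.loc[0, "identifierOne"]))
```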

You can try converting to lists and then using explode, concat and df.query, as shown below:


First, convert the string representation of the lists into actual lists (skip this step if the input already contains lists):

import ast
df[['identifierOne', 'identifierTwo']] = (df[['identifierOne', 'identifierTwo']]
                                         .applymap(ast.literal_eval))

Explode the columns and concatenate them, then filter the required rows with df.query, and finally join the 'Name' column.

cols = ['identifierOne','identifierTwo']
out = (pd.concat([df[col].explode() for col in cols],axis=1,keys=cols)
      .query("identifierOne.str.contains('120')",engine='python').join(df[['Name']]))

Or method 2, using a callable:

cols = ['identifierOne','identifierTwo']
out = (pd.concat([df[col].explode() for col in cols],axis=1,keys=cols)
       .join(df[['Name']]).loc[lambda x: x['identifierOne'].str.contains('120')])

print(out)

  identifierOne identifierTwo   Name
0         12032           aaa  Name1
1         51206           eee  Name2
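
A possible shortcut, assuming pandas 1.3 or later and that the two lists in every row have the same length: DataFrame.explode accepts a list of columns and explodes them together, so the concat step is not needed. This is a sketch on the example data, not part of the answer above; the names exploded and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "identifierOne": [["12032", "444", "555"],
                      ["666", "51206", "777"],
                      ["111", "222", "333"]],
    "identifierTwo": [["aaa", "bbb", "ccc"],
                      ["ddd", "eee", "fff"],
                      ["ggg", "hhh", "iii"]],
})

# Explode both list columns in lockstep (requires pandas >= 1.3)
exploded = df.explode(["identifierOne", "identifierTwo"])

# Keep only the items that contain the literal substring '120'
mask = exploded["identifierOne"].str.contains("120", regex=False)
out = exploded[mask]
print(out)
```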

Here is my entire thought process:

In [314]: df = pd.DataFrame(dict(Name='Name1 Name2 Name3'.split(), id1=[['12032', '444', '555'], ['666', '51206', '777'], ['111', '222', '333']], id2=[['aaa', 'bbb', 'ccc'], ['ddd', 'eee', 'fff'], ['ggg', 'hhh', 'iii']]))                                                 

In [315]: df['id1e'] = df.id1.apply(lambda L:list(enumerate(L)))                                                                                                                                                                                                              

In [316]: df['id2e'] = df.id2.apply(lambda L:list(enumerate(L)))                                                                                                                                                                                                              

In [317]: df                                                                                                                                                                                                                                                                  
Out[317]: 
    Name                id1              id2                              id1e                            id2e
0  Name1  [12032, 444, 555]  [aaa, bbb, ccc]  [(0, 12032), (1, 444), (2, 555)]  [(0, aaa), (1, bbb), (2, ccc)]
1  Name2  [666, 51206, 777]  [ddd, eee, fff]  [(0, 666), (1, 51206), (2, 777)]  [(0, ddd), (1, eee), (2, fff)]
2  Name3    [111, 222, 333]  [ggg, hhh, iii]    [(0, 111), (1, 222), (2, 333)]  [(0, ggg), (1, hhh), (2, iii)]

In [318]: df.drop('id1 id2'.split(), axis=1, inplace=True)                                                                                                                                                                                                                    

In [319]: df                                                                                                                                                                                                                                                                  
Out[319]: 
    Name                              id1e                            id2e
0  Name1  [(0, 12032), (1, 444), (2, 555)]  [(0, aaa), (1, bbb), (2, ccc)]
1  Name2  [(0, 666), (1, 51206), (2, 777)]  [(0, ddd), (1, eee), (2, fff)]
2  Name3    [(0, 111), (1, 222), (2, 333)]  [(0, ggg), (1, hhh), (2, iii)]

In [320]: df.explode('id1e')                                                                                                                                                                                                                                                  
Out[320]: 
    Name        id1e                            id2e
0  Name1  (0, 12032)  [(0, aaa), (1, bbb), (2, ccc)]
0  Name1    (1, 444)  [(0, aaa), (1, bbb), (2, ccc)]
0  Name1    (2, 555)  [(0, aaa), (1, bbb), (2, ccc)]
1  Name2    (0, 666)  [(0, ddd), (1, eee), (2, fff)]
1  Name2  (1, 51206)  [(0, ddd), (1, eee), (2, fff)]
1  Name2    (2, 777)  [(0, ddd), (1, eee), (2, fff)]
2  Name3    (0, 111)  [(0, ggg), (1, hhh), (2, iii)]
2  Name3    (1, 222)  [(0, ggg), (1, hhh), (2, iii)]
2  Name3    (2, 333)  [(0, ggg), (1, hhh), (2, iii)]

In [321]: df = df.explode('id1e')                                                                                                                                                                                                                                             

In [322]: df = df.explode('id2e')                                                                                                                                                                                                                                             

In [323]: df                                                                                                                                                                                                                                                                  
Out[323]: 
    Name        id1e      id2e
0  Name1  (0, 12032)  (0, aaa)
0  Name1  (0, 12032)  (1, bbb)
0  Name1  (0, 12032)  (2, ccc)
0  Name1    (1, 444)  (0, aaa)
0  Name1    (1, 444)  (1, bbb)
0  Name1    (1, 444)  (2, ccc)
0  Name1    (2, 555)  (0, aaa)
0  Name1    (2, 555)  (1, bbb)
0  Name1    (2, 555)  (2, ccc)
1  Name2    (0, 666)  (0, ddd)
1  Name2    (0, 666)  (1, eee)
1  Name2    (0, 666)  (2, fff)
1  Name2  (1, 51206)  (0, ddd)
1  Name2  (1, 51206)  (1, eee)
1  Name2  (1, 51206)  (2, fff)
1  Name2    (2, 777)  (0, ddd)
1  Name2    (2, 777)  (1, eee)
1  Name2    (2, 777)  (2, fff)
2  Name3    (0, 111)  (0, ggg)
2  Name3    (0, 111)  (1, hhh)
2  Name3    (0, 111)  (2, iii)
2  Name3    (1, 222)  (0, ggg)
2  Name3    (1, 222)  (1, hhh)
2  Name3    (1, 222)  (2, iii)
2  Name3    (2, 333)  (0, ggg)
2  Name3    (2, 333)  (1, hhh)
2  Name3    (2, 333)  (2, iii)

In [324]: df['id1i'] = df.id1e.apply(lambda t:t[0])                                                                                                                                                                                                                           

In [325]: df                                                                                                                                                                                                                                                                  
Out[325]: 
    Name        id1e      id2e  id1i
0  Name1  (0, 12032)  (0, aaa)     0
0  Name1  (0, 12032)  (1, bbb)     0
0  Name1  (0, 12032)  (2, ccc)     0
0  Name1    (1, 444)  (0, aaa)     1
0  Name1    (1, 444)  (1, bbb)     1
0  Name1    (1, 444)  (2, ccc)     1
0  Name1    (2, 555)  (0, aaa)     2
0  Name1    (2, 555)  (1, bbb)     2
0  Name1    (2, 555)  (2, ccc)     2
1  Name2    (0, 666)  (0, ddd)     0
1  Name2    (0, 666)  (1, eee)     0
1  Name2    (0, 666)  (2, fff)     0
1  Name2  (1, 51206)  (0, ddd)     1
1  Name2  (1, 51206)  (1, eee)     1
1  Name2  (1, 51206)  (2, fff)     1
1  Name2    (2, 777)  (0, ddd)     2
1  Name2    (2, 777)  (1, eee)     2
1  Name2    (2, 777)  (2, fff)     2
2  Name3    (0, 111)  (0, ggg)     0
2  Name3    (0, 111)  (1, hhh)     0
2  Name3    (0, 111)  (2, iii)     0
2  Name3    (1, 222)  (0, ggg)     1
2  Name3    (1, 222)  (1, hhh)     1
2  Name3    (1, 222)  (2, iii)     1
2  Name3    (2, 333)  (0, ggg)     2
2  Name3    (2, 333)  (1, hhh)     2
2  Name3    (2, 333)  (2, iii)     2

In [326]: df['id2i'] = df.id2e.apply(lambda t:t[0])                                                                                                                                                                                                                           

In [327]: df                                                                                                                                                                                                                                                                  
Out[327]: 
    Name        id1e      id2e  id1i  id2i
0  Name1  (0, 12032)  (0, aaa)     0     0
0  Name1  (0, 12032)  (1, bbb)     0     1
0  Name1  (0, 12032)  (2, ccc)     0     2
0  Name1    (1, 444)  (0, aaa)     1     0
0  Name1    (1, 444)  (1, bbb)     1     1
0  Name1    (1, 444)  (2, ccc)     1     2
0  Name1    (2, 555)  (0, aaa)     2     0
0  Name1    (2, 555)  (1, bbb)     2     1
0  Name1    (2, 555)  (2, ccc)     2     2
1  Name2    (0, 666)  (0, ddd)     0     0
1  Name2    (0, 666)  (1, eee)     0     1
1  Name2    (0, 666)  (2, fff)     0     2
1  Name2  (1, 51206)  (0, ddd)     1     0
1  Name2  (1, 51206)  (1, eee)     1     1
1  Name2  (1, 51206)  (2, fff)     1     2
1  Name2    (2, 777)  (0, ddd)     2     0
1  Name2    (2, 777)  (1, eee)     2     1
1  Name2    (2, 777)  (2, fff)     2     2
2  Name3    (0, 111)  (0, ggg)     0     0
2  Name3    (0, 111)  (1, hhh)     0     1
2  Name3    (0, 111)  (2, iii)     0     2
2  Name3    (1, 222)  (0, ggg)     1     0
2  Name3    (1, 222)  (1, hhh)     1     1
2  Name3    (1, 222)  (2, iii)     1     2
2  Name3    (2, 333)  (0, ggg)     2     0
2  Name3    (2, 333)  (1, hhh)     2     1
2  Name3    (2, 333)  (2, iii)     2     2

In [328]: df['id1'] = df.id1e.apply(lambda t: t[1])                                                                                                                                                                                                                           

In [329]: df['id2'] = df.id2e.apply(lambda t: t[1])                                                                                                                                                                                                                           

In [330]: df                                                                                                                                                                                                                                                                  
Out[330]: 
    Name        id1e      id2e  id1i  id2i    id1  id2
0  Name1  (0, 12032)  (0, aaa)     0     0  12032  aaa
0  Name1  (0, 12032)  (1, bbb)     0     1  12032  bbb
0  Name1  (0, 12032)  (2, ccc)     0     2  12032  ccc
0  Name1    (1, 444)  (0, aaa)     1     0    444  aaa
0  Name1    (1, 444)  (1, bbb)     1     1    444  bbb
0  Name1    (1, 444)  (2, ccc)     1     2    444  ccc
0  Name1    (2, 555)  (0, aaa)     2     0    555  aaa
0  Name1    (2, 555)  (1, bbb)     2     1    555  bbb
0  Name1    (2, 555)  (2, ccc)     2     2    555  ccc
1  Name2    (0, 666)  (0, ddd)     0     0    666  ddd
1  Name2    (0, 666)  (1, eee)     0     1    666  eee
1  Name2    (0, 666)  (2, fff)     0     2    666  fff
1  Name2  (1, 51206)  (0, ddd)     1     0  51206  ddd
1  Name2  (1, 51206)  (1, eee)     1     1  51206  eee
1  Name2  (1, 51206)  (2, fff)     1     2  51206  fff
1  Name2    (2, 777)  (0, ddd)     2     0    777  ddd
1  Name2    (2, 777)  (1, eee)     2     1    777  eee
1  Name2    (2, 777)  (2, fff)     2     2    777  fff
2  Name3    (0, 111)  (0, ggg)     0     0    111  ggg
2  Name3    (0, 111)  (1, hhh)     0     1    111  hhh
2  Name3    (0, 111)  (2, iii)     0     2    111  iii
2  Name3    (1, 222)  (0, ggg)     1     0    222  ggg
2  Name3    (1, 222)  (1, hhh)     1     1    222  hhh
2  Name3    (1, 222)  (2, iii)     1     2    222  iii
2  Name3    (2, 333)  (0, ggg)     2     0    333  ggg
2  Name3    (2, 333)  (1, hhh)     2     1    333  hhh
2  Name3    (2, 333)  (2, iii)     2     2    333  iii

In [331]: df.drop('id1e id2e'.split(), axis=1, inplace=True)                                                                                                                                                                                                                  

In [332]: df                                                                                                                                                                                                                                                                  
Out[332]: 
    Name  id1i  id2i    id1  id2
0  Name1     0     0  12032  aaa
0  Name1     0     1  12032  bbb
0  Name1     0     2  12032  ccc
0  Name1     1     0    444  aaa
0  Name1     1     1    444  bbb
0  Name1     1     2    444  ccc
0  Name1     2     0    555  aaa
0  Name1     2     1    555  bbb
0  Name1     2     2    555  ccc
1  Name2     0     0    666  ddd
1  Name2     0     1    666  eee
1  Name2     0     2    666  fff
1  Name2     1     0  51206  ddd
1  Name2     1     1  51206  eee
1  Name2     1     2  51206  fff
1  Name2     2     0    777  ddd
1  Name2     2     1    777  eee
1  Name2     2     2    777  fff
2  Name3     0     0    111  ggg
2  Name3     0     1    111  hhh
2  Name3     0     2    111  iii
2  Name3     1     0    222  ggg
2  Name3     1     1    222  hhh
2  Name3     1     2    222  iii
2  Name3     2     0    333  ggg
2  Name3     2     1    333  hhh
2  Name3     2     2    333  iii

In [333]: df[df.id1.apply(lambda x: '120' in str(x))]                                                                                                                                                                                                                         
Out[333]: 
    Name  id1i  id2i    id1  id2
0  Name1     0     0  12032  aaa
0  Name1     0     1  12032  bbb
0  Name1     0     2  12032  ccc
1  Name2     1     0  51206  ddd
1  Name2     1     1  51206  eee
1  Name2     1     2  51206  fff

In [334]: df = df[df.id1.apply(lambda x: '120' in str(x))]                                                                                                                                                                                                                    

In [335]: df[df.id1i == df.id2i]                                                                                                                                                                                                                                              
Out[335]: 
    Name  id1i  id2i    id1  id2
0  Name1     0     0  12032  aaa
1  Name2     1     1  51206  eee

In [336]: df[df.id1i == df.id2i]['id1 id2'.split()]                                                                                                                                                                                                                           
Out[336]: 
     id1  id2
0  12032  aaa
1  51206  eee
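
The session above builds every pairing of items and then keeps the ones where the two positions agree. The same pairing can be sketched more directly with zip, which only ever pairs items at equal positions. This is a condensed restatement under the same column names (id1, id2), not the original author's code; rows and result are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(dict(
    Name="Name1 Name2 Name3".split(),
    id1=[["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    id2=[["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
))

# Walk the rows, pair items by position, keep pairs whose id1 contains '120'
rows = [
    (name, a, b)
    for name, list1, list2 in zip(df["Name"], df["id1"], df["id2"])
    for a, b in zip(list1, list2)
    if "120" in a
]

result = pd.DataFrame(rows, columns=["Name", "id1", "id2"])
print(result)
```

Note that the result gets a fresh 0..n-1 index rather than the original row labels.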

Here is a function for use with apply that iterates over the data and writes the results to a dataframe named output:

# construct an output df
output = pd.DataFrame(index=df.index, columns=df.columns)
output['Name'] = df['Name']

def findvalue(df, value):
    # check the words which contain the value
    inlist = [value in word for word in df['identifierOne']]
    try:
        # this will throw error if True is not found
        index = inlist.index(True)

        # but if there is a True, write the correct things to `output`
        one = df['identifierOne'][index]
        two = df['identifierTwo'][index]
        output.loc[df.name, 'identifierOne'] = one
        output.loc[df.name, 'identifierTwo'] = two

    except ValueError:
        return

With this in place, you can apply the function like so:

lookfor = '120'
df.apply(findvalue, axis=1, value=lookfor)

The result (i.e. output):

    Name identifierOne identifierTwo
0  Name1         12032           aaa
1  Name2         51206           eee
2  Name3           NaN           NaN

# note that these are strings, all dtypes == object

This is quite loop-heavy, so I imagine it is not the fastest answer. But I think the logic is a bit more basic.

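One quick note: the inlist.index(True) operation only returns the index of the first True in the list. If you expect the value to occur more than once in a cell, you can use the following findvalue instead: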

def findvalue(df, value):
    # check the words which contain the value
    inlist = [value in word for word in df['identifierOne']]

    one = []
    two = []

    # now we explicitly check all of the booleans in `inlist`
    for i, boolean in enumerate(inlist):
        if boolean:
            one.append(df['identifierOne'][i])
            two.append(df['identifierTwo'][i])

    # only write to `output` if there is something to write
    if one:
        output.loc[df.name, 'identifierOne'] = one
        output.loc[df.name, 'identifierTwo'] = two

For the same example, the results are now in lists (of strings):

    Name identifierOne identifierTwo
0  Name1       [12032]         [aaa]
1  Name2       [51206]         [eee]
2  Name3           NaN           NaN
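
If writing to an outer output frame from inside the function feels fragile, one alternative is to have the function return its matches and let apply assemble the result. The sketch below is a variation on the idea above, not the answer's own code; find_matches and matches are hypothetical names, and rows without a match get empty lists rather than NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "identifierOne": [["12032", "444", "555"],
                      ["666", "51206", "777"],
                      ["111", "222", "333"]],
    "identifierTwo": [["aaa", "bbb", "ccc"],
                      ["ddd", "eee", "fff"],
                      ["ggg", "hhh", "iii"]],
})

def find_matches(row, value):
    """Return all items containing `value`, with their counterparts."""
    pairs = [
        (one, two)
        for one, two in zip(row["identifierOne"], row["identifierTwo"])
        if value in one
    ]
    return pd.Series({
        "identifierOne": [one for one, _ in pairs],
        "identifierTwo": [two for _, two in pairs],
    })

# apply passes each row to find_matches; the returned Series become columns
matches = df.apply(find_matches, axis=1, value="120")
result = df[["Name"]].join(matches)
print(result)
```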

You can do the following using apply, with no imports beyond pandas:

import pandas as pd
df=pd.DataFrame([['Name1' , ['12032', '444', '555'], ['aaa', 'bbb', 'ccc']],
                ['Name2', ['666', '51206', '777'], ['ddd', 'eee', 'fff']],
                ['Name3', ['111', '222', '333'], ['ggg', 'hhh', 'iii']]],columns=['Name','identifierOne','identifierTwo'])

# this loops the items inside the series in the apply function
idx = df['identifierOne'].apply(lambda x: ''.join([str(x.index(y)) if '120' in str(y) else '' for y in x]))

rowindex = df[idx != ''].index
listindex = idx.iloc[rowindex].astype(int)
listindex.name = 'listindex'
subset = df[df.index.isin(rowindex)]
subset.index = subset.index.astype(int)
concat = pd.merge(subset, listindex, left_index=True, right_index=True)
concat['identifierOne'] = concat.apply(lambda x: x['identifierOne'][x['listindex']], axis=1)
concat['identifierTwo'] = concat.apply(lambda x: x['identifierTwo'][x['listindex']], axis=1)

This gives the result:

concat[['Name','identifierOne','identifierTwo']]

    Name identifierOne identifierTwo
0  Name1         12032           aaa
1  Name2         51206           eee
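
Because the positions above are joined into a single string, a list with two matching items would produce something like '01', which no longer identifies one position. A sketch that sidesteps this by taking only the first matching position is shown below. The helper first_match_position and the -1 sentinel are assumptions of this example, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Name1", ["12032", "444", "555"], ["aaa", "bbb", "ccc"]],
        ["Name2", ["666", "51206", "777"], ["ddd", "eee", "fff"]],
        ["Name3", ["111", "222", "333"], ["ggg", "hhh", "iii"]],
    ],
    columns=["Name", "identifierOne", "identifierTwo"],
)

def first_match_position(items, needle):
    """Position of the first item containing `needle`, or -1 if none does."""
    return next((i for i, item in enumerate(items) if needle in item), -1)

# One integer per row: where the first match sits inside the list
pos = df["identifierOne"].map(lambda items: first_match_position(items, "120"))

found = pos >= 0
hits = df[found].copy()

# Pick the item at the matching position from each list column
hits["identifierOne"] = [lst[i] for lst, i in zip(hits["identifierOne"], pos[found])]
hits["identifierTwo"] = [lst[i] for lst, i in zip(hits["identifierTwo"], pos[found])]
print(hits)
```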
