Pandas - lambda - values in list and corresponding value from another column where values in list
Consider the dataframe below:
Name identifierOne identifierTwo
0 Name1 ['12032', '444', '555'] ['aaa', 'bbb', 'ccc']
1 Name2 ['666', '51206', '777'] ['ddd', 'eee', 'fff']
2 Name3 ['111', '222', '333'] ['ggg', 'hhh', 'iii']
I can get the rows where 'identifierOne' contains '120' with:
print(df[df['identifierOne'].apply(lambda x: '120' in str(x))][['Name', 'identifierOne', 'identifierTwo']])
which will return:
Name identifierOne identifierTwo
0 Name1 ['12032', '444', '555'] ['aaa', 'bbb', 'ccc']
1 Name2 ['666', '51206', '777'] ['ddd', 'eee', 'fff']
How can I get a) just the item in the list that contains '120' and b) its corresponding value from 'identifierTwo'?
Expected output:
Name identifierOne identifierTwo
0 Name1 ['12032'] ['aaa']
1 Name2 ['51206'] ['eee']
or just the string:
Name identifierOne identifierTwo
0 Name1 '12032' 'aaa'
1 Name2 '51206' 'eee'
Use pandas.Series.explode:
>>> df
Name identifierOne identifierTwo
0 Name1 [12032, 444, 555] [aaa, bbb, ccc]
1 Name2 [666, 51206, 777] [ddd, eee, fff]
2 Name3 [111, 222, 333] [ggg, hhh, iii]
>>> s1 = df['identifierOne'].explode()
>>> s2 = df['identifierTwo'].explode()
>>> cond = s1.str.contains('120')
>>> df.assign(identifierOne=s1[cond], identifierTwo=s2[cond]).dropna()
Name identifierOne identifierTwo
0 Name1 12032 aaa
1 Name2 51206 eee
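For reference, a self-contained version of the steps above might look like the following sketch. It assumes the identifier columns already hold Python lists of strings and that each row has at most one matching item; with several matches in one row, the filtered Series would carry duplicate index labels and the assign step may fail to align.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "identifierOne": [["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    "identifierTwo": [["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
})

# One row per list item; the original row label is repeated in the index.
s1 = df["identifierOne"].explode()
s2 = df["identifierTwo"].explode()

# Boolean mask over the exploded items (plain substring match, no regex).
cond = s1.str.contains("120", regex=False)

# Both exploded Series share the same index order, so one mask filters both.
# Rows without a match become NaN after alignment and are dropped.
result = df.assign(identifierOne=s1[cond], identifierTwo=s2[cond]).dropna()
print(result)
```

Passing regex=False is a choice made for this sketch so that the search term is treated as a literal substring.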
NOTE: If the identifier columns initially hold str representations of lists, convert them to real lists first with ast.literal_eval:
>>> from ast import literal_eval
>>> df[['identifierOne', 'identifierTwo']] = (
...     df.filter(like='identifier').applymap(literal_eval)
... )
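DataFrame.applymap is deprecated as of pandas 2.1 in favour of DataFrame.map, so one version-agnostic sketch is to convert each column with Series.map instead. The sample frame below is made up for illustration and is not from the question.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical input where the lists arrived as strings (e.g. read from a CSV).
df = pd.DataFrame({
    "Name": ["Name1", "Name2"],
    "identifierOne": ["['12032', '444', '555']", "['666', '51206', '777']"],
    "identifierTwo": ["['aaa', 'bbb', 'ccc']", "['ddd', 'eee', 'fff']"],
})

# Series.map is available in both older and newer pandas releases.
for col in ["identifierOne", "identifierTwo"]:
    df[col] = df[col].map(literal_eval)

print(type(df.loc[0, "identifierOne"]))  # a real list after conversion
```

literal_eval only parses Python literals, so it is a safer choice than eval for data of unknown origin.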
You could try converting to a list and then using explode, concat and df.query, as shown below.
First convert your string representation of a list to an actual list (ignore this step if the input is already a list):
import ast
df[['identifierOne', 'identifierTwo']] = (df[['identifierOne', 'identifierTwo']]
.applymap(ast.literal_eval))
Explode the columns and concat them, then filter the necessary rows with df.query, and finally join the 'Name' column:
cols = ['identifierOne','identifierTwo']
out = (pd.concat([df[col].explode() for col in cols],axis=1,keys=cols)
.query("identifierOne.str.contains('120')",engine='python').join(df[['Name']]))
Or Method 2 - using a callable:
cols = ['identifierOne','identifierTwo']
out = (pd.concat([df[col].explode() for col in cols],axis=1,keys=cols)
.join(df[['Name']]).loc[lambda x: x['identifierOne'].str.contains('120')])
print(out)
identifierOne identifierTwo Name
0 12032 aaa Name1
1 51206 eee Name2
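As an alternative sketch, not part of the answer above: pandas 1.3 and later let DataFrame.explode take a list of columns, which keeps the paired lists aligned without a concat. This assumes the two lists in each row have the same length.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "identifierOne": [["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    "identifierTwo": [["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
})

cols = ["identifierOne", "identifierTwo"]

# Explode both list columns together so item i of one stays beside item i of the other.
exploded = df.explode(cols)

# Keep only the rows whose identifierOne item contains the substring.
mask = exploded["identifierOne"].str.contains("120", regex=False)
out = exploded[mask]
print(out)
```

Because the whole frame is exploded, the 'Name' column comes along automatically and no join is needed.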
Here's my entire thought process:
In [314]: df = pd.DataFrame(dict(Name='Name1 Name2 Name3'.split(), id1=[['12032', '444', '555'], ['666', '51206', '777'], ['111', '222', '333']], id2=[['aaa', 'bbb', 'ccc'], ['ddd', 'eee', 'fff'], ['ggg', 'hhh', 'iii']]))
In [315]: df['id1e'] = df.id1.apply(lambda L:list(enumerate(L)))
In [316]: df['id2e'] = df.id2.apply(lambda L:list(enumerate(L)))
In [317]: df
Out[317]:
Name id1 id2 id1e id2e
0 Name1 [12032, 444, 555] [aaa, bbb, ccc] [(0, 12032), (1, 444), (2, 555)] [(0, aaa), (1, bbb), (2, ccc)]
1 Name2 [666, 51206, 777] [ddd, eee, fff] [(0, 666), (1, 51206), (2, 777)] [(0, ddd), (1, eee), (2, fff)]
2 Name3 [111, 222, 333] [ggg, hhh, iii] [(0, 111), (1, 222), (2, 333)] [(0, ggg), (1, hhh), (2, iii)]
In [318]: df.drop('id1 id2'.split(), axis=1, inplace=True)
In [319]: df
Out[319]:
Name id1e id2e
0 Name1 [(0, 12032), (1, 444), (2, 555)] [(0, aaa), (1, bbb), (2, ccc)]
1 Name2 [(0, 666), (1, 51206), (2, 777)] [(0, ddd), (1, eee), (2, fff)]
2 Name3 [(0, 111), (1, 222), (2, 333)] [(0, ggg), (1, hhh), (2, iii)]
In [320]: df.explode('id1e')
Out[320]:
Name id1e id2e
0 Name1 (0, 12032) [(0, aaa), (1, bbb), (2, ccc)]
0 Name1 (1, 444) [(0, aaa), (1, bbb), (2, ccc)]
0 Name1 (2, 555) [(0, aaa), (1, bbb), (2, ccc)]
1 Name2 (0, 666) [(0, ddd), (1, eee), (2, fff)]
1 Name2 (1, 51206) [(0, ddd), (1, eee), (2, fff)]
1 Name2 (2, 777) [(0, ddd), (1, eee), (2, fff)]
2 Name3 (0, 111) [(0, ggg), (1, hhh), (2, iii)]
2 Name3 (1, 222) [(0, ggg), (1, hhh), (2, iii)]
2 Name3 (2, 333) [(0, ggg), (1, hhh), (2, iii)]
In [321]: df = df.explode('id1e')
In [322]: df = df.explode('id2e')
In [323]: df
Out[323]:
Name id1e id2e
0 Name1 (0, 12032) (0, aaa)
0 Name1 (0, 12032) (1, bbb)
0 Name1 (0, 12032) (2, ccc)
0 Name1 (1, 444) (0, aaa)
0 Name1 (1, 444) (1, bbb)
0 Name1 (1, 444) (2, ccc)
0 Name1 (2, 555) (0, aaa)
0 Name1 (2, 555) (1, bbb)
0 Name1 (2, 555) (2, ccc)
1 Name2 (0, 666) (0, ddd)
1 Name2 (0, 666) (1, eee)
1 Name2 (0, 666) (2, fff)
1 Name2 (1, 51206) (0, ddd)
1 Name2 (1, 51206) (1, eee)
1 Name2 (1, 51206) (2, fff)
1 Name2 (2, 777) (0, ddd)
1 Name2 (2, 777) (1, eee)
1 Name2 (2, 777) (2, fff)
2 Name3 (0, 111) (0, ggg)
2 Name3 (0, 111) (1, hhh)
2 Name3 (0, 111) (2, iii)
2 Name3 (1, 222) (0, ggg)
2 Name3 (1, 222) (1, hhh)
2 Name3 (1, 222) (2, iii)
2 Name3 (2, 333) (0, ggg)
2 Name3 (2, 333) (1, hhh)
2 Name3 (2, 333) (2, iii)
In [324]: df['id1i'] = df.id1e.apply(lambda t:t[0])
In [325]: df
Out[325]:
Name id1e id2e id1i
0 Name1 (0, 12032) (0, aaa) 0
0 Name1 (0, 12032) (1, bbb) 0
0 Name1 (0, 12032) (2, ccc) 0
0 Name1 (1, 444) (0, aaa) 1
0 Name1 (1, 444) (1, bbb) 1
0 Name1 (1, 444) (2, ccc) 1
0 Name1 (2, 555) (0, aaa) 2
0 Name1 (2, 555) (1, bbb) 2
0 Name1 (2, 555) (2, ccc) 2
1 Name2 (0, 666) (0, ddd) 0
1 Name2 (0, 666) (1, eee) 0
1 Name2 (0, 666) (2, fff) 0
1 Name2 (1, 51206) (0, ddd) 1
1 Name2 (1, 51206) (1, eee) 1
1 Name2 (1, 51206) (2, fff) 1
1 Name2 (2, 777) (0, ddd) 2
1 Name2 (2, 777) (1, eee) 2
1 Name2 (2, 777) (2, fff) 2
2 Name3 (0, 111) (0, ggg) 0
2 Name3 (0, 111) (1, hhh) 0
2 Name3 (0, 111) (2, iii) 0
2 Name3 (1, 222) (0, ggg) 1
2 Name3 (1, 222) (1, hhh) 1
2 Name3 (1, 222) (2, iii) 1
2 Name3 (2, 333) (0, ggg) 2
2 Name3 (2, 333) (1, hhh) 2
2 Name3 (2, 333) (2, iii) 2
In [326]: df['id2i'] = df.id2e.apply(lambda t:t[0])
In [327]: df
Out[327]:
Name id1e id2e id1i id2i
0 Name1 (0, 12032) (0, aaa) 0 0
0 Name1 (0, 12032) (1, bbb) 0 1
0 Name1 (0, 12032) (2, ccc) 0 2
0 Name1 (1, 444) (0, aaa) 1 0
0 Name1 (1, 444) (1, bbb) 1 1
0 Name1 (1, 444) (2, ccc) 1 2
0 Name1 (2, 555) (0, aaa) 2 0
0 Name1 (2, 555) (1, bbb) 2 1
0 Name1 (2, 555) (2, ccc) 2 2
1 Name2 (0, 666) (0, ddd) 0 0
1 Name2 (0, 666) (1, eee) 0 1
1 Name2 (0, 666) (2, fff) 0 2
1 Name2 (1, 51206) (0, ddd) 1 0
1 Name2 (1, 51206) (1, eee) 1 1
1 Name2 (1, 51206) (2, fff) 1 2
1 Name2 (2, 777) (0, ddd) 2 0
1 Name2 (2, 777) (1, eee) 2 1
1 Name2 (2, 777) (2, fff) 2 2
2 Name3 (0, 111) (0, ggg) 0 0
2 Name3 (0, 111) (1, hhh) 0 1
2 Name3 (0, 111) (2, iii) 0 2
2 Name3 (1, 222) (0, ggg) 1 0
2 Name3 (1, 222) (1, hhh) 1 1
2 Name3 (1, 222) (2, iii) 1 2
2 Name3 (2, 333) (0, ggg) 2 0
2 Name3 (2, 333) (1, hhh) 2 1
2 Name3 (2, 333) (2, iii) 2 2
In [328]: df['id1'] = df.id1e.apply(lambda t: t[1])
In [329]: df['id2'] = df.id2e.apply(lambda t: t[1])
In [330]: df
Out[330]:
Name id1e id2e id1i id2i id1 id2
0 Name1 (0, 12032) (0, aaa) 0 0 12032 aaa
0 Name1 (0, 12032) (1, bbb) 0 1 12032 bbb
0 Name1 (0, 12032) (2, ccc) 0 2 12032 ccc
0 Name1 (1, 444) (0, aaa) 1 0 444 aaa
0 Name1 (1, 444) (1, bbb) 1 1 444 bbb
0 Name1 (1, 444) (2, ccc) 1 2 444 ccc
0 Name1 (2, 555) (0, aaa) 2 0 555 aaa
0 Name1 (2, 555) (1, bbb) 2 1 555 bbb
0 Name1 (2, 555) (2, ccc) 2 2 555 ccc
1 Name2 (0, 666) (0, ddd) 0 0 666 ddd
1 Name2 (0, 666) (1, eee) 0 1 666 eee
1 Name2 (0, 666) (2, fff) 0 2 666 fff
1 Name2 (1, 51206) (0, ddd) 1 0 51206 ddd
1 Name2 (1, 51206) (1, eee) 1 1 51206 eee
1 Name2 (1, 51206) (2, fff) 1 2 51206 fff
1 Name2 (2, 777) (0, ddd) 2 0 777 ddd
1 Name2 (2, 777) (1, eee) 2 1 777 eee
1 Name2 (2, 777) (2, fff) 2 2 777 fff
2 Name3 (0, 111) (0, ggg) 0 0 111 ggg
2 Name3 (0, 111) (1, hhh) 0 1 111 hhh
2 Name3 (0, 111) (2, iii) 0 2 111 iii
2 Name3 (1, 222) (0, ggg) 1 0 222 ggg
2 Name3 (1, 222) (1, hhh) 1 1 222 hhh
2 Name3 (1, 222) (2, iii) 1 2 222 iii
2 Name3 (2, 333) (0, ggg) 2 0 333 ggg
2 Name3 (2, 333) (1, hhh) 2 1 333 hhh
2 Name3 (2, 333) (2, iii) 2 2 333 iii
In [331]: df.drop('id1e id2e'.split(), axis=1, inplace=True)
In [332]: df
Out[332]:
Name id1i id2i id1 id2
0 Name1 0 0 12032 aaa
0 Name1 0 1 12032 bbb
0 Name1 0 2 12032 ccc
0 Name1 1 0 444 aaa
0 Name1 1 1 444 bbb
0 Name1 1 2 444 ccc
0 Name1 2 0 555 aaa
0 Name1 2 1 555 bbb
0 Name1 2 2 555 ccc
1 Name2 0 0 666 ddd
1 Name2 0 1 666 eee
1 Name2 0 2 666 fff
1 Name2 1 0 51206 ddd
1 Name2 1 1 51206 eee
1 Name2 1 2 51206 fff
1 Name2 2 0 777 ddd
1 Name2 2 1 777 eee
1 Name2 2 2 777 fff
2 Name3 0 0 111 ggg
2 Name3 0 1 111 hhh
2 Name3 0 2 111 iii
2 Name3 1 0 222 ggg
2 Name3 1 1 222 hhh
2 Name3 1 2 222 iii
2 Name3 2 0 333 ggg
2 Name3 2 1 333 hhh
2 Name3 2 2 333 iii
In [333]: df[df.id1.apply(lambda x: '120' in str(x))]
Out[333]:
Name id1i id2i id1 id2
0 Name1 0 0 12032 aaa
0 Name1 0 1 12032 bbb
0 Name1 0 2 12032 ccc
1 Name2 1 0 51206 ddd
1 Name2 1 1 51206 eee
1 Name2 1 2 51206 fff
In [334]: df = df[df.id1.apply(lambda x: '120' in str(x))]
In [335]: df[df.id1i == df.id2i]
Out[335]:
Name id1i id2i id1 id2
0 Name1 0 0 12032 aaa
1 Name2 1 1 51206 eee
In [336]: df[df.id1i == df.id2i]['id1 id2'.split()]
Out[336]:
id1 id2
0 12032 aaa
1 51206 eee
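The session above builds every combination of items and then keeps the pairs whose positions agree. Pairing by position is what zip does directly, so a compact sketch of the same idea, using the same id1/id2 column names and skipping the intermediate cross product, could be:

```python
import pandas as pd

df = pd.DataFrame(dict(
    Name="Name1 Name2 Name3".split(),
    id1=[["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    id2=[["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
))

# zip pairs items by position, so no explicit index bookkeeping is needed.
rows = [
    (name, a, b)
    for name, l1, l2 in zip(df["Name"], df["id1"], df["id2"])
    for a, b in zip(l1, l2)
    if "120" in a
]
pairs = pd.DataFrame(rows, columns=["Name", "id1", "id2"])
print(pairs)
```

Note that pairs gets a fresh 0..n-1 index rather than the original row labels; here the two happen to coincide.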
Here is an apply function which can be used to iterate over your data and write to a new DataFrame called output.
# construct an output df
output = pd.DataFrame(index=df.index, columns=df.columns)
output['Name'] = df['Name']
def findvalue(df, value):
    # check the words which contain the value
    inlist = [value in word for word in df['identifierOne']]
    try:
        # this will throw error if True is not found
        index = inlist.index(True)
        # but if there is a True, write the correct things to `output`
        one = df['identifierOne'][index]
        two = df['identifierTwo'][index]
        output.loc[df.name, 'identifierOne'] = one
        output.loc[df.name, 'identifierTwo'] = two
    except ValueError:
        return
With this, you can apply the function like so:
lookfor = '120'
df.apply(findvalue, axis=1, value=lookfor)
Result (i.e., output):
Name identifierOne identifierTwo
0 Name1 12032 aaa
1 Name2 51206 eee
2 Name3 NaN NaN
# note that these are strings, all dtypes == object
This is very loop heavy, so I imagine it is not the fastest answer, but I think the logic is a little more basic.
One quick note is that the inlist.index(True) operation only returns the index of the first True in the list. If you anticipate multiple occurrences of the value within each cell, you could use the following findvalue instead:
def findvalue(df, value):
    # check the words which contain the value
    inlist = [value in word for word in df['identifierOne']]
    one = []
    two = []
    # now we explicitly check all of the booleans in `inlist`
    for i, boolean in enumerate(inlist):
        if boolean:
            one.append(df['identifierOne'][i])
            two.append(df['identifierTwo'][i])
    # only write to `output` if there is something to write
    if one:
        output.loc[df.name, 'identifierOne'] = one
        output.loc[df.name, 'identifierTwo'] = two
For the same example, the result is now in lists (of strings):
Name identifierOne identifierTwo
0 Name1 [12032] [aaa]
1 Name2 [51206] [eee]
2 Name3 NaN NaN
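A variation that avoids writing to a global output frame is to have the function return a Series, in which case DataFrame.apply assembles the per-row results into a new DataFrame. The name find_matches is made up for this sketch, and rows without a match end up with empty lists rather than NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "identifierOne": [["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    "identifierTwo": [["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
})

def find_matches(row, value):
    # Pair items by position and keep the pairs whose first item contains `value`.
    hits = [(a, b) for a, b in zip(row["identifierOne"], row["identifierTwo"]) if value in a]
    return pd.Series({
        "Name": row["Name"],
        "identifierOne": [a for a, _ in hits],
        "identifierTwo": [b for _, b in hits],
    })

# Extra keyword arguments to apply are forwarded to the function.
output = df.apply(find_matches, axis=1, value="120")
print(output)
```

Returning a value instead of mutating shared state keeps the function usable on any frame with these two columns.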
You can do the following with apply and without any imports beyond pandas:
import pandas as pd
df=pd.DataFrame([['Name1' , ['12032', '444', '555'], ['aaa', 'bbb', 'ccc']],
['Name2', ['666', '51206', '777'], ['ddd', 'eee', 'fff']],
['Name3', ['111', '222', '333'], ['ggg', 'hhh', 'iii']]],columns=['Name','identifierOne','identifierTwo'])
# this loops the items inside the series in the apply function
idx = df['identifierOne'].apply(lambda x: ''.join([str(x.index(y)) if '120' in str(y) else '' for y in x]))
rowindex = df[idx != ''].index
listindex = idx.iloc[rowindex].astype(int)
listindex.name = 'listindex'
subset = df[df.index.isin(rowindex)]
subset.index = subset.index.astype(int)
concat = pd.merge(subset, listindex, left_index=True, right_index=True)
concat['identifierOne'] = concat.apply(lambda x: x['identifierOne'][x['listindex']], axis=1)
concat['identifierTwo'] = concat.apply(lambda x: x['identifierTwo'][x['listindex']], axis=1)
Giving the result:
concat[['Name','identifierOne','identifierTwo']]
Name identifierOne identifierTwo
0 Name1 12032 aaa
1 Name2 51206 eee
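One caveat with joining the matching positions into a single string: two matches in the same list, say at positions 0 and 1, would be joined into '01' and then read as position 1. A hedged sketch of the same idea that looks up the first matching position explicitly is shown below; the helper name first_match_pos is made up here.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Name1", "Name2", "Name3"],
    "identifierOne": [["12032", "444", "555"], ["666", "51206", "777"], ["111", "222", "333"]],
    "identifierTwo": [["aaa", "bbb", "ccc"], ["ddd", "eee", "fff"], ["ggg", "hhh", "iii"]],
})

def first_match_pos(items, value):
    # Position of the first item containing `value`, or None when nothing matches.
    return next((i for i, item in enumerate(items) if value in str(item)), None)

positions = [first_match_pos(items, "120") for items in df["identifierOne"]]

# Keep only the rows that had a match, preserving their original index labels.
keep = [p is not None for p in positions]
hits = [p for p in positions if p is not None]
subset = df[keep].copy()

# Replace each list by the single item at the matching position.
subset["identifierOne"] = [items[p] for items, p in zip(subset["identifierOne"], hits)]
subset["identifierTwo"] = [items[p] for items, p in zip(subset["identifierTwo"], hits)]
print(subset)
```

This keeps only the first match per row, which mirrors the single-position behaviour of the code above.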