Cross-referencing columns in the same DataFrame (Pandas, Python)

Question

**I have edited the sample df so that two of its columns contain tuples instead of integers. This illustrates the problem I run into with the solution once the data changes from integers to tuples.**

I am trying to create a new column in Pandas whose value depends on whether the value of one column appears on a different row of a separate column. Where a match is found, the new column should take the value of a third column from the matching row.

To illustrate, see the example below.

I am using a lambda function in df.apply() to do the following: for each row, it filters for the rows where column 'zero' equals the current row's value in column 'two', and where there is a match, it takes the value of column 'one' from that row and copies it into the new column 'three'.

import pandas as pd

df = pd.DataFrame([[(0,9),(1,9),(2,9),(3,9),(4,9)],['a','b','c','d','e'],[(2,9),(3,9),(4,9),(5,9),(6,9)]]).transpose()

df.columns = ['zero','one','two']

df['three'] = df.apply(lambda x: df[df['zero'] == x['two']].loc[:, 'one'], axis=1)

Note that columns 'two' and 'zero' each contain unique values, so the filter will return at most one row.

In theory, column 'three' should come out as: 'c', 'd', 'e', NaN, NaN.
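For comparison, here is a minimal sketch of the row-wise filter approach described above, adjusted so that each row yields a single value (or NaN) rather than the filtered Series that `.loc[:, 'one']` returns. The helper name `lookup_one` is hypothetical. The mask is built with plain Python `==` on each element so that every tuple in 'zero' is compared to the current row's 'two' as one whole value. This is an O(n²) approach, shown only to make the intended logic concrete.

```python
import numpy as np
import pandas as pd

# Sample data from the question: 'zero' and 'two' hold tuples.
df = pd.DataFrame(
    [
        [(0, 9), (1, 9), (2, 9), (3, 9), (4, 9)],
        ["a", "b", "c", "d", "e"],
        [(2, 9), (3, 9), (4, 9), (5, 9), (6, 9)],
    ]
).transpose()
df.columns = ["zero", "one", "two"]


def lookup_one(row):
    """Hypothetical helper: 'one' from the row whose 'zero' equals row['two'], else NaN."""
    # Compare each tuple in 'zero' with row['two'] as a whole value.
    mask = [z == row["two"] for z in df["zero"]]
    matches = df.loc[mask, "one"]
    # 'zero' is unique, so there is at most one match; return it as a scalar.
    return matches.iloc[0] if not matches.empty else np.nan


df["three"] = df.apply(lookup_one, axis=1)
print(df)
```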

Thank you

Answer 1

Just set column 'zero' as the index for convenient lookup of column 'one'.

Update: the solution now works for tuple indexes.

import pandas as pd
import numpy as np

df = pd.DataFrame([[0,1,2,3,4],['a','b','c','d','e'],[2,3,4,5,6]]).transpose()
df.columns = ['zero','one','two']

# set index for quick lookup    
df_indexed = df.set_index("zero")

# the indexed dataset looks like this
df_indexed
Out[21]: 
     one two
zero        
0      a   2
1      b   3
2      c   4
3      d   5
4      e   6

# apply the mapping logic, taking df_indexed from outside the function
def f(el):
    return df_indexed.at[el, "one"] if el in df_indexed.index else np.nan

df["three"] = df["two"].apply(f)

print(df)
Out[18]: 
  zero one two three
0    0   a   2     c
1    1   b   3     d
2    2   c   4     e
3    3   d   5   NaN
4    4   e   6   NaN
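The function f above looks up one value at a time. A minimal vectorised variant, assuming 'zero' is unique, passes the indexed 'one' column directly to Series.map: each value of 'two' is matched against that Series' index, and values with no match become NaN. The name `lookup` is illustrative, and the integer data is built column-wise here rather than with the transpose used above.

```python
import pandas as pd

# Integer version of the sample data, built column-wise.
df = pd.DataFrame(
    {"zero": [0, 1, 2, 3, 4], "one": ["a", "b", "c", "d", "e"], "two": [2, 3, 4, 5, 6]}
)

# Lookup Series: index = values of 'zero', values = 'one' (assumes 'zero' is unique).
lookup = df.set_index("zero")["one"]

# Series.map with a Series matches each value of 'two' against lookup's index;
# values of 'two' that are not in the index map to NaN.
df["three"] = df["two"].map(lookup)
print(df)
```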

# On the updated dataset
df
Out[71]: 
     zero one     two three
0  (0, 9)   a  (2, 9)     c
1  (1, 9)   b  (3, 9)     d
2  (2, 9)   c  (4, 9)     e
3  (3, 9)   d  (5, 9)   NaN
4  (4, 9)   e  (6, 9)   NaN
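If you would rather not depend on how pandas builds an index from a column of tuples, a minimal sketch under the same uniqueness assumption uses a plain Python dict keyed by the tuples in 'zero' and passes a callable to Series.map, which calls it on each element as-is. The name `lookup` is illustrative. Tuples are hashable, so each pair such as (2, 9) is a single dict key, and `dict.get` returns `np.nan` for keys that are absent, such as (5, 9) and (6, 9).

```python
import numpy as np
import pandas as pd

# Tuple version of the sample data, as in the question.
df = pd.DataFrame(
    [
        [(0, 9), (1, 9), (2, 9), (3, 9), (4, 9)],
        ["a", "b", "c", "d", "e"],
        [(2, 9), (3, 9), (4, 9), (5, 9), (6, 9)],
    ]
).transpose()
df.columns = ["zero", "one", "two"]

# Plain dict keyed by each tuple in 'zero'; a repeated key would keep the last value.
lookup = dict(zip(df["zero"], df["one"]))

# A callable passed to Series.map receives each tuple in 'two' unchanged.
df["three"] = df["two"].map(lambda key: lookup.get(key, np.nan))
print(df)
```

Because `dict(zip(...))` keeps the last value for a repeated key, this would silently pick one match if 'zero' ever stopped being unique; checking `df["zero"].is_unique` first is a cheap guard.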

Answer 1: accepted, 2020-10-13 16:03:36