Pandas - Add a new column extracting value from arrays based on other column value

I am currently stuck trying to extract a value from a list/array depending on the values in a dataframe.

Imagine I have this array. I create it manually, so I can arrange the numbers any way I want. I thought a nested Python list was the best choice, but I am free to use any structure here.

value = [[30, 120, 600, 3000], [15, 60, 300, 1500], [30, 120, 600, 3000], [10, 40, 200, 1000], [10, 40, 200, 1000], [10, 40, 200, 1000], [10, 40, 200, 1000], [5, 20, 100, 500]]

I also have a dataframe that comes from a much bigger, dynamic processing step, with two columns of int type. Here is some code to recreate those two columns as an example. The possible values of id1 range from 0 to 6 and those of id2 from 0 to 3.

import pandas as pd
data = {'id1': [4, 2, 6, 6], 'id2': [1, 2, 3, 1]}
df = pd.DataFrame(data)

What I want to do is add a column to the dataframe df whose value is looked up in the array using the two id columns as indices. For example, the first row of the dataframe takes the value value[4][1] = 40, so that I end up with a dataframe like this:

result = {'id1': [4, 2, 6, 6], 'id2': [1, 2, 3, 1], 'matched value': [40, 600, 1000, 40]}  
dfresult = pd.DataFrame(result)

I am a bit lost on what the best way to achieve this is. What comes to mind is a brute-force solution: take the values of the multidimensional array and flatten them into a single list holding all the possible 7*4 combinations, create a new column in the dataframe that concatenates the two ids, and then do a straight join on that simple condition. This would likely work in this case because the possible combinations are few, but I am certain there is a learning opportunity here to use lists in a more dynamic way that escapes me!
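For reference, the join-based idea described above could be sketched roughly as follows. This is only an illustration under some assumptions: the name lookup is hypothetical, and instead of concatenating the two ids into one key it merges on both id columns directly, which is a small variation on what the question describes.

```python
import pandas as pd

value = [[30, 120, 600, 3000], [15, 60, 300, 1500], [30, 120, 600, 3000],
         [10, 40, 200, 1000], [10, 40, 200, 1000], [10, 40, 200, 1000],
         [10, 40, 200, 1000], [5, 20, 100, 500]]
df = pd.DataFrame({'id1': [4, 2, 6, 6], 'id2': [1, 2, 3, 1]})

# Flatten the nested list into a long lookup table:
# one row per (id1, id2) pair with the value stored at value[id1][id2].
# "lookup" is a hypothetical name used only for this sketch.
lookup = pd.DataFrame(
    [(i, j, v) for i, row in enumerate(value) for j, v in enumerate(row)],
    columns=['id1', 'id2', 'matched value'],
)

# A left join on both id columns keeps the rows of df in their original order
# and attaches the looked-up value to each of them.
merged = df.merge(lookup, on=['id1', 'id2'], how='left')
print(merged)
```

This works, but it builds an intermediate table and performs a join for what is essentially a positional lookup.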

You can use a list comprehension to iterate over the id pairs and retrieve the corresponding value for each pair:

df['matched_val'] = [value[i][j] for i, j in zip(df['id1'], df['id2'])]

Or a better solution using NumPy indexing, applicable only if the sub-lists inside value are all of equal length:

df['matched_val'] = np.array(value)[df['id1'], df['id2']]

Result

   id1  id2  matched_val
0    4    1           40
1    2    2          600
2    6    3         1000
3    6    1           40
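To try both approaches end to end, a self-contained sketch might look like the following. The variable names arr, by_loop and by_numpy are illustrative, and the explicit .to_numpy() calls are an assumption added here to hand NumPy plain integer arrays; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

value = [[30, 120, 600, 3000], [15, 60, 300, 1500], [30, 120, 600, 3000],
         [10, 40, 200, 1000], [10, 40, 200, 1000], [10, 40, 200, 1000],
         [10, 40, 200, 1000], [5, 20, 100, 500]]
df = pd.DataFrame({'id1': [4, 2, 6, 6], 'id2': [1, 2, 3, 1]})

# Approach 1: list comprehension, one Python-level lookup per row.
by_loop = [value[i][j] for i, j in zip(df['id1'], df['id2'])]

# Approach 2: NumPy integer-array indexing. This needs a rectangular array,
# i.e. every sub-list in value must have the same length.
arr = np.array(value)
by_numpy = arr[df['id1'].to_numpy(), df['id2'].to_numpy()]

# Both approaches should pick exactly the same elements.
assert by_loop == by_numpy.tolist()

df['matched_val'] = by_numpy
print(df)
```

The list comprehension also handles sub-lists of unequal length, while the NumPy version selects all rows in a single vectorized operation.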
