
Pandas - Add a new column by extracting values from a nested list based on the values of two other columns

I am currently stuck trying to extract a value from a list/array based on the values in a DataFrame.

Imagine I have the array below. I create it manually, so I can arrange the numbers however I want. I just thought a Python list of lists was the best structure, but I am open to anything here.

value = [[30, 120, 600, 3000], [15, 60, 300, 1500], [30, 120, 600, 3000], [10, 40, 200, 1000],[10, 40, 200, 1000], [10, 40, 200, 1000], [10, 40, 200, 1000], [5, 20, 100, 500]]

I also have a DataFrame that comes out of a much bigger, dynamic processing step and contains two integer columns. The code below recreates those two columns as an example. The possible values of id1 range from 0 to 6, and those of id2 range from 0 to 3.

import pandas as pd

data = {'id1': [4, 2, 6, 6], 'id2': [1, 2, 3, 1]}
df = pd.DataFrame(data)

What I want to do is add a column to df whose value is looked up in the array, using the two columns as indices. For example, the first row of the DataFrame should take the value value[4][1] = 40, so that I end up with a DataFrame like this:

result = {'id1': [4, 2, 6, 6], 'id2': [1, 2, 3, 1], 'matched value': [40, 600, 1000, 40]}  
dfresult = pd.DataFrame(result)

I am a bit lost on the best way to achieve this. The only thing that comes to mind is a brute-force solution: flatten the multidimensional array into a single list containing all 7*4 possible combinations, add a column to the DataFrame that concatenates the two ids, and then do a straight join on that key. This would probably work here because there are only a few combinations, but I am sure there is a learning opportunity in using lists more dynamically that escapes me!
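The join idea itself can be sketched in pandas without building a concatenated key, because DataFrame.merge can join on both id columns at once. This is a minimal sketch assuming the value list and df from the question; the names lookup and matched_value are illustrative, not from the original.

```python
import pandas as pd

value = [[30, 120, 600, 3000], [15, 60, 300, 1500], [30, 120, 600, 3000], [10, 40, 200, 1000],
         [10, 40, 200, 1000], [10, 40, 200, 1000], [10, 40, 200, 1000], [5, 20, 100, 500]]
df = pd.DataFrame({'id1': [4, 2, 6, 6], 'id2': [1, 2, 3, 1]})

# Flatten the nested list into one row per (id1, id2) combination.
lookup = pd.DataFrame(
    [(i, j, v) for i, row in enumerate(value) for j, v in enumerate(row)],
    columns=['id1', 'id2', 'matched_value'],
)

# A left merge on both key columns keeps every row of df in its original order.
result = df.merge(lookup, on=['id1', 'id2'], how='left')
print(result)
```

One property of the merge approach: an id pair that is missing from lookup produces NaN in matched_value instead of raising an IndexError.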

You can use a list comprehension to iterate over the (id1, id2) pairs and retrieve the corresponding value for each pair:

df['matched_val'] = [value[i][j] for i, j in zip(df['id1'], df['id2'])]
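Because the comprehension indexes plain Python lists, it also works when the sub-lists have different lengths, as long as each id2 is valid for its row. A small sketch with a hypothetical ragged list (ragged is an illustrative name, not from the question):

```python
import pandas as pd

# Hypothetical ragged lookup: the sub-lists have different lengths.
ragged = [[30, 120, 600], [15, 60], [5, 20, 100, 500]]
df = pd.DataFrame({'id1': [0, 1, 2], 'id2': [2, 1, 3]})

# Plain list indexing does not require the rows to have equal length.
df['matched_val'] = [ragged[i][j] for i, j in zip(df['id1'], df['id2'])]
print(df['matched_val'].tolist())
```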

Or, a more concise vectorized solution uses NumPy indexing, but it applies only if all the sub-lists inside value have the same length (so that np.array(value) is a regular 2-D array):

import numpy as np

df['matched_val'] = np.array(value)[df['id1'], df['id2']]
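A self-contained sketch of the NumPy version: passing two integer arrays as indices pairs them element-wise, so row k picks arr[id1[k], id2[k]]. The same positions can also be computed as flat offsets id1 * 4 + id2 with np.ravel_multi_index, which is essentially the "single list of all combinations" idea from the question. The names arr, rows, cols and flat_idx are illustrative.

```python
import numpy as np
import pandas as pd

value = [[30, 120, 600, 3000], [15, 60, 300, 1500], [30, 120, 600, 3000], [10, 40, 200, 1000],
         [10, 40, 200, 1000], [10, 40, 200, 1000], [10, 40, 200, 1000], [5, 20, 100, 500]]
df = pd.DataFrame({'id1': [4, 2, 6, 6], 'id2': [1, 2, 3, 1]})

# Shape (8, 4): a regular 2-D array only because every sub-list has 4 items.
arr = np.array(value)

rows = df['id1'].to_numpy()
cols = df['id2'].to_numpy()

# Integer-array indexing: element k is arr[rows[k], cols[k]].
df['matched_val'] = arr[rows, cols]

# Equivalent lookup in the flattened array using offsets rows * 4 + cols.
flat_idx = np.ravel_multi_index((rows, cols), arr.shape)
assert (arr.ravel()[flat_idx] == df['matched_val'].to_numpy()).all()
print(df)
```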

Result

   id1  id2  matched_val
0    4    1           40
1    2    2          600
2    6    3         1000
3    6    1           40

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.