Calculate the price of Items stored in list form

Question

I have a DataFrame which looks like this (Item table):

Date.    Item.     
10-sep.  X,Y,Z
11-sep.  Y,Z
12-sep.  Z
13-sep.  Z,X

And another table where the price of each item is stored date-wise (Price table):

Item.   10sep.  11sep.   12sep.  13sep
X.       10.     5.        10.      15
Y.        7.     15.       13.       10
Z.        5.      10.       10.      10

I want my output to look like this:

Date.   Item.    Total Price
10 sep.  X,Y,Z.   22
11 sep.  Y,Z.     25
12 sep.  Z.       10
13 sep.  Z,X.     25

In the first row the total price is 22 because the prices of X, Y and Z on 10 sep are 10, 7 and 5 respectively. How can I get this output column?

Answer 1

I am going to use these DataFrames to solve your problem:

print(df1)
     Date          Item      
0  10-sep         X,Y,Z 
1  11-sep           Y,Z 
2  12-sep             Z 
3  13-sep           Z,X 

print(df2)
  Item     10sep    11sep     12sep    13sep
0    X        10        5        10       15
1    Y         7       15        13       10
2    Z         5       10        10       10

We can use DataFrame.lookup (deprecated in pandas 1.2 and removed in 2.0) to select the values from df2, but first we must prepare the values to look up:

df3=df1.copy()
df3['Item']=df3['Item'].str.split(',')
df3=df3.explode('Item')
df3['Date']=df3['Date'].str.replace('-','')
print(df3)

    Date Item
0  10sep    X
0  10sep    Y
0  10sep    Z
1  11sep    Y
1  11sep    Z
2  12sep    Z
3  13sep    Z
3  13sep    X

mapper=df2.set_index('Item')


print(mapper)
      10sep  11sep  12sep  13sep
Item                            
X        10      5     10     15
Y         7     15     13     10
Z         5     10     10     10

df3['value']=mapper.lookup(df3['Item'],df3['Date'])
df1['Total Price']=df3.groupby(level=0).value.sum()
print(df1)
     Date          Item  Total Price
0  10-sep         X,Y,Z           22
1  11-sep           Y,Z           25
2  12-sep             Z           10
3  13-sep           Z,X           25
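Since DataFrame.lookup is no longer available in pandas 2.0 and later, here is a minimal sketch of the same idea using positional NumPy indexing instead. It is a swapped-in technique, not the code timed in this answer; the names rows and cols are only illustrative, and the frames are rebuilt from the printed output above.

```python
import pandas as pd

# Frames rebuilt from the printed output above.
df1 = pd.DataFrame({"Date": ["10-sep", "11-sep", "12-sep", "13-sep"],
                    "Item": ["X,Y,Z", "Y,Z", "Z", "Z,X"]})
df2 = pd.DataFrame({"Item": ["X", "Y", "Z"],
                    "10sep": [10, 7, 5],
                    "11sep": [5, 15, 10],
                    "12sep": [10, 13, 10],
                    "13sep": [15, 10, 10]})

# One row per (date, item) pair; the original row label is kept in the index.
df3 = df1.copy()
df3["Item"] = df3["Item"].str.split(",")
df3 = df3.explode("Item")
df3["Date"] = df3["Date"].str.replace("-", "", regex=False)

mapper = df2.set_index("Item")

# Translate labels to integer positions (get_indexer returns -1 for a miss).
rows = mapper.index.get_indexer(df3["Item"])
cols = mapper.columns.get_indexer(df3["Date"])
if (rows < 0).any() or (cols < 0).any():
    raise KeyError("an item or date has no entry in the price table")

# Pick one price per (row, col) pair, then sum per original row label.
df3["value"] = mapper.to_numpy()[rows, cols]
df1["Total Price"] = df3.groupby(level=0)["value"].sum()
print(df1)
```

The explicit check matters because a position of -1 would otherwise silently select the last row or column of the price array.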

Time comparison for these DataFrames:

Method of Valdi_Bo:

%%timeit
ItemPrice = Prices.set_index('Item').stack().swaplevel().rename('Price')
def totalPrice(row):
    dat = row.Date
    items = row.Item.split(',')
    ind = pd.MultiIndex.from_arrays([[dat] * len(items), items])
    return ItemPrice.reindex(ind).sum()
Items['Total Price'] = Items.apply(totalPrice, axis=1)
13.5 ms ± 699 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

My method:

%%timeit
df3=Items.copy()
df3['Item']=df3['Item'].str.split(',')
df3=df3.explode('Item')
mapper=Prices.set_index('Item')
df3['value']=mapper.lookup(df3['Item'],df3['Date'])
Items['Total Price']=df3.groupby(level=0).value.sum()
7.68 ms ± 178 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

@anky_91's method:

%%timeit
m=df2.set_index('Item').T
n=df1[['Date']].assign(**df1['Item'].str.get_dummies(',')).set_index('Date')
final=df1.set_index('Date').assign(Total_Price=m.mul(n).sum(1)).reset_index()
8.7 ms ± 199 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
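The %%timeit cells above need IPython. As a rough sketch, a similar comparison can be run as a plain script with the standard timeit module. The function names apply_reindex and dummies_mul are made up for this example, only the two methods that do not rely on DataFrame.lookup are included, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# The cleaned frames from Answer 2 (dates written with a hyphen in both).
Items = pd.DataFrame({"Date": ["10-sep", "11-sep", "12-sep", "13-sep"],
                      "Item": ["X,Y,Z", "Y,Z", "Z", "Z,X"]})
Prices = pd.DataFrame({"Item": ["X", "Y", "Z"],
                       "10-sep": [10, 7, 5],
                       "11-sep": [5, 15, 10],
                       "12-sep": [10, 13, 10],
                       "13-sep": [15, 10, 10]})


def apply_reindex():
    """Row-wise apply with a (date, item) MultiIndex reindex."""
    item_price = Prices.set_index("Item").stack().swaplevel()

    def total_price(row):
        items = row.Item.split(",")
        ind = pd.MultiIndex.from_arrays([[row.Date] * len(items), items])
        return item_price.reindex(ind).sum()

    return Items.apply(total_price, axis=1)


def dummies_mul():
    """Indicator matrix multiplied by the transposed price table."""
    m = Prices.set_index("Item").T
    n = Items["Item"].str.get_dummies(",").set_index(Items["Date"])
    return m.mul(n).sum(axis=1)


for fn in (apply_reindex, dummies_mul):
    # Best of 3 repeats, 20 calls each, reported per call.
    best = min(timeit.repeat(fn, number=20, repeat=3))
    print(f"{fn.__name__}: {best / 20 * 1e3:.2f} ms per call")
```

With only four rows, fixed overhead dominates the timings, so the ranking may change on larger data.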

Answer 2

I assumed some minimal order and consistency between your two DataFrames, i.e.:

There are no trailing dots in column names.
The date format in the column names of Prices is the same as in the Date column of Items (they can be of string type, but both have a hyphen after the day number).

So the Items and Prices DataFrames are actually as follows:

     Date   Item
0  10-sep  X,Y,Z
1  11-sep    Y,Z
2  12-sep      Z
3  13-sep    Z,X

  Item  10-sep  11-sep  12-sep  13-sep
0    X      10       5      10      15
1    Y       7      15      13      10
2    Z       5      10      10      10

The first step is to convert Prices into a Series:

ItemPrice = Prices.set_index('Item').stack().swaplevel().rename('Price')

so that it contains:

        Item
10-sep  X       10
11-sep  X        5
12-sep  X       10
13-sep  X       15
10-sep  Y        7
11-sep  Y       15
12-sep  Y       13
13-sep  Y       10
10-sep  Z        5
11-sep  Z       10
12-sep  Z       10
13-sep  Z       10
Name: Price, dtype: int64

Then define a function to compute a total price:

def totalPrice(row):
    dat = row.Date
    items = row.Item.split(',')
    ind = pd.MultiIndex.from_arrays([[dat] * len(items), items])
    return ItemPrice.reindex(ind).sum()

And the last step is to apply this function to each row and save the result as a new column:

Items['Total Price'] = Items.apply(totalPrice, axis=1)

The result is:

     Date   Item  Total Price
0  10-sep  X,Y,Z           22
1  11-sep    Y,Z           25
2  12-sep      Z           10
3  13-sep    Z,X           25
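To make the lookup step concrete, here is a small self-contained sketch of what totalPrice does for the first row only. The frame is rebuilt from the tables above; the item "W" is a made-up name used only to show what happens when a pair is missing.

```python
import pandas as pd

Prices = pd.DataFrame({"Item": ["X", "Y", "Z"],
                       "10-sep": [10, 7, 5],
                       "11-sep": [5, 15, 10],
                       "12-sep": [10, 13, 10],
                       "13-sep": [15, 10, 10]})

# Series indexed by (date, item) pairs, as in the answer.
ItemPrice = Prices.set_index("Item").stack().swaplevel().rename("Price")

# First row of Items: Date "10-sep", Item "X,Y,Z".
ind = pd.MultiIndex.from_arrays([["10-sep"] * 3, ["X", "Y", "Z"]])
first_row = ItemPrice.reindex(ind)
print(first_row.tolist())  # [10, 7, 5]
print(first_row.sum())     # 22

# A pair that is absent from the price table ("W" is hypothetical)
# comes back as NaN, which sum() skips by default.
missing = ItemPrice.reindex(pd.MultiIndex.from_arrays([["10-sep"], ["W"]]))
print(missing.isna().all())  # True
```

Because missing pairs are silently skipped by sum(), an unknown item contributes nothing to the total rather than raising an error.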

Answer 3

Taking the cleaned data courtesy of @Valdi_Bo, you can also use get_dummies, multiply by the transposed DataFrame, and sum along axis=1 to get your desired output:

m=df2.set_index('Item').T
n=df1[['Date']].assign(**df1['Item'].str.get_dummies(',')).set_index('Date')
final=df1.set_index('Date').assign(Total_Price=m.mul(n).sum(1))

print(final)

         Item  Total_Price
Date                      
10-sep  X,Y,Z           22
11-sep    Y,Z           25
12-sep      Z           10
13-sep    Z,X           25
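For readers who want to see the intermediate frames, the following sketch rebuilds the cleaned data and prints the indicator matrix before multiplying. It builds n in a slightly different but equivalent way, and it assumes each date appears only once in df1.

```python
import pandas as pd

df1 = pd.DataFrame({"Date": ["10-sep", "11-sep", "12-sep", "13-sep"],
                    "Item": ["X,Y,Z", "Y,Z", "Z", "Z,X"]})
df2 = pd.DataFrame({"Item": ["X", "Y", "Z"],
                    "10-sep": [10, 7, 5],
                    "11-sep": [5, 15, 10],
                    "12-sep": [10, 13, 10],
                    "13-sep": [15, 10, 10]})

# m: dates as rows, items as columns, prices as values.
m = df2.set_index("Item").T

# n: same shape, 1 where the item was listed on that date, else 0.
n = df1["Item"].str.get_dummies(",").set_index(df1["Date"])
print(n)

# Element-wise product keeps only the prices of listed items;
# summing across columns gives the total per date.
total = m.mul(n).sum(axis=1)
print(total)
```

The multiplication aligns on both the date index and the item columns, so the order of rows and columns in the two frames does not have to match.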

3 answers

solution1
2 ACCEPTED 2019-11-17 10:46:12

solution2
2 2019-11-17 11:54:11

solution3
2 2019-11-17 12:14:20
