Pandas: Multiplying two columns of the same dataframe that are dependent on a third column

Question

How can I multiply two columns within the same dataframe? My dataframe looks like the image below, and I want output like this. However, I cannot find how to multiply two columns that depend on the first column of the same dataframe. I would really appreciate some help with this.

request                            totalbytes
/login                              8520
/shuttle/countdown/                 7970
/shuttle/countdown/liftoff.html     0

So far my output is below, but how can I get unique rows?

Answer 1

It seems you simply need to multiply the columns:

df['totalbytes'] = df['bytesbytes']*df['bytesfrequency']

Or use mul:

df['totalbytes'] = df['bytesbytes'].mul(df['bytesfrequency'])

Sample:

import pandas as pd

df = pd.DataFrame({'bytesbytes':[3985,1420,0,0],
                   'bytesfrequency':[2,6,2,2]})


df['totalbytes'] = df['bytesbytes']*df['bytesfrequency']
print (df)
   bytesbytes  bytesfrequency  totalbytes
0        3985               2        7970
1        1420               6        8520
2           0               2           0
3           0               2           0
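One practical difference between the two spellings: mul accepts a fill_value argument, which the * operator cannot take. A minimal sketch, assuming a hypothetical frame in which one frequency is missing (the NaN is not part of the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second frequency value is missing (NaN).
df = pd.DataFrame({'bytesbytes': [3985, 1420, 0],
                   'bytesfrequency': [2, np.nan, 2]})

# The * operator propagates the missing value into the result.
plain = df['bytesbytes'] * df['bytesfrequency']

# mul() substitutes fill_value for the missing operand before multiplying.
filled = df['bytesbytes'].mul(df['bytesfrequency'], fill_value=1)

print(plain.tolist())
print(filled.tolist())
```

Whether 1 is a sensible substitute depends on the data; it is used here only to show the mechanism.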

But maybe you need to group by the first column, request, and use transform to create a new Series holding the product (here both columns are passed through transform; you may need it for only one):

df = pd.DataFrame({ 'request':['a','a','b','b'],
                   'bytesbytes':[3985,1420,1420,0],
                   'bytesfrequency':[2,6,6,2]})


g = df.groupby('request')

print (g['bytesbytes'].transform('first'))
0    3985
1    3985
2    1420
3    1420
Name: bytesbytes, dtype: int64

print (g['bytesfrequency'].transform('first'))
0    2
1    2
2    6
3    6
Name: bytesfrequency, dtype: int64

df['totalbytes'] = g['bytesbytes'].transform('first')*g['bytesfrequency'].transform('first')
print (df)
   bytesbytes  bytesfrequency request  totalbytes
0        3985               2       a        7970
1        1420               6       a        7970
2        1420               6       b        8520
3           0               2       b        8520
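transform returns a value for every original row, which is why totalbytes repeats within each group. If one row per request is wanted instead, the same "first row of each group" idea can be written as an aggregation. A rough sketch on the sample data above (the names first and out are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'request': ['a', 'a', 'b', 'b'],
                   'bytesbytes': [3985, 1420, 1420, 0],
                   'bytesfrequency': [2, 6, 6, 2]})

# Aggregate: keep the first row of each group, giving one row per request.
first = df.groupby('request')[['bytesbytes', 'bytesfrequency']].first()

# Multiply the two columns and move 'request' back from the index to a column.
out = (first['bytesbytes'] * first['bytesfrequency']).reset_index(name='totalbytes')
print(out)
```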

EDIT:

If you need to remove duplicates by the request column:

df = pd.DataFrame({ 'request':['a','a','b','b'],
                   'bytesbytes':[3985,1420,1420,0],
                   'bytesfrequency':[2,6,6,2]})

print (df)
   bytesbytes  bytesfrequency request
0        3985               2       a
1        1420               6       a
2        1420               6       b
3           0               2       b

One-line solution: drop_duplicates, multiply, and finally drop the source columns:

# Parentheses are required for a method chain that spans several lines.
df = (df.drop_duplicates('request')
        .assign(totalbytes=df['bytesbytes']*df['bytesfrequency'])
        .drop(['bytesbytes','bytesfrequency'], axis=1))
print (df)
  request  totalbytes
0       a        7970
2       b        8520

# Or step by step, starting again from the original df:
df = df.drop_duplicates('request')
df['totalbytes'] = df['bytesbytes']*df['bytesfrequency']
df = df.drop(['bytesbytes','bytesfrequency'], axis=1)
print (df)
  request  totalbytes
0       a        7970
2       b        8520

Answer 2

Now that you have explained what you want... you actually want to drop duplicates:

(df['bytesbytes']*df['bytesfrequency']).drop_duplicates()
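One thing to watch with this form: it deduplicates on the product alone, so two different requests that happen to have the same product collapse into one row. A small sketch with hypothetical data (the requests '/a', '/b', '/c' are made up for illustration) compares it with deduplicating on the request column:

```python
import pandas as pd

# Hypothetical data: '/b' and '/c' are different requests with the same product (0).
df = pd.DataFrame({'request': ['/a', '/a', '/b', '/c'],
                   'bytesbytes': [3985, 3985, 0, 0],
                   'bytesfrequency': [2, 2, 2, 5]})

product = df['bytesbytes'] * df['bytesfrequency']

# Deduplicating the bare products merges '/b' and '/c'.
by_value = product.drop_duplicates()

# Deduplicating on the request column keeps one row per request.
by_request = (df.assign(totalbytes=product)
                .drop_duplicates('request')[['request', 'totalbytes']])

print(by_value.tolist())
print(by_request)
```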

Answer 3

Short way to get your posted expected results:

df.drop_duplicates().set_index('request').prod(1).reset_index(name='totalbytes')

                           request  totalbytes
0               /shuttle/countdown        7970
1                           /login        8520
2  /shuttle/countdown/liftoff.html           0
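To make the chain easier to follow, here is the same one-liner split into commented steps. The input frame is an assumption: the question's data was only shown as an image, so the rows below are hypothetical values chosen to match the totals printed above.

```python
import pandas as pd

# Assumed input (hypothetical rows; the original frame is not in the post).
df = pd.DataFrame({'request': ['/shuttle/countdown', '/shuttle/countdown',
                               '/login', '/login',
                               '/shuttle/countdown/liftoff.html'],
                   'bytesbytes': [3985, 3985, 1420, 1420, 0],
                   'bytesfrequency': [2, 2, 6, 6, 2]})

result = (df.drop_duplicates()        # remove fully identical rows
            .set_index('request')     # leave only the numeric columns
            .prod(axis=1)             # row-wise product of what remains
            .reset_index(name='totalbytes'))
print(result)
```

Note that prod(axis=1) multiplies every remaining column, so this only works as intended when the two factors are the only non-index columns.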

Answer 4

Please edit your title because it's very misleading.

Also, to answer your question, pandas has a handy drop_duplicates method. I strongly suggest you check it out.

In a nutshell, the method drops all duplicate rows and returns a new DataFrame. Optionally, you can make the method consider only certain columns when deciding what counts as a duplicate (the subset parameter); details can be found in the docs.

In your case, you could simply do:

df2 = df2.drop_duplicates()[['request', 'totalbytes']]

Column indexing is totally optional, but I added it because I thought you wanted only those two columns in your final output.
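As a rough illustration of the optional arguments, the sketch below uses a hypothetical df2 in which the same request appears with two different totals, so the default call and the subset call give different results:

```python
import pandas as pd

# Hypothetical frame: '/login' appears twice with differing totals.
df2 = pd.DataFrame({'request': ['/login', '/login', '/shuttle/countdown/'],
                    'totalbytes': [8520, 8000, 7970]})

# Default: rows must match in every column to count as duplicates,
# so nothing is dropped here.
all_cols = df2.drop_duplicates()

# subset= restricts the comparison to the named column(s);
# keep= picks which duplicate survives ('first' is the default).
first_per_request = df2.drop_duplicates(subset='request', keep='first')
last_per_request = df2.drop_duplicates(subset='request', keep='last')

print(first_per_request)
print(last_per_request)
```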

4 answers

solution1
3 2017-04-03 05:10:17

solution2
1 ACCEPTED 2017-04-03 05:21:03

solution3
1 2017-04-03 06:19:14

solution4
0 2017-04-03 05:34:41
