How can I multiply two columns within the same dataframe? My dataframe looks like the image below, and I want output like this. However, I cannot work out how to multiply two columns that depend on the first row of the same dataframe. I would really appreciate some help with this.
request totalbytes
/login 8520
/shuttle/countdown/ 7970
/shuttle/countdown/liftoff.html 0
It seems you simply need to multiply the columns:
df['totalbytes'] = df['bytesbytes']*df['bytesfrequency']
Or use mul:
df['totalbytes'] = df['bytesbytes'].mul(df['bytesfrequency'])
Sample:
import pandas as pd
df = pd.DataFrame({'bytesbytes':[3985,1420,0,0],
                   'bytesfrequency':[2,6,2,2]})
df['totalbytes'] = df['bytesbytes']*df['bytesfrequency']
print (df)
bytesbytes bytesfrequency totalbytes
0 3985 2 7970
1 1420 6 8520
2 0 2 0
3 0 2 0
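One practical difference between the two spellings is how missing values are handled. The sketch below assumes one bytesbytes value is missing (that gap is my assumption, it is not in the question's data) and uses the fill_value argument of mul:

```python
import pandas as pd

# Assumption for illustration: one bytesbytes value is missing.
df = pd.DataFrame({'bytesbytes': [3985, 1420, None, 0],
                   'bytesfrequency': [2, 6, 2, 2]})

# The * operator propagates missing values: NaN * 2 stays NaN.
with_operator = df['bytesbytes'] * df['bytesfrequency']

# mul() takes fill_value, which replaces a missing value on one side
# before multiplying, so the missing row becomes 0 * 2 = 0.
with_mul = df['bytesbytes'].mul(df['bytesfrequency'], fill_value=0)

print(with_operator)
print(with_mul)
```

If the columns never contain NaN, both forms give the same result and the operator is probably the more readable choice.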
But maybe you need to groupby the first column, request, and use transform to create a new Series holding each group's first value, then multiply those (both columns are converted by transform here; you may need only one):
df = pd.DataFrame({ 'request':['a','a','b','b'],
'bytesbytes':[3985,1420,1420,0],
'bytesfrequency':[2,6,6,2]})
g = df.groupby('request')
print (g['bytesbytes'].transform('first'))
0 3985
1 3985
2 1420
3 1420
Name: bytesbytes, dtype: int64
print (g['bytesfrequency'].transform('first'))
0 2
1 2
2 6
3 6
Name: bytesfrequency, dtype: int64
df['totalbytes'] = g['bytesbytes'].transform('first')*g['bytesfrequency'].transform('first')
print (df)
bytesbytes bytesfrequency request totalbytes
0 3985 2 a 7970
1 1420 6 a 7970
2 1420 6 b 8520
3 0 2 b 8520
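To make clear why transform is used rather than a plain aggregation, here is a small sketch on the same sample data. The variable names per_group and broadcast are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'request': ['a', 'a', 'b', 'b'],
                   'bytesbytes': [3985, 1420, 1420, 0],
                   'bytesfrequency': [2, 6, 6, 2]})
g = df.groupby('request')

# first() collapses each group to a single row, indexed by the group key.
per_group = g['bytesbytes'].first() * g['bytesfrequency'].first()

# transform('first') broadcasts the group's first value back to every row,
# so the result keeps the original index and can be assigned as a column.
broadcast = g['bytesbytes'].transform('first') * g['bytesfrequency'].transform('first')

print(per_group)
print(broadcast)
```

The aggregated version has one row per request, while the transformed version has one row per original row, which is why only the latter can be assigned straight back to df.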
EDIT:
If you need to remove duplicates by the request column:
df = pd.DataFrame({ 'request':['a','a','b','b'],
'bytesbytes':[3985,1420,1420,0],
'bytesfrequency':[2,6,6,2]})
print (df)
bytesbytes bytesfrequency request
0 3985 2 a
1 1420 6 a
2 1420 6 b
3 0 2 b
One-line solution: drop_duplicates, multiply, and finally drop the columns:
df = (df.drop_duplicates('request')
        .assign(totalbytes=df['bytesbytes']*df['bytesfrequency'])
        .drop(['bytesbytes','bytesfrequency'], axis=1))
print (df)
request totalbytes
0 a 7970
2 b 8520
Or the same in separate steps, starting again from the original df:
df = df.drop_duplicates('request')
df['totalbytes'] = df['bytesbytes']*df['bytesfrequency']
df = df.drop(['bytesbytes','bytesfrequency'], axis=1)
print (df)
request totalbytes
0 a 7970
2 b 8520
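By default drop_duplicates keeps the first row of each request. If the last row is the one that matters, the keep argument changes that. A minimal sketch on the same sample data, passing a lambda to assign so the product is computed on the already de-duplicated rows:

```python
import pandas as pd

df = pd.DataFrame({'request': ['a', 'a', 'b', 'b'],
                   'bytesbytes': [3985, 1420, 1420, 0],
                   'bytesfrequency': [2, 6, 6, 2]})

# keep='last' retains the final row per request instead of the first one.
# The lambda receives the intermediate (de-duplicated) frame.
last_rows = (df.drop_duplicates('request', keep='last')
               .assign(totalbytes=lambda d: d['bytesbytes'] * d['bytesfrequency'])
               .drop(['bytesbytes', 'bytesfrequency'], axis=1))

print(last_rows)
```

Which row to keep depends on what the data means, so treat keep='last' here only as an illustration of the option.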
Now that you have explained what you want... you actually want to drop duplicates:
(df['bytesbytes']*df['bytesfrequency']).drop_duplicates()
A short way to get your posted expected results:
df.drop_duplicates().set_index('request').prod(axis=1).reset_index(name='totalbytes')
request totalbytes
0 /shuttle/countdown 7970
1 /login 8520
2 /shuttle/countdown/liftoff.html 0
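One caveat worth checking: dropping duplicates on the products alone can merge two different requests that happen to have the same product, whereas dropping duplicate rows first keeps them apart. The sketch below uses made-up requests '/x' and '/y' (hypothetical data, not from the question) to show the difference:

```python
import pandas as pd

# Hypothetical data: '/x' appears twice, and '/y' has the same product (20).
df = pd.DataFrame({'request': ['/x', '/x', '/y'],
                   'bytesbytes': [10, 10, 4],
                   'bytesfrequency': [2, 2, 5]})

# De-duplicating the products loses '/y', because 4 * 5 == 10 * 2.
products_only = (df['bytesbytes'] * df['bytesfrequency']).drop_duplicates()

# De-duplicating whole rows first keeps one row per distinct request.
by_row = (df.drop_duplicates()
            .set_index('request')
            .prod(axis=1)
            .reset_index(name='totalbytes'))

print(products_only)
print(by_row)
```

Note also that prod(axis=1) multiplies every remaining column, so it only gives the intended result when bytesbytes and bytesfrequency are the only columns besides request.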
Please edit your title because it's very misleading.
Also, to answer your question, pandas has a handy drop_duplicates method. I strongly suggest you check it out.
In a nutshell, the method drops all duplicate rows and returns a new DataFrame. Optionally, you can make the method consider only certain columns when deciding what counts as a duplicate (the subset parameter); details can be found in the docs.
In your case, you could simply do:
df2 = df2.drop_duplicates()[['request', 'totalbytes']]
The column indexing is entirely optional; I added it because I thought you wanted only those two columns in your final output.
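As a rough sketch of the difference between considering all columns and only some of them, assume df2 also carries an extra status column (that column and the values below are hypothetical):

```python
import pandas as pd

# Hypothetical frame: the two '/login' rows differ only in 'status'.
df2 = pd.DataFrame({'request': ['/login', '/login', '/home'],
                    'status': [200, 304, 200],
                    'totalbytes': [8520, 8520, 7970]})

# With no arguments every column is compared, so nothing is dropped here.
all_columns = df2.drop_duplicates()

# subset restricts the comparison to the listed columns.
by_subset = df2.drop_duplicates(subset=['request', 'totalbytes'])[['request', 'totalbytes']]

print(all_columns)
print(by_subset)
```

So if the frame has columns beyond the two you care about, passing subset is probably what you want.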