
A more Pythonic way to split a column into multiple columns and sum two of them

Sample code:

import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3], 'bbox': [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]})

Goal:

df = pd.DataFrame({'id': [1, 2, 3], 'bbox': [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]], 'x1': [1, 5, 9], 'y1': [2, 6, 10], 'x2': [4, 12, 20], 'y2': [6, 14, 22]})

In words, I want to add four integer columns to the dataframe: the first two are just the first two elements of each list in bbox, and the last two are, respectively, the sum of the first and third elements of each list and the sum of the second and fourth. Currently, I do this:

df[['x1', 'y1', 'w', 'h']] = pd.DataFrame(df['bbox'].values.tolist(), index=df.index).astype(int)
# assign and drop return new DataFrames, so the results are bound back to df
df = df.assign(x2=df['x1'] + df['w'], y2=df['y1'] + df['h'])
df = df.drop(['w', 'h'], axis=1)

It seems a bit convoluted to me. Is there a way to avoid creating the intermediate columns w and h, or would that make the code less readable? Readability is a higher priority for me than saving one line of code, so if there are no readable alternatives, I'll settle for this solution.

I think you can create x2 and y2 in the first step:

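# Expand each bbox list into four integer columns
df1 = pd.DataFrame(df['bbox'].values.tolist(), index=df.index).astype(int)
# x2 and y2 temporarily hold the third and fourth elements
df[['x1', 'y1', 'x2', 'y2']] = df1
# Add the first two elements to turn them into the required sums
df = df.assign(x2=df['x1'] + df['x2'], y2=df['y1'] + df['y2'])

print(df)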
   id                     bbox  x1  y1  x2  y2
0   1     [1.0, 2.0, 3.0, 4.0]   1   2   4   6
1   2     [5.0, 6.0, 7.0, 8.0]   5   6  12  14
2   3  [9.0, 10.0, 11.0, 12.0]   9  10  20  22
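
If reusing the names x2 and y2 for values that are later overwritten feels unclear, one possible variation is to keep the expanded values in a separate helper frame, so that w and h never become columns of df. This is only a sketch: the helper name box and its column labels x, y, w, h are my own choices, based on the names used in the question.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'bbox': [[1.0, 2.0, 3.0, 4.0],
                            [5.0, 6.0, 7.0, 8.0],
                            [9.0, 10.0, 11.0, 12.0]]})

# Helper frame (hypothetical name): w and h exist only here, never in df
box = pd.DataFrame(df['bbox'].tolist(), index=df.index,
                   columns=['x', 'y', 'w', 'h']).astype(int)

# Build all four target columns in a single assign call
df = df.assign(x1=box['x'], y1=box['y'],
               x2=box['x'] + box['w'], y2=box['y'] + box['h'])
```

The index=df.index argument keeps the helper frame aligned with df, which matters if df does not have a default RangeIndex.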

Or use += to update x2 and y2 in place:

df1 = pd.DataFrame(df['bbox'].values.tolist(), index=df.index).astype(int)
df[['x1', 'y1', 'x2', 'y2']] = df1
# Add the first two elements onto the third and fourth
df['x2'] += df['x1']
df['y2'] += df['y1']
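
Another option, not part of the approaches above, is to do the arithmetic on a NumPy array before anything is written to df. The sketch below assumes every bbox list has exactly four elements; the names boxes and corners are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'bbox': [[1.0, 2.0, 3.0, 4.0],
                            [5.0, 6.0, 7.0, 8.0],
                            [9.0, 10.0, 11.0, 12.0]]})

# Stack the lists into an (n_rows, 4) integer array
boxes = np.array(df['bbox'].tolist()).astype(int)

# Keep the first two columns, then add them to the last two
corners = np.hstack([boxes[:, :2], boxes[:, :2] + boxes[:, 2:]])

# Attach the result as four new columns, aligned on df's index
df = df.join(pd.DataFrame(corners, index=df.index,
                          columns=['x1', 'y1', 'x2', 'y2']))
```

Whether this reads better than the pandas-only versions is a matter of taste; it avoids temporary columns but hides the meaning of each element behind slice positions.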

