I have a table with multiple rows that should be grouped by the number in the first column. The other columns hold data that I need to combine into a single row per group.
I tried the combine_first function, but I don't understand why it's not working.
I'm trying to make this:
df6=pd.DataFrame({'JobNumber':[647,817,915], 'Column6':['KT35','KT35','KT35'],'Column7':[1, 4, 1],
'Column8':[1.5, 1.7 ,1], 'Column9':[0,1,2.03]})
from this:
df=pd.DataFrame({'JobNumber':[647,647,817,817,817, 915,915,915],'Column6':['KT35','KT35','KT35','KT35','KT35','KT35','KT35','KT35'],
'Column7':[0, 1, 0, 0 , 4, 1, 0, 0],'Column8':[1.5, 0 ,0 ,1.7,0,0,0,1], 'Column9':[0,0,1,0,0,0,2.03,0]})
In other words I'm trying to create a line for each JobNumber with all data in one row.
I came up with this code:
df2 = pd.read_excel('file.xlsx')
df2.columns=['JobNumber','Column6','Column7','Column8','Column9']
df3 = df2.loc[[0],:]
for i in range(len(df2.JobNumber)):
    JobNum = df2.iloc[i, 0]
    if df2.iloc[i,0] == df2.iloc[i-1, 0]:
        df3.loc[df3.JobNumber == JobNum,:] = df3.loc[df3.JobNumber == JobNum,:].combine_first(df2.iloc[[i],:])
    else:
        df3.append(df2.iloc[i,:])
But the combine_first line doesn't seem to work, and df3.append(**) doesn't work either. I can't understand what is wrong with my code. It doesn't raise any error; it just looks like my loop has no effect on df3, because when I print it out there is only one row in it, the one I assigned to it before the loop.
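One likely reason the loop appears to do nothing: DataFrame.append never modified the frame in place. It returned a new DataFrame, so the result was lost unless it was assigned back (and the method was removed in pandas 2.0 in favour of pd.concat). Also note that combine_first only fills null values and aligns on index labels, so zeros are never overwritten. A minimal sketch of the append behaviour, using a small made-up df2 rather than the real Excel file:

```python
import pandas as pd

# Hypothetical stand-in for the data read from Excel.
df2 = pd.DataFrame({'JobNumber': [647, 647, 817], 'Column7': [0, 1, 0]})
df3 = df2.loc[[0], :]

# pd.concat (like the older DataFrame.append) returns a new object;
# df3 is unchanged when the result is discarded.
pd.concat([df3, df2.iloc[[2]]])
rows_without_assignment = len(df3)

# Assigning the result back is what actually grows df3.
df3 = pd.concat([df3, df2.iloc[[2]]])
rows_with_assignment = len(df3)

print(rows_without_assignment, rows_with_assignment)
```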
I am not sure how general this is, but if the values alternate between these two columns as in the example provided, the code below should work.
df['col8'] = df['col8'].shift()
df = df.dropna(subset=['col8'])
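To illustrate the shift idea, here is a small sketch. It assumes the blanks are already NaN, that each job has exactly two rows, and that Column8 is filled on the first row and Column7 on the second. The frame and its column names are made up for the demonstration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: values alternate between Column7 and Column8.
demo = pd.DataFrame({
    'Column5': [647, 647, 817, 817],
    'Column7': [np.nan, 1, np.nan, 1],
    'Column8': [1.5, np.nan, 2, np.nan],
})

# Move each Column8 value down one row so it sits next to its Column7 value.
demo['Column8'] = demo['Column8'].shift()

# Rows that still have no Column8 value are the now-empty first rows.
demo = demo.dropna(subset=['Column8'])
print(demo)
```

This only works for a strictly alternating layout; with a varying number of rows per job the shifted values would end up on the wrong rows.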
I would fill the blanks '' with NaN using df.replace('', np.nan).
I would then apply .ffill() and .bfill() at the same time.
Then drop duplicates with .drop_duplicates().
See the mock data and the solution below. All I have done is chain the methods above together.
Data
df=pd.DataFrame({'Column5':[647,647,817,817],'Column6':['KT35','KT35','KT35','KT35'],'Column7':['',1,'',1],'Column8':[1.5,'',2,''], 'Column9':['','','','']})
print(df)
   Column5 Column6 Column7 Column8 Column9
0      647    KT35             1.5
1      647    KT35       1
2      817    KT35               2
3      817    KT35       1
df=df.replace('', np.nan).ffill().bfill().drop_duplicates(keep='first')
print(df)
   Column5 Column6  Column7  Column8  Column9
0      647    KT35      1.0      1.5      NaN
2      817    KT35      1.0      2.0      NaN
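For the data in the question, where the gaps are zeros rather than empty strings, a groupby-based variant may be closer to what is wanted. This is only a sketch: it assumes that 0 means "no value" and that each JobNumber has at most one real value per column. It turns zeros into NaN, takes the first non-null value per group, and puts 0 back where a group had nothing.

```python
import pandas as pd

df = pd.DataFrame({
    'JobNumber': [647, 647, 817, 817, 817, 915, 915, 915],
    'Column6': ['KT35'] * 8,
    'Column7': [0, 1, 0, 0, 4, 1, 0, 0],
    'Column8': [1.5, 0, 0, 1.7, 0, 0, 0, 1],
    'Column9': [0, 0, 1, 0, 0, 0, 2.03, 0],
})

value_cols = ['Column7', 'Column8', 'Column9']

# Treat zeros as missing so they do not win over real values.
masked = df[value_cols].mask(df[value_cols] == 0)
tmp = pd.concat([df[['JobNumber', 'Column6']], masked], axis=1)

# GroupBy.first() returns the first non-null entry of each column per group.
result = tmp.groupby('JobNumber', as_index=False).first()

# Groups with no value at all in a column get their 0 back.
result[value_cols] = result[value_cols].fillna(0)
print(result)
```

Unlike a plain ffill/bfill over the whole frame, the grouping keeps values from one JobNumber from leaking into the next.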