combine_first doesn't seem to have any effect on the DataFrame

I have a table with multiple rows that should be grouped by the number in the first column. The other columns hold data that I need to combine into a single row per group.

I tried the combine_first function, but I don't understand why it's not working.

I'm trying to make this:

df6=pd.DataFrame({'JobNumber':[647,817,915], 'Column6':['KT35','KT35','KT35'],'Column7':[1, 4, 1],
                 'Column8':[1.5, 1.7 ,1], 'Column9':[0,1,2.03]})

from this:

df=pd.DataFrame({'JobNumber':[647,647,817,817,817, 915,915,915],'Column6':['KT35','KT35','KT35','KT35','KT35','KT35','KT35','KT35'],
                 'Column7':[0, 1, 0, 0 , 4, 1, 0, 0],'Column8':[1.5, 0 ,0 ,1.7,0,0,0,1], 'Column9':[0,0,1,0,0,0,2.03,0]})

In other words, I'm trying to create one row for each JobNumber that holds all of its data.

I came up with this code:

df2 = pd.read_excel('file.xlsx')
df2.columns=['JobNumber','Column6','Column7','Column8','Column9']

df3 = df2.loc[[0],:]
for i in range(len(df2.JobNumber)):
  JobNum = df2.iloc[i, 0]
  if df2.iloc[i,0] == df2.iloc[i-1, 0]:
      df3.loc[df3.JobNumber == JobNum,:] = df3.loc[df3.JobNumber == JobNum,:].combine_first(df2.iloc[[i],:])
  else:
      df3.append(df2.iloc[i,:])

But the combine_first line doesn't seem to work, and df3.append(**) doesn't work either. I can't understand what is wrong with my code. It doesn't show any error; it just looks like my loop has no effect on df3, because when I print it there is only one row in it, the one I assigned to it before the loop.
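The symptoms described here are consistent with a few documented pandas behaviours: combine_first only fills null (NaN) values, so a 0 in the calling frame is never overwritten; it matches rows by index label, so frames with different labels produce extra rows rather than a merged row; and DataFrame.append returned a new frame instead of modifying df3 in place (it was removed in pandas 2.0, where pd.concat is the replacement). The sketch below illustrates each point with small hypothetical frames; the names zero_row, nan_row, other and shifted are made up for the illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row frames, used only to illustrate the behaviour.
zero_row = pd.DataFrame({'x': [0.0]}, index=[0])
nan_row = pd.DataFrame({'x': [np.nan]}, index=[0])
other = pd.DataFrame({'x': [5.0]}, index=[0])

# combine_first only fills null values; 0 is a valid value, so it is kept.
zero_kept = zero_row.combine_first(other)

# With NaN as the placeholder, the value from `other` is taken.
filled = nan_row.combine_first(other)

# Rows are matched by index label: a different label yields a second row
# instead of a merged one.
shifted = pd.DataFrame({'x': [5.0]}, index=[1])
unioned = nan_row.combine_first(shifted)

# pd.concat (like the old DataFrame.append) returns a new frame,
# so the result has to be assigned back.
df3 = pd.DataFrame({'x': [1.0]}, index=[0])
pd.concat([df3, shifted])            # result discarded, df3 unchanged
rows_before = len(df3)
df3 = pd.concat([df3, shifted])      # result kept
rows_after = len(df3)
```

In the loop from the question, the return value of the append call is never assigned, which would explain why df3 still holds only its first row.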

I am not sure how general your data is, but if the values alternate between these two columns as in the example provided, the code below should work.

df['col8'] = df['col8'].shift()
df = df.dropna(subset=['col8'])
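To make the idea concrete, here is a minimal sketch on made-up data. It assumes that missing values are NaN (not 0), that col7 and col8 strictly alternate row by row, and that the column names col7 and col8 are placeholders rather than the names in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical alternating data: col8 is filled on even rows, col7 on odd rows.
demo = pd.DataFrame({
    'job': [647, 647, 817, 817],
    'col7': [np.nan, 1.0, np.nan, 1.0],
    'col8': [1.5, np.nan, 2.0, np.nan],
})

# Move every col8 value down one row so it lines up with the col7 value.
demo['col8'] = demo['col8'].shift()

# Rows that still have no col8 value are the now-empty halves of each pair.
result = demo.dropna(subset=['col8'])
print(result)
```

This only works while the strict alternation holds; a job with three rows, or two consecutive col8 values, would pair values from different rows incorrectly.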

I would fill the blanks '' with NaN:

df.replace('', np.nan)

I would then apply .ffill() and .bfill() together.

Then drop the duplicates with .drop_duplicates().

See the mock data and the solution below. All I have done is chain the methods above together.

Data

import numpy as np
import pandas as pd

df=pd.DataFrame({'Column5':[647,647,817,817],'Column6':['KT35','KT35','KT35','KT35'],'Column7':['',1,'',1],'Column8':[1.5,'',2,''], 'Column9':['','','','']})
print(df)


    Column5 Column6 Column7 Column8 Column9
0      647    KT35             1.5        
1      647    KT35       1                
2      817    KT35               2        
3      817    KT35       1     

df=df.replace('', np.nan).ffill().bfill().drop_duplicates(keep='first')
print(df)


   Column5 Column6  Column7  Column8  Column9
0      647    KT35      1.0      1.5      NaN
2      817    KT35      1.0      2.0      NaN
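One caveat: ffill() and bfill() applied to the whole frame can carry a value across a JobNumber boundary when a group has no value of its own in some column. For the data in the question, where 0 rather than an empty string marks a missing value, a per-group variant may be closer to what is wanted. The following is a sketch under two assumptions that the question does not confirm: that 0 always means "no value", and that each JobNumber has at most one real value per column. The names value_cols, cleaned and result are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'JobNumber': [647, 647, 817, 817, 817, 915, 915, 915],
    'Column6': ['KT35'] * 8,
    'Column7': [0, 1, 0, 0, 4, 1, 0, 0],
    'Column8': [1.5, 0, 0, 1.7, 0, 0, 0, 1],
    'Column9': [0, 0, 1, 0, 0, 0, 2.03, 0],
})

value_cols = ['Column7', 'Column8', 'Column9']

# Assumption: 0 is a placeholder, so turn it into NaN in the value columns.
cleaned = df.copy()
cleaned[value_cols] = df[value_cols].mask(df[value_cols] == 0)

# GroupBy.first() returns the first non-null entry of each column per group,
# which collapses every JobNumber into a single row.
result = cleaned.groupby('JobNumber', as_index=False).first()

# Groups that never had a value in a column go back to 0.
result[value_cols] = result[value_cols].fillna(0)
print(result)
```

Because the grouping is done per JobNumber, values never leak from one job into the next. If the values are known to be non-negative, df.groupby('JobNumber', as_index=False).max() is a shorter route to the same rows for this particular data.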
