
Fill missing data in pandas based on another column

I have a DataFrame with missing data in numbers and level3, two columns with different dtypes (int and str). I want to fill the missing values based on the Org column, since for each Org ID the values in numbers and level3 are always the same. In the code below, numbers, Org and level3 are stored as col1, col2 and col3.

import numpy as np
import pandas as pd

numbers = [np.nan, 5, 5, 5, np.nan, 55, np.nan, 55, 55, np.nan, 555, np.nan, 555, 555, np.nan]
Org = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
level3 = ["test", np.nan, "test", "test", np.nan, "failed", np.nan, "failed", "failed", "failed", np.nan, 'try harder', 'try harder', np.nan, np.nan]
# col1 = numbers, col2 = Org, col3 = level3
d = {'col1': numbers, 'col2': Org, 'col3': level3}
inital = pd.DataFrame(data=d)

My desired output is below:

numbers = [5, 5, 5, 5, 5,55,55,55,55,55,555,555,555,555,555]
Org = [1, 1, 1, 1, 1,2, 2, 2, 2, 2,3, 3, 3, 3, 3]
level3 = ["test", "test", "test", "test", "test", "failed", "failed", "failed", "failed", "failed",'try harder','try harder','try harder','try harder','try harder']
d = {'col1': numbers, 'col2': Org,'col3':level3}
final = pd.DataFrame(data = d)
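One detail worth checking before filling: because col1 contains NaN, pandas stores it as float64, while the desired frame built from plain integers is int64. A minimal sketch that rebuilds both frames from the lists above and inspects their dtypes and missing-value counts (the dtype printed for the string column may differ between pandas versions):

```python
import numpy as np
import pandas as pd

numbers = [np.nan, 5, 5, 5, np.nan, 55, np.nan, 55, 55, np.nan, 555, np.nan, 555, 555, np.nan]
Org = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
level3 = ["test", np.nan, "test", "test", np.nan, "failed", np.nan, "failed", "failed", "failed",
          np.nan, "try harder", "try harder", np.nan, np.nan]
inital = pd.DataFrame({"col1": numbers, "col2": Org, "col3": level3})

final = pd.DataFrame({
    "col1": [5] * 5 + [55] * 5 + [555] * 5,
    "col2": Org,
    "col3": ["test"] * 5 + ["failed"] * 5 + ["try harder"] * 5,
})

# NaN cannot live in an int64 column, so pandas stores col1 as float64
print(inital.dtypes)
print(final.dtypes)

# Number of missing values in each column of the input
print(inital.isna().sum())
```

So a filled col1 will print as 5.0 rather than 5 unless it is cast back to an integer dtype once no NaN remains.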

I started by writing an extremely long loop that checked whether neighbouring rows had the same Org and, if so, copied the value at offset -1, -2, -3, +1, +2 or +3 when it wasn't empty. Still, it seemed ridiculously inefficient and didn't work perfectly, so I thought I'd come here to see if anyone had any tricks they could teach me.
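The Org check in that loop is what stops a value from one group being copied into another. As a sketch of why it matters, here is a comparison using pandas' built-in forward-fill instead of a loop (the names naive and grouped are only illustrative): a plain ffill() versus one done per Org:

```python
import numpy as np
import pandas as pd

numbers = [np.nan, 5, 5, 5, np.nan, 55, np.nan, 55, 55, np.nan, 555, np.nan, 555, 555, np.nan]
Org = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
level3 = ["test", np.nan, "test", "test", np.nan, "failed", np.nan, "failed", "failed", "failed",
          np.nan, "try harder", "try harder", np.nan, np.nan]
inital = pd.DataFrame({"col1": numbers, "col2": Org, "col3": level3})

# Illustrative naive fill: copies the previous row's value regardless of Org
naive = inital[["col1", "col3"]].ffill()

# Illustrative group-aware fill: a value is only copied within the same Org (col2)
grouped = inital.groupby("col2")[["col1", "col3"]].ffill()

# Row 10 is the first row of Org 3 and its col3 is missing
print(naive.loc[10, "col3"])    # -> failed (leaked from Org 2)
print(grouped.loc[10, "col3"])  # still missing: no earlier Org 3 row to copy from
```

Row 0 shows the other gap: a forward-fill alone cannot fill a group's leading NaN, which is why the answers below also back-fill or take the first non-null value per group.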

Thank you

Let's try filling within each Org (col2) group:

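# Within each Org (col2), forward-fill then back-fill so leading NaNs are covered too;
# group_keys=False keeps the original row index so the result aligns on assignment
inital[['col1', 'col3']] = inital.groupby('col2', group_keys=False)[['col1', 'col3']].apply(lambda g: g.ffill().bfill())
print(inital)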

     col1  col2        col3
0     5.0     1        test
1     5.0     1        test
2     5.0     1        test
3     5.0     1        test
4     5.0     1        test
5    55.0     2      failed
6    55.0     2      failed
7    55.0     2      failed
8    55.0     2      failed
9    55.0     2      failed
10  555.0     3  try harder
11  555.0     3  try harder
12  555.0     3  try harder
13  555.0     3  try harder
14  555.0     3  try harder
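A variant of the same idea, sketched under the assumption that every Org has at least one non-null value in each column: GroupBy has built-in ffill() and bfill() methods, so the Python lambda can be dropped, and once no NaN remains col1 can be cast back to the integer dtype of the desired output. The names cols, filled and result are illustrative.

```python
import numpy as np
import pandas as pd

numbers = [np.nan, 5, 5, 5, np.nan, 55, np.nan, 55, 55, np.nan, 555, np.nan, 555, 555, np.nan]
Org = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
level3 = ["test", np.nan, "test", "test", np.nan, "failed", np.nan, "failed", "failed", "failed",
          np.nan, "try harder", "try harder", np.nan, np.nan]
inital = pd.DataFrame({"col1": numbers, "col2": Org, "col3": level3})

cols = ["col1", "col3"]  # illustrative: the columns to fill

# Forward-fill within each Org, then back-fill within each Org for leading NaNs
filled = inital.groupby("col2")[cols].ffill()
filled = filled.groupby(inital["col2"]).bfill()

result = inital.copy()
result[cols] = filled

# Assumes no NaN is left; otherwise this cast raises
result["col1"] = result["col1"].astype("int64")
print(result)
```

Grouping the second step by inital["col2"] works because filled keeps the original row index, so the Series lines up row for row.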

Try the code below; you can drop the columns you don't need afterwards:

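# transform() broadcasts one value per Org (col2) group back to every row:
# 'first' takes the first non-null value and 'max' skips NaN
inital.assign(new_col3=inital.groupby('col2')['col3'].transform('first'),
              new_col1=inital.groupby('col2')['col1'].transform('max'))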

Output:

[Image: output of the code above]
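The assign answer works because transform('first') returns the first non-null value in each group and broadcasts it to every row, and 'max' also skips NaN. However, 'max' only gives the right number here because all non-null values within an Org are identical. The sketch below instead overwrites the original columns, using 'first' for both. It assumes each Org has at least one non-null value per column; otherwise the int64 cast would fail.

```python
import numpy as np
import pandas as pd

numbers = [np.nan, 5, 5, 5, np.nan, 55, np.nan, 55, 55, np.nan, 555, np.nan, 555, 555, np.nan]
Org = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
level3 = ["test", np.nan, "test", "test", np.nan, "failed", np.nan, "failed", "failed", "failed",
          np.nan, "try harder", "try harder", np.nan, np.nan]
inital = pd.DataFrame({"col1": numbers, "col2": Org, "col3": level3})

grouped = inital.groupby("col2")
result = inital.assign(
    # 'first' picks the first non-null value in each Org group and
    # broadcasts it to every row of that group
    col1=grouped["col1"].transform("first").astype("int64"),
    col3=grouped["col3"].transform("first"),
)
print(result)
```

Because assign replaces existing columns in place, the column order stays col1, col2, col3, matching the desired final frame.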

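Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.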

 