
python pandas dataframe filling, e.g. bfill, ffill

I have two problems with filling out a very large dataframe. Below is an excerpt of it (picture 1). I want the 1000 in columns E and F to be filled down to the row where D is 26 and no further. In the same way, I want the 2000 to be filled up to the row where D is -1 and down to the next 26. I thought I could do this with bfill and ffill, but unfortunately I don't know how.
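A minimal sketch of the idea, assuming each section is identified by the columns A, B and C (with D running from -1 to 26 inside it): if ffill and bfill are applied per group instead of over the whole frame, a value can never leak into the neighbouring section. The toy frame is made up and shortened to D = -1..2.

```python
import numpy as np
import pandas as pd

# Made-up toy data: two sections (C = 1 and C = 2); D runs from -1 to 2 here
# instead of -1 to 26 to keep it short
df = pd.DataFrame({
    "A": 4000074,
    "B": "SP000796746",
    "C": [1] * 4 + [2] * 4,
    "D": [-1, 0, 1, 2] * 2,
    "E": [np.nan, 1000, np.nan, np.nan, np.nan, np.nan, 2000, np.nan],
    "F": [np.nan, 1000, np.nan, np.nan, np.nan, np.nan, 2000, np.nan],
})

keys = ["A", "B", "C"]  # assumption: these columns identify one section
df = df.sort_values(keys + ["D"]).reset_index(drop=True)

# ffill pushes values down to the end of the section, bfill pulls them up to
# D == -1; grouping stops values from crossing into the next section
df[["E", "F"]] = df.groupby(keys)[["E", "F"]].ffill()
df[["E", "F"]] = df.groupby(keys)[["E", "F"]].bfill()
print(df)
```

Sorting by D first matters, because ffill and bfill work on row order, not on the values in D.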

Another problem is that there are sections in which none of the rows from -1 to 26 contain any values in E and F (picture 2). How can I delete these sections, or fill them with 0, so that bfill or ffill does not write wrong entries there?
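A sketch of both options on made-up data, again assuming a section is identified by its key columns (only C here, for brevity). Once the fill is done per section, anything still NaN must belong to an empty section, so it can be set to 0; alternatively, `groupby(...).filter` can drop such sections before filling.

```python
import numpy as np
import pandas as pd

# Made-up toy data: section C = 2 has no values at all in E and F
df = pd.DataFrame({
    "C": [1] * 3 + [2] * 3,
    "D": [-1, 0, 1] * 2,
    "E": [np.nan, 1000, np.nan, np.nan, np.nan, np.nan],
    "F": [np.nan, 1000, np.nan, np.nan, np.nan, np.nan],
})
keys = ["C"]  # in the real data this would presumably be ["A", "B", "C"]

# Option 1: fill within each section, then set leftover NaN (empty sections) to 0
filled = df.copy()
filled[["E", "F"]] = filled.groupby(keys)[["E", "F"]].ffill()
filled[["E", "F"]] = filled.groupby(keys)[["E", "F"]].bfill()
filled[["E", "F"]] = filled[["E", "F"]].fillna(0)

# Option 2: drop every section in which E and F are completely empty
dropped = df.groupby(keys).filter(lambda g: g[["E", "F"]].notna().any().any())
```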

import pandas as pd

data = '/Users/Hanna/Desktop/Coding/Code.csv'

# Read only the columns A-F from the first 75 rows
df_1 = pd.read_csv(data, usecols=["A", "B", "C", "D", "E", "F"], nrows=75)

# D always runs from -1 to 26 within a section
base_list = list(range(-1, 27))

# Build the full grid of (A, B, C, D) combinations
# (note: 201918 appears twice in the C list, which duplicates those rows)
df_c = pd.MultiIndex.from_product([
    [4000074],
    ["SP000796746", "SP001811642"],
    [201824, 201828, 201832, 201835, 201837, 201839, 201845, 201850,
     201910, 201918, 201922, 201926, 201909, 201916, 201918, 201920],
    base_list],
    names=["A", "B", "C", "D"]).to_frame(index=False)

# Outer merge on the shared columns A-D; grid rows without data get NaN in E and F
df_3 = pd.merge(df_c, df_1, how='outer')
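To see which rows the outer merge actually added, `pd.merge` accepts `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only` or `both`. A small sketch with a single made-up data row:

```python
import pandas as pd

# Made-up measurements: one row of data for one (A, B, C) combination
df_1 = pd.DataFrame({
    "A": [4000074], "B": ["SP000796746"], "C": [201824],
    "D": [0], "E": [1000.0], "F": [1000.0],
})

# Full grid for that combination, with D from -1 to 26
df_c = pd.MultiIndex.from_product(
    [[4000074], ["SP000796746"], [201824], list(range(-1, 27))],
    names=["A", "B", "C", "D"],
).to_frame(index=False)

# indicator=True marks where each row came from
df_3 = pd.merge(df_c, df_1, how="outer", indicator=True)
print(df_3["_merge"].value_counts())
```

Rows marked `left_only` exist only in the grid, so their E and F are NaN until they are filled.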

To make it easier to understand, I have shortened the example a bit. Picture 3 shows how it looks when filled, and picture 4 shows it filled correctly.


You could find the indexes where you have -1 and then slice/loop over the columns to fill.

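Just to create the sample data:

import pandas as pd

# 10 sections, each with A running from -1 to 26 (28 rows), as in the question
df = pd.DataFrame(columns=list('ABE'))
df['A'] = list(range(-1, 27)) * 10

Add a random value at the start of each section (0 everywhere else):

import random

for i in df.index:
    # Row i is the first row (A == -1) of a section when i is a multiple of 28
    if i % 28 == 0:
        df.loc[i, 'B'] = random.random()
    else:
        df.loc[i, 'B'] = 0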

Find the indexes to slice over:

indx = df[df['A'] == -1].index.values

Fill in the data in column E:

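# Append the end of the frame so the last section is filled too
# (this also works when there is only one section)
bounds = list(indx) + [len(df)]
for i, j in zip(bounds[:-1], bounds[1:]):
    # Rows i..j-1 form one section; .loc slicing is label-inclusive
    df.loc[i:j-1, 'E'] = df.loc[i:j-1, 'B'].max()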
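The same idea can also be written without an explicit loop. This is a swapped-in vectorized variant, not the answer's own code: a cumulative sum over `A == -1` gives every row a section number, and `groupby(...).transform("max")` broadcasts each section's maximum back to all of its rows. The sample data is rebuilt in the same way as above, assuming 28-row sections.

```python
import random

import pandas as pd

# Sample data: 10 sections, A running from -1 to 26 in each,
# with a random value in B at the start of every section and 0 elsewhere
df = pd.DataFrame({"A": list(range(-1, 27)) * 10})
df["B"] = [random.random() if a == -1 else 0.0 for a in df["A"]]

# Each A == -1 starts a new section; the running count labels sections 1..10
section = (df["A"] == -1).cumsum().rename("section")

# Broadcast each section's maximum of B to every row of that section
df["E"] = df.groupby(section)["B"].transform("max")
```

Because the section label comes from the data itself, this does not depend on every section having the same length.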

Assuming you need to find and fill the values segment by segment, where every segment has the same number of rows:

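import pandas as pd

data = pd.read_csv('/Users/Hanna/Desktop/Coding/Code.csv')
seg_len = 28  # rows per segment (D runs from -1 to 26)
for start in range(0, data.shape[0], seg_len):
    stop = min(start + seg_len, data.shape[0])  # the last segment may be shorter
    # Series.max() skips NaN, unlike the built-in max(); .loc is label-inclusive
    data.loc[start:stop - 1, 'E'] = data['E'].iloc[start:stop].max()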

You can replace max with whatever aggregation you want.
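A vectorized sketch of the fixed-length version, assuming every segment has exactly 28 rows and the frame has the default RangeIndex: integer division of the row position by the segment length gives a segment label, and any groupby reduction can then stand in for max. The column values are made up.

```python
import numpy as np
import pandas as pd

seg_len = 28  # assumed rows per segment (D runs from -1 to 26)

# Made-up column E: two segments with a single value each, NaN elsewhere
e = np.full(2 * seg_len, np.nan)
e[5] = 1000.0
e[seg_len + 10] = 2000.0
data = pd.DataFrame({"E": e})

# Rows 0..27 get label 0, rows 28..55 get label 1, and so on
segment = np.arange(len(data)) // seg_len

# Any reduction can replace "max", e.g. "first" (first non-NaN value in the segment)
data["E_max"] = data.groupby(segment)["E"].transform("max")
data["E_first"] = data.groupby(segment)["E"].transform("first")
```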

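Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM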