Insert rows to fill year gaps in a Pandas data frame

I have the following DataFrame:

import pandas as pd

data = {'id':  ['A', 'A','B', 'C'],
        'year': [2002,2002, 2003, 2004],
        'city':['London', 'Rome','Paris', 'Berlin'],
        'appearence': [1,1,1,1]}

df = pd.DataFrame(data)

I want to fill the gaps in the year column for each id, starting from 2000 and running up to the year in which appearence equals 1. The appearence column is always equal to 1 in the input DataFrame. Please note that an id can be in two different cities in the same year.

The desired output:

import pandas as pd

data = {'id':  ['A', 'A', 'A', 'A', 'A', 'A','B','B','B','B','C','C','C','C','C'],
        'year': [2000, 2001, 2002, 2000, 2001, 2002,2000, 2001, 2002, 2003,2000,2001,2002,2003, 2004],
        'city':['NaN', 'NaN','London','NaN', 'NaN','Rome', 'NaN', 'NaN','NaN','Paris', 'NaN', 'NaN','NaN','NaN','Berlin'],
        'appearence': [0,0,1,0,0,1,0,0,0,1,0,0,0,0,1]}

df = pd.DataFrame(data)

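A solution that prepends the missing years starting from 2000. It works if appearence=1 in the input DataFrame and each id has only one row per year, because reindex raises an error on duplicate index labels; the output below is for an input without the second A row (Rome):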

f = lambda x: x.set_index('year').reindex(range(2000, x['year'].max() + 1))
df = df.groupby('id').apply(f).drop('id', axis=1).fillna({'appearence': 0}).reset_index()
print (df)
   id  year    city  appearence
0   A  2000     NaN         0.0
1   A  2001     NaN         0.0
2   A  2002  London         1.0
3   B  2000     NaN         0.0
4   B  2001     NaN         0.0
5   B  2002     NaN         0.0
6   B  2003   Paris         1.0
7   C  2000     NaN         0.0
8   C  2001     NaN         0.0
9   C  2002     NaN         0.0
10  C  2003     NaN         0.0
11  C  2004  Berlin         1.0
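
To make the mechanism concrete, here is a minimal sketch of what the lambda does to a single group, and of what happens when a group contains the same year twice. The names group, expanded, dup and raised are illustrative only and are not part of the answer above.

```python
import pandas as pd

# One group as groupby('id') would pass it to the lambda (illustrative data).
group = pd.DataFrame({"id": ["B"], "year": [2003],
                      "city": ["Paris"], "appearence": [1]})

# Move 'year' into the index, then ask for every year from 2000 to the max.
# Years that were not present come back as rows filled with NaN.
expanded = group.set_index("year").reindex(range(2000, group["year"].max() + 1))
print(expanded)

# With a repeated year the index has duplicate labels and reindex refuses.
dup = pd.DataFrame({"year": [2002, 2002], "city": ["London", "Rome"]})
raised = False
try:
    dup.set_index("year").reindex(range(2000, 2003))
except ValueError as exc:
    raised = True
    print("reindex failed:", exc)
```

The NaN rows are why appearence becomes a float column until it is filled and cast back to integers.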

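EDIT: If an id can appear more than once in the same year, also group by the original index so that each input row is expanded separately: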

f = lambda x: x.set_index('year').reindex(range(2000, x['year'].max() + 1))
df = df.groupby([df.index, 'id']).apply(f).drop('id', axis=1).fillna({'appearence': 0}).droplevel(0).reset_index()
print (df)
   id  year    city  appearence
0   A  2000     NaN         0.0
1   A  2001     NaN         0.0
2   A  2002  London         1.0
3   A  2000     NaN         0.0
4   A  2001     NaN         0.0
5   A  2002    Rome         1.0
6   B  2000     NaN         0.0
7   B  2001     NaN         0.0
8   B  2002     NaN         0.0
9   B  2003   Paris         1.0
10  C  2000     NaN         0.0
11  C  2001     NaN         0.0
12  C  2002     NaN         0.0
13  C  2003     NaN         0.0
14  C  2004  Berlin         1.0
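
As a possible alternative to groupby.apply, the same expansion can be sketched by repeating each input row and numbering the copies. This is a different technique from the answer above, not a restatement of it. It assumes every year is at least 2000; START, n_rows, last_year, is_filler and out are names chosen for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": ["A", "A", "B", "C"],
    "year": [2002, 2002, 2003, 2004],
    "city": ["London", "Rome", "Paris", "Berlin"],
    "appearence": [1, 1, 1, 1],
})

START = 2000  # first year to fill from, as stated in the question

# Each input row expands to the years START..year inclusive.
# Assumption: every year is >= START, so all counts are positive.
n_rows = df["year"] - START + 1

# Repeat every row n_rows times; keep the original year for comparison.
out = df.loc[df.index.repeat(n_rows)].rename(columns={"year": "last_year"})

# Number the copies 0, 1, 2, ... within each source row and shift to years.
out["year"] = START + out.groupby(level=0).cumcount().to_numpy()
out = out.reset_index(drop=True)

# Every copy before the original year is a filler row.
is_filler = out["year"] < out["last_year"]
out.loc[is_filler, "city"] = np.nan
out.loc[is_filler, "appearence"] = 0

out = out[["id", "year", "city", "appearence"]]
print(out)
```

Because no NaN is ever written into appearence, the column keeps its integer dtype, and the two A rows are expanded independently.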

Another option is to use complete from pyjanitor, which abstracts the reshaping by explicitly exposing the missing rows:

# pip install pyjanitor
import pandas as pd
import janitor

# create dictionary for new dates
dates = {"year": lambda df: range(2000, df.max() + 1)}

# execute complete, and fill the nulls with 0
(df.complete(dates, by="id", sort=True)
   .fillna({"appearence": 0}, downcast="infer")
 )
   id  year    city  appearence
0   A  2000     NaN           0
1   A  2001     NaN           0
2   A  2002  London           1
3   B  2000     NaN           0
4   B  2001     NaN           0
5   B  2002     NaN           0
6   B  2003   Paris           1
7   C  2000     NaN           0
8   C  2001     NaN           0
9   C  2002     NaN           0
10  C  2003     NaN           0
11  C  2004  Berlin           1
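
For readers without pyjanitor, the idea of exposing missing rows can be sketched in plain pandas by building the full (id, year) grid and left-merging the data onto it. This is only an approximation of the idea, not pyjanitor's implementation. It assumes one row per id and year; last_year, grid and out are names chosen for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A", "B", "C"],
    "year": [2002, 2003, 2004],
    "city": ["London", "Paris", "Berlin"],
    "appearence": [1, 1, 1],
})

# Last observed year per id.
last_year = df.groupby("id")["year"].max()

# Every (id, year) pair that should exist in the result.
grid = pd.DataFrame(
    [(id_, year)
     for id_, last in last_year.items()
     for year in range(2000, last + 1)],
    columns=["id", "year"],
)

# Left-merge so pairs missing from df appear with NaN, then fill the flag.
out = grid.merge(df, on=["id", "year"], how="left")
out["appearence"] = out["appearence"].fillna(0).astype(int)
print(out)
```

If an id has two rows in the same year, the merge keeps both rows for that year but creates the filler years only once, so it does not reproduce the per-row expansion asked for in the question.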

A plain-Python alternative builds the columns in a loop before creating the DataFrame:

import pandas as pd

start = 2000
data = {'id':  ['A', 'B', 'C'],
        'year': [2002, 2003, 2004],
        'city':['London', 'Paris', 'Berlin'],
        'appearence': [1,1,1]}

ids, years, cities, appearences = [], [], [], []
for id_, year, city, appearence in zip(data['id'], data['year'],
                                       data['city'], data['appearence']):
    # one output row per year from start up to the observed year
    for y in range(start, year + 1):
        is_observed = y == year
        ids.append(id_)
        years.append(y)
        # filler rows get the string "NaN" and 0; the observed year keeps its values
        cities.append(city if is_observed else "NaN")
        appearences.append(appearence if is_observed else 0)

df = pd.DataFrame({'id': ids,
                   'year': years,
                   'city': cities,
                   'appearence': appearences})
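
Note that the loop above stores the text "NaN" rather than a real missing value, so pandas does not treat those cells as missing. A minimal sketch of the difference, and of one way to convert them, follows; cities and cleaned are names chosen for this sketch.

```python
import pandas as pd

# A column shaped like the 'city' column built by the loop (illustrative data).
cities = pd.Series(["NaN", "NaN", "London"])

# The string "NaN" is ordinary text, so nothing is reported as missing.
print(cities.isna().sum())

# mask() replaces the entries where the condition holds with a real missing value.
cleaned = cities.mask(cities == "NaN")
print(cleaned.isna().sum())
```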
