Insert rows into a Pandas DataFrame to fill gaps in the year column

Question

I have the following DataFrame:

import pandas as pd

data = {'id':  ['A', 'A','B', 'C'],
        'year': [2002,2002, 2003, 2004],
        'city':['London', 'Rome','Paris', 'Berlin'],
        'appearence': [1,1,1,1]}

df = pd.DataFrame(data)

I want to fill the gaps in the year column, starting from 2000 and ending at the year in which appearence equals 1. The appearence column is always equal to 1 in the input DataFrame. Please note that each id could be in two different cities in the same year.

The desired output:

import pandas as pd

data = {'id':  ['A', 'A', 'A', 'A', 'A', 'A','B','B','B','B','C','C','C','C','C'],
        'year': [2000, 2001, 2002, 2000, 2001, 2002,2000, 2001, 2002, 2003,2000,2001,2002,2003, 2004],
        'city':['NaN', 'NaN','London','NaN', 'NaN','Rome', 'NaN', 'NaN','NaN','Paris', 'NaN', 'NaN','NaN','NaN','Berlin'],
        'appearence': [0,0,1,0,0,1,0,0,0,1,0,0,0,0,1]}

df = pd.DataFrame(data)

Answer 1

A solution that prepends the missing years starting from 2000. It works as long as appearence is 1 in every row of the input DataFrame. The output shown is for an input with a single row per id:

f = lambda x: x.set_index('year').reindex(range(2000, x['year'].max() + 1))
df = df.groupby('id').apply(f).drop('id', axis=1).fillna({'appearence': 0}).reset_index()
print (df)
   id  year    city  appearence
0   A  2000     NaN         0.0
1   A  2001     NaN         0.0
2   A  2002  London         1.0
3   B  2000     NaN         0.0
4   B  2001     NaN         0.0
5   B  2002     NaN         0.0
6   B  2003   Paris         1.0
7   C  2000     NaN         0.0
8   C  2001     NaN         0.0
9   C  2002     NaN         0.0
10  C  2003     NaN         0.0
11  C  2004  Berlin         1.0

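EDIT: to handle an id that appears in several cities in the same year, also group by the original index so that every input row is expanded on its own: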

f = lambda x: x.set_index('year').reindex(range(2000, x['year'].max() + 1))
df = df.groupby([df.index, 'id']).apply(f).drop('id', axis=1).fillna({'appearence': 0}).droplevel(0).reset_index()
print (df)
   id  year    city  appearence
0   A  2000     NaN         0.0
1   A  2001     NaN         0.0
2   A  2002  London         1.0
3   A  2000     NaN         0.0
4   A  2001     NaN         0.0
5   A  2002    Rome         1.0
6   B  2000     NaN         0.0
7   B  2001     NaN         0.0
8   B  2002     NaN         0.0
9   B  2003   Paris         1.0
10  C  2000     NaN         0.0
11  C  2001     NaN         0.0
12  C  2002     NaN         0.0
13  C  2003     NaN         0.0
14  C  2004  Berlin         1.0
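
The same per-row expansion can also be written without groupby.apply, which keeps appearence as an integer column. This is only a minimal sketch under the question's assumptions (every input row has appearence equal to 1 and a year of at least 2000); the names start, n, real and out are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'A', 'B', 'C'],
    'year': [2002, 2002, 2003, 2004],
    'city': ['London', 'Rome', 'Paris', 'Berlin'],
    'appearence': [1, 1, 1, 1],
})

start = 2000  # assumed first year, as stated in the question

# number of output rows each input row expands to (start..year inclusive)
n = (df['year'] - start + 1).to_numpy()

# repeat every input row n times and keep the observed year for comparison
out = df.loc[df.index.repeat(n)].rename(columns={'year': 'last_year'})

# count 0, 1, ... inside each repeated block to rebuild the year sequence
out['year'] = start + out.groupby(level=0).cumcount().to_numpy()

# only the last row of each block is the real observation
real = (out['year'] == out['last_year']).to_numpy()
out['city'] = out['city'].where(real)   # blank out city on the added rows
out['appearence'] = real.astype(int)    # 1 for the real row, 0 otherwise

out = out[['id', 'year', 'city', 'appearence']].reset_index(drop=True)
print(out)
```

Because each input row is repeated independently, the two rows for id A (London and Rome) each get their own 2000 to 2002 block, as in the desired output.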

Answer 2

One option is to use complete from pyjanitor, which abstracts the reshaping by explicitly exposing the missing rows:

# pip install pyjanitor
import pandas as pd
import janitor

# create dictionary for new dates
dates = {"year": lambda df: range(2000, df.max() + 1)}

# execute complete, and fill the nulls with 0
(df.complete(dates, by="id", sort=True)
   .fillna({"appearence": 0}, downcast="infer")
 )
   id  year    city  appearence
0   A  2000     NaN           0
1   A  2001     NaN           0
2   A  2002  London           1
3   B  2000     NaN           0
4   B  2001     NaN           0
5   B  2002     NaN           0
6   B  2003   Paris           1
7   C  2000     NaN           0
8   C  2001     NaN           0
9   C  2002     NaN           0
10  C  2003     NaN           0
11  C  2004  Berlin           1
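
For readers who prefer not to install pyjanitor, the idea of exposing missing rows can be sketched in plain pandas: build every (id, year) pair that should exist, then left-merge the original data onto it. This is an illustration of the concept, not pyjanitor's implementation, and it assumes one row per id, as in the output above. The names start, last, grid and full are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'B', 'C'],
    'year': [2002, 2003, 2004],
    'city': ['London', 'Paris', 'Berlin'],
    'appearence': [1, 1, 1],
})

start = 2000  # assumed first year, as stated in the question

# last observed year per id
last = df.groupby('id')['year'].max()

# every (id, year) pair that should exist in the result
grid = pd.DataFrame(
    [(i, y) for i, end in last.items() for y in range(start, end + 1)],
    columns=['id', 'year'],
)

# the left merge keeps all pairs; pairs absent from df get NaN in the other columns
full = grid.merge(df, on=['id', 'year'], how='left')

# missing appearence means "not observed", so fill with 0 and restore the int dtype
full['appearence'] = full['appearence'].fillna(0).astype(int)
print(full)
```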

Answer 3

import pandas as pd

start = 2000
data = {'id':  ['A', 'B', 'C'],
        'year': [2002, 2003, 2004],
        'city': ['London', 'Paris', 'Berlin'],
        'appearence': [1, 1, 1]}

# one list per output column
ids, years, cities, appearences = [], [], [], []

for counter, last_year in enumerate(data['year']):
    # add a placeholder row for every year from start up to the recorded year
    for year in range(start, last_year + 1):
        ids.append(data['id'][counter])
        years.append(year)
        cities.append("NaN")
        appearences.append(0)
    # the last row of each block is the real observation, so restore its values
    appearences[-1] = data['appearence'][counter]
    cities[-1] = data['city'][counter]

data = {'id': ids,
        'year': years,
        'city': cities,
        'appearence': appearences}
df = pd.DataFrame(data)
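
The loop above starts from a plain dictionary. A possible variant that takes a DataFrame directly, and therefore also covers an id that appears twice in the same year, could look like the sketch below. The helper name fill_years and its start parameter are hypothetical, and missing cities are stored as real missing values (None) rather than the string "NaN".

```python
import pandas as pd

def fill_years(df, start=2000):
    """Hypothetical helper: expand every input row into rows for start..year."""
    rows = []
    for rec in df.itertuples(index=False):
        # placeholder rows for the years before the observed one
        for year in range(start, rec.year):
            rows.append({'id': rec.id, 'year': year, 'city': None, 'appearence': 0})
        # the observed row itself, unchanged
        rows.append({'id': rec.id, 'year': rec.year, 'city': rec.city,
                     'appearence': rec.appearence})
    return pd.DataFrame(rows, columns=['id', 'year', 'city', 'appearence'])

source = pd.DataFrame({
    'id': ['A', 'A', 'B', 'C'],
    'year': [2002, 2002, 2003, 2004],
    'city': ['London', 'Rome', 'Paris', 'Berlin'],
    'appearence': [1, 1, 1, 1],
})
result = fill_years(source)
print(result)
```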
