How to replicate rows based on the value of a column in the same pandas DataFrame
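I have a dataframe with 2 months of data and 20 columns, one of which is 'date'. There are 3 non-consecutive dates with no data. I want to copy the previous day's data to create entries for those missing days.
Here is what I tried: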
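# .copy() so the assignments below act on an independent frame, not a view of df
df_replicate = df[(df['date']=='2021-07-27') | (df['date']=='2021-08-18') | (df['date']=='2021-08-22')].copy()
# assign to the 'date' column only; without the column label the whole row is overwritten
df_replicate.loc[df_replicate['date']=='2021-07-27', 'date'] = '2021-07-28'
df_replicate.loc[df_replicate['date']=='2021-08-18', 'date'] = '2021-08-19'
df_replicate.loc[df_replicate['date']=='2021-08-22', 'date'] = '2021-08-23'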
Then I concatenate df and df_replicate.
Is there a simpler way to do this?
You can use reindex with method="ffill":
import pandas as pd
import numpy as np
date_index = pd.date_range('2021-07-27', periods=7, freq='D')
# build the data on a datetime index
df = pd.DataFrame({"prices": [100, np.nan, 100, 89, 88, np.nan, np.nan]}, index=date_index)
2021-07-27 100.0
2021-07-28 NaN
2021-07-29 100.0
2021-07-30 89.0
2021-07-31 88.0
2021-08-01 NaN
2021-08-02 NaN
# remove one of the date values to represent missing data
df = df[~(df.index=='2021-07-28')]
2021-07-27 100.0
2021-07-29 100.0
2021-07-30 89.0
2021-07-31 88.0
2021-08-01 NaN
2021-08-02 NaN
# Second date index with correct number of days
date_index2 = pd.date_range('2021-07-27', periods=7, freq='D')
# df with the missing row forward filled
df.reindex(date_index2, method="ffill")
2021-07-27 100.0 #This value is carried to the next date
2021-07-28 100.0
2021-07-29 100.0
2021-07-30 89.0
2021-07-31 88.0
2021-08-01 NaN
2021-08-02 NaN
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reindex.html#pandas.DataFrame.reindex
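For the situation described in the question, where the dates live in a 'date' column and the rows for some days are absent altogether, the same idea can be sketched roughly as below. The column names other than 'date' and all sample values are made up for illustration, and the sketch assumes the dates are sorted and unique.

```python
import pandas as pd

# Hypothetical sample data: the row for 2021-07-28 is missing entirely
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-07-27", "2021-07-29", "2021-07-30"]),
    "prices": [100.0, 95.0, 89.0],
    "volume": [10, 12, 9],
})

# Every calendar day between the first and last date present
full_range = pd.date_range(df["date"].min(), df["date"].max(), freq="D")

filled = (
    df.set_index("date")
      .reindex(full_range, method="ffill")  # new rows copy the previous day's values
      .rename_axis("date")                  # restore the index name lost by reindexing
      .reset_index()
)
print(filled)
```

This fills every missing day in the range, not only the three known ones, which may or may not be what is wanted.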
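Filter the rows whose dates are in the list or are the following days, created with Index.shift, and forward fill the missing values within those pairs: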
import numpy as np
import pandas as pd
df = pd.DataFrame({"prices": [100, np.nan, 100, 89, 88, np.nan, np.nan],
                   'date': pd.date_range('2021-07-27', periods=7, freq='D')})
df['date'] = pd.to_datetime(df['date'])
dates = pd.to_datetime(['2021-07-27','2021-08-18','2021-08-22'])
# listed dates plus the day after each one
mask = df['date'].isin(dates.append(dates.shift(freq='D')))
df[mask] = df[mask].ffill()
print(df)
prices date
0 100.0 2021-07-27
1 100.0 2021-07-28
2 100.0 2021-07-29
3 89.0 2021-07-30
4 88.0 2021-07-31
5 NaN 2021-08-01
6 NaN 2021-08-02
If you only need to replace the next-day rows (filled with NaN) with the last previous non-NaN values:
df['date'] = pd.to_datetime(df['date'])
dates = pd.to_datetime(['2021-07-27','2021-08-18','2021-08-22'])
mask = df['date'].isin(dates.shift(freq='D'))
df[mask] = df.ffill()
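The difference between the two variants only shows when the listed day is itself NaN. A small sketch with made-up values, where 2021-07-27 has no price of its own:

```python
import numpy as np
import pandas as pd

# Hypothetical data: both 2021-07-27 and 2021-07-28 are NaN
df = pd.DataFrame({
    "prices": [50, np.nan, np.nan, 89],
    "date": pd.date_range("2021-07-26", periods=4, freq="D"),
})
dates = pd.to_datetime(["2021-07-27"])
next_days = dates.shift(1, freq="D")

# Variant 1: forward fill only within the (listed day, next day) pair
pair_mask = df["date"].isin(dates.append(next_days))
within_pair = df.copy()
within_pair[pair_mask] = within_pair[pair_mask].ffill()

# Variant 2: fill the next day from the last non-NaN value anywhere above it
next_mask = df["date"].isin(next_days)
from_any_previous = df.copy()
from_any_previous[next_mask] = from_any_previous.ffill()

print(within_pair)
print(from_any_previous)
```

In the first variant the pair has nothing to carry forward, so 2021-07-28 stays NaN; in the second it picks up the value from 2021-07-26.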
If the input list instead holds the next-day values (['2021-07-28','2021-08-19','2021-08-23']), shift them back by one day to get the preceding dates to match:
df['date'] = pd.to_datetime(df['date'])
dates = pd.to_datetime(['2021-07-28','2021-08-19','2021-08-23'])
mask = df['date'].isin(dates.append(dates.shift(-1, freq='D')))
df[mask] = df[mask].ffill()
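The masks above assume the rows for the missing days already exist with NaN values. If those rows are absent altogether, as the question describes, the copy-and-concatenate idea from the question can be shortened along these lines. This is only a sketch: the sample data and the names prev_days, copies and result are hypothetical.

```python
import pandas as pd

# Hypothetical data: 2021-07-28 and 2021-08-19 have no rows at all
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-07-27", "2021-07-29", "2021-08-18", "2021-08-20"]),
    "prices": [100.0, 95.0, 89.0, 88.0],
})

# Days whose rows should be duplicated onto the following (missing) day
prev_days = pd.to_datetime(["2021-07-27", "2021-08-18"])

# Copy the matching rows and move their dates forward by one day
copies = df[df["date"].isin(prev_days)].copy()
copies["date"] = copies["date"] + pd.Timedelta(days=1)

# Append the copies and restore chronological order
result = (
    pd.concat([df, copies])
      .sort_values("date")
      .reset_index(drop=True)
)
print(result)
```

Because the whole row is copied, this works for any number of columns and also when a date has several rows.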