Filling NaN values from a list with different shape
I have a df:
import numpy as np
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D', 'E'],
                  columns=['date_1', 'date_2', 'value_2', 'value_3', 'value_4'],
                  data=[['2021-06-28', '2022-05-03', 30, 40, 60],
                        ['2022-01-10', '2022-05-15', 50, 90, 70],
                        [np.nan, '2022-05-15', 40, 60, 80],
                        [np.nan, '2022-04-28', 40, 60, 90],
                        [np.nan, '2022-06-28', 50, 60, 54]])
       date_1      date_2  value_2  value_3  value_4
A  2021-06-28  2022-05-03       30       40       60
B  2022-01-10  2022-05-15       50       90       70
C         NaN  2022-05-15       40       60       80
D         NaN  2022-04-28       40       60       90
E         NaN  2022-06-28       50       60       54
I am trying to fill the NaN values in column date_1. The values I need to fill date_1 with change every week: the min value of date_1 needs to be 2021-06-28 and the max value is 2022-06-20. Each week, the max value in the date_1 column will be the last Monday. I need column date_1 to contain each date up to 2022-06-20 at least once, so that each date from 2021-06-28 to 2022-06-20 appears in date_1 at least once. The order of these values does not matter.
I tried:
from datetime import date, timedelta
today = date.today()
last_monday = pd.to_datetime((today - timedelta(days=today.weekday()) - timedelta(days=7)).strftime('%Y-%m-%d'))
# date_mappings is a dictionary with this kind of structure:
# {1 : '2021-06-28', 2 : '2021-07-05', ... 52 : '2022-06-20'}
dates_needed = [x for x in pd.to_datetime(list(date_mappings.values())) if x >= last_monday]
So now dates_needed holds the remaining dates that need to be added at least once to the date_1 column.
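The question only describes date_mappings in a comment, so here is a minimal sketch of how such a dictionary might be built, assuming it maps week numbers 1 to 52 to consecutive Mondays. The name cutoff is hypothetical: it is a fixed date standing in for last_monday so that the result is reproducible.

```python
import pandas as pd

# Assumption: one entry per Monday from 2021-06-28 to 2022-06-20,
# keyed by week number starting at 1 (as in the question's comment).
mondays = pd.date_range("2021-06-28", "2022-06-20", freq="W-MON")
date_mappings = {i: d.strftime("%Y-%m-%d") for i, d in enumerate(mondays, start=1)}

# Hypothetical fixed cutoff used in place of last_monday.
cutoff = pd.Timestamp("2022-06-06")

# Same filter as in the question: keep the dates on or after the cutoff.
dates_needed = [x for x in pd.to_datetime(list(date_mappings.values())) if x >= cutoff]
```

With this cutoff, dates_needed holds the last three Mondays of the range; the real list depends on the day the code is run.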
The problem I am facing is that the shapes do not match when I try to fill the values, because there can be multiple rows with the same date_2.
If I try to use:
df.loc[df['date_1'].isna(), 'date_1'] = dates_needed
I get:
ValueError: Must have equal len keys and value when setting with an iterable
Because this only works if I match the shape:
df.loc[df['date_1'].isna(), 'date_1'] = [pd.to_datetime('2022-01-10 00:00:00'),
pd.to_datetime('2022-01-17 00:00:00'),
pd.to_datetime('2022-01-24 00:00:00')]
       date_1      date_2  value_2  value_3  value_4
A  2021-06-28  2022-05-03       30       40       60
B  2022-01-10  2022-05-15       50       90       70
C  2022-01-10  2022-05-15       40       60       80
D  2022-01-17  2022-04-28       40       60       90
E  2022-01-24  2022-06-28       50       60       54
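The length requirement can be checked directly before assigning. The sketch below uses a reduced, hypothetical version of the frame (three NaN rows) and a two-element dates_needed list to show the mismatch; the values are illustrative, not the real data.

```python
import numpy as np
import pandas as pd

# Reduced, hypothetical frame: two filled rows and three NaN rows in date_1.
df = pd.DataFrame(
    {"date_1": ["2021-06-28", "2022-01-10", np.nan, np.nan, np.nan],
     "value_2": [30, 50, 40, 40, 50]},
    index=list("ABCDE"),
)
dates_needed = ["2022-06-13", "2022-06-20"]  # hypothetical: only two dates

mask = df["date_1"].isna()
n_missing = int(mask.sum())  # number of rows the assignment targets

# A list-like on the right-hand side must have exactly n_missing elements.
raised = False
try:
    df.loc[mask, "date_1"] = dates_needed
except ValueError:
    raised = True
```

Here n_missing is 3 while dates_needed has 2 elements, so the assignment is rejected.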
So my goal is to fill the NaN values in date_1 from the created list dates_needed, so that each date from dates_needed is used at least once in the date_1 column. The order does not matter.
Here is a solution that maps the integer keys of date_mappings through a helper Index whose length is the number of missing values, counted with sum. The solution still works when the length of the dict differs from the number of missing values:
m = df['date_1'].isna()
df.loc[m, 'date_1'] = (pd.Index(range(m.sum())) + 1).map(date_mappings)
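To see how this behaves when the lengths differ, here is a small sketch on a reduced, hypothetical frame with a date_mappings stand-in that is deliberately shorter than the number of NaN rows. The second part, using np.resize, is a variant that is not part of the answer above: it repeats the available dates cyclically until the length matches.

```python
import numpy as np
import pandas as pd

# Reduced, hypothetical version of the question's frame: three NaN rows.
df = pd.DataFrame(
    {"date_1": ["2021-06-28", "2022-01-10", np.nan, np.nan, np.nan],
     "value_2": [30, 50, 40, 40, 50]},
    index=list("ABCDE"),
)
# Assumed stand-in for date_mappings with only two entries.
date_mappings = {1: "2022-06-13", 2: "2022-06-20"}

m = df["date_1"].isna()
n_missing = int(m.sum())

# The approach above: look up keys 1..n_missing in the dict.
# Key 3 is not in the dict, so the last missing row stays NaN.
by_key = df.copy()
by_key.loc[m, "date_1"] = (pd.Index(range(n_missing)) + 1).map(date_mappings)

# Variant (an assumption, not from the answer): cycle through the dates.
cycled = df.copy()
cycled.loc[m, "date_1"] = np.resize(list(date_mappings.values()), n_missing).tolist()
```

The cyclic variant uses every date at least once only when there are at least as many NaN rows as dates; with fewer NaN rows the list is truncated, and rows would have to be added (or already filled rows reused) to cover the remaining dates.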