Filling NaN values from a list with different shape

I have a df:

import numpy as np
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D', 'E'],
                  columns=['date_1', 'date_2', 'value_2', 'value_3', 'value_4'],
                  data=[['2021-06-28', '2022-05-03', 30, 40, 60],
                        ['2022-01-10', '2022-05-15', 50, 90, 70],
                        [np.nan, '2022-05-15', 40, 60, 80],
                        [np.nan, '2022-04-28', 40, 60, 90],
                        [np.nan, '2022-06-28', 50, 60, 54]])
    date_1              date_2          value_2     value_3     value_4
A   2021-06-28          2022-05-03      30          40          60
B   2022-01-10          2022-05-15      50          90          70
C   NaN                 2022-05-15      40          60          80
D   NaN                 2022-04-28      40          60          90
...
E   NaN                 2022-06-28      50          60          54

I am trying to fill the NaN values in column date_1. The values I need to fill in change every week: the minimum value of date_1 must be 2021-06-28 and the maximum is currently 2022-06-20. Each week the maximum value in date_1 will be the last Monday. I need every date from 2021-06-28 up to 2022-06-20 to appear in date_1 at least once. The order of these values does not matter.

I tried:

from datetime import date, timedelta

today = date.today()
last_monday = pd.to_datetime((today - timedelta(days=today.weekday())  - timedelta(days=7)).strftime('%Y-%m-%d'))

# date_mappings is a dictionary with this kind of structure:
# {1 : '2021-06-28', 2 : '2021-07-05', ... 52 : '2022-06-20'}

dates_needed = [x for x in pd.to_datetime(list(date_mappings.values())) if x >= last_monday]

So now dates_needed holds the remaining dates that need to be added at least once to the date_1 column.
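For reference, the two inputs used above can be built reproducibly. This is a minimal sketch, not the asker's actual code: it assumes date_mappings holds one Monday per week numbered from 1 (as the comment suggests), and previous_week_monday is a hypothetical helper name that wraps the same arithmetic as last_monday, called with a fixed date so the result does not depend on when it is run.

```python
from datetime import date, timedelta

import pandas as pd


def previous_week_monday(today: date) -> pd.Timestamp:
    """Monday of the week before the one containing `today` (hypothetical helper)."""
    this_monday = today - timedelta(days=today.weekday())
    return pd.Timestamp(this_monday - timedelta(days=7))


# Assumption: one entry per Monday, keyed from 1, matching the structure in the comment.
mondays = pd.date_range("2021-06-28", "2022-06-20", freq="W-MON")
date_mappings = {i: d.strftime("%Y-%m-%d") for i, d in enumerate(mondays, start=1)}

# Fixed "today" (a Wednesday) instead of date.today(), for reproducibility.
last_monday = previous_week_monday(date(2022, 6, 29))
```

Working on date objects directly avoids the round trip through strftime and pd.to_datetime.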

The problem I am facing is that the shapes do not match when I try to fill the values, because there can be multiple rows with the same date_2.

If I try to use:

df.loc[df['date_1'].isna(), 'date_1'] = dates_needed

I get:

ValueError: Must have equal len keys and value when setting with an iterable

This is because the assignment only works if the shapes match:

df.loc[df['date_1'].isna(), 'date_1'] = [pd.to_datetime('2022-01-10 00:00:00'),
                                        pd.to_datetime('2022-01-17 00:00:00'),
                                        pd.to_datetime('2022-01-24 00:00:00')]
    date_1         date_2           value_2     value_3     value_4
A   2021-06-28     2022-05-03       30          40          60
B   2022-01-10     2022-05-15       50          90          70
C   2022-01-10     2022-05-15       40          60          80
D   2022-01-17     2022-04-28       40          60          90
E   2022-01-24     2022-06-28       50          60          54

So my goal is to fill the NaN values in date_1 from the list dates_needed, such that each date from dates_needed is used at least once in the date_1 column. The order does not matter.

Here is a solution that maps the integer keys of date_mappings through a helper Index whose length is the number of missing values (counted with sum). It avoids the shape error even when the length of the dict differs from the number of missing values:

m = df['date_1'].isna()
df.loc[m, 'date_1'] = (pd.Index(range(m.sum())) + 1).map(date_mappings)
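To see how this behaves when the lengths differ, here is a small self-contained sketch. The two-entry date_mappings is a shortened, hypothetical stand-in for the real dict, and the np.resize variant is an alternative that is not part of the answer above: it repeats the available dates cyclically so that no row is left empty.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"date_1": ["2021-06-28", "2022-01-10", np.nan, np.nan, np.nan],
     "date_2": ["2022-05-03", "2022-05-15", "2022-05-15", "2022-04-28", "2022-06-28"]},
    index=list("ABCDE"),
)

# Hypothetical, shortened mapping: two dates for three missing rows.
date_mappings = {1: "2022-06-13", 2: "2022-06-20"}

m = df["date_1"].isna()  # boolean mask of rows to fill

# Approach from the answer: keys 1..n are looked up in the dict;
# keys that are absent from the dict produce NaN.
mapped = df.copy()
mapped.loc[m, "date_1"] = (pd.Index(range(m.sum())) + 1).map(date_mappings)

# Alternative (assumption, not from the answer): repeat the dates
# cyclically until there is one value per missing row.
cycled = df.copy()
cycled.loc[m, "date_1"] = np.resize(list(date_mappings.values()), m.sum())
```

With the mapping approach, row E stays NaN because key 3 is not in the dict. With the cyclic approach, row E reuses the first date. Cycling only guarantees that every date appears when there are at least as many missing rows as dates, since np.resize truncates the list otherwise.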
