Shift values in a DataFrame to the left if the column name is a year and the value is NaN (pandas)

Question

I have a dataframe that looks like this:

                              0    1   2018       3   2017       5
0                    Population    3    NaN  418980    NaN  501433
1                       British    4  31514     NaN  96797     NaN
2                        French  NaN   3089     NaN    201     NaN
3                           NaN  NaN  34603     NaN  96998     NaN

I want to end up with a dataframe that looks like this:

                              0    1   2018       3   2017       5
0                    Population    3  418980    NaN  501433    NaN
1                       British    4  31514     NaN  96797     NaN
2                        French  NaN   3089     NaN    201     NaN
3                           NaN  NaN  34603     NaN  96998     NaN

The logic is: if a year column has a NaN value, look at the column immediately to its right for a numerical value and use it to replace the NaN.

I believe I need to find the index of each year column, check df['2018'].isnull(), and where it is null, add one to the column index and look up the corresponding value there, but I am unsure whether this is the best method.

Answer 1

pandas has a built-in function for filling the NA values of one column with the values from another column:

df[2018] = df[2018].combine_first(df[3])

If you have many columns like that, think about how to loop over the columns so that each year column is paired with the column to its right (or I can help you with that).
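As a rough sketch of such a loop (not the answerer's code): the example below assumes the column names are strings, that a year column is any name made of exactly four digits, and that the value to pull in sits in the column immediately to its right. The sample frame is rebuilt from the question, and the names `cols`, `right`, and `moved` are made up for illustration. It also blanks the right-hand cell after moving its value, which matches the desired output in the question.

```python
import re

import numpy as np
import pandas as pd

# Sample data rebuilt from the question (string column names are an assumption)
df = pd.DataFrame({
    "0": ["Population", "British", "French", np.nan],
    "1": [3, 4, np.nan, np.nan],
    "2018": [np.nan, 31514, 3089, 34603],
    "3": [418980, np.nan, np.nan, np.nan],
    "2017": [np.nan, 96797, 201, 96998],
    "5": [501433, np.nan, np.nan, np.nan],
})

cols = list(df.columns)
# Skip the last column: it has no right-hand neighbour to borrow from
for pos, name in enumerate(cols[:-1]):
    if re.fullmatch(r"\d{4}", str(name)):
        right = cols[pos + 1]
        # Rows where the year cell is empty but the neighbour has a value
        moved = df[name].isna() & df[right].notna()
        df[name] = df[name].combine_first(df[right])
        # Clear the cells whose values were shifted left
        df.loc[moved, right] = np.nan

print(df)
```

If the column labels are integers rather than strings, the `str(name)` conversion in the check still applies, since only the text form of the label is matched.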

Answer 2

The idea is to relabel each column that follows a year column with that year, by forward-filling the year labels over the non-year column names. Then use DataFrame.groupby with axis=1 to group the columns by these labels and take the first non-missing value in each group, if one exists, with GroupBy.first:

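import pandas as pd

s = df.columns.astype(str).to_series()
# Keep year-like names, forward-fill them over the following columns, restore the rest
a = s.where(s.str.contains(r'\d{4}')).ffill().fillna(s)
print(a)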
0          0
1          1
2018    2018
3       2018
2017    2017
5       2017
dtype: object

# Group columns by the relabeled names and keep the first non-missing value per row
df1 = df.groupby(pd.Index(a), axis=1).first()
print(df1)
         0     1         2017      2018
0  Population   3.0  501433.0  418980.0
1     British   4.0   96797.0   31514.0
2      French   NaN     201.0    3089.0
3         NaN   NaN   96998.0   34603.0
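Recent pandas releases (2.1 and later, to my knowledge) emit a deprecation warning for `groupby(..., axis=1)`. A possible workaround, sketched here under the assumption that the column names are strings and using a sample frame rebuilt from the question, is to transpose, group the rows, and transpose back. This is a variation on the answer above, not the answerer's own code.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (string column names are an assumption)
df = pd.DataFrame({
    "0": ["Population", "British", "French", np.nan],
    "1": [3, 4, np.nan, np.nan],
    "2018": [np.nan, 31514, 3089, 34603],
    "3": [418980, np.nan, np.nan, np.nan],
    "2017": [np.nan, 96797, 201, 96998],
    "5": [501433, np.nan, np.nan, np.nan],
})

s = df.columns.astype(str).to_series()
# Year-like names are kept and forward-filled over the columns that follow them
a = s.where(s.str.contains(r"\d{4}")).ffill().fillna(s)

# Transpose so the columns become rows, group those rows by the new labels,
# take the first non-missing value per group, then transpose back
df1 = df.T.groupby(a.to_numpy()).first().T
print(df1)
```

Transposing a mixed-type frame yields object columns, so the result may need `infer_objects()` or an explicit cast afterwards if numeric dtypes matter.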

Answer 3

Building on @Aryerez's answer, I came up with this:

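import re

columns_list = list(df.columns)
# Positions of the columns whose name looks like a year (2010-2019)
year_column_indexes = [i for i, item in enumerate(columns_list) if re.search(r'201[0-9]', str(item))]
# Walk right to left so that dropping a column does not shift the positions still to be processed
for _index in reversed(year_column_indexes):
    df.iloc[:, _index] = df.iloc[:, _index].combine_first(df.iloc[:, _index + 1])
    df = df.drop(df.columns[_index + 1], axis=1)

It may still need a bit of editing; for example, it assumes every year column is followed by another column.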

Answer 1
2 2019-11-28 12:46:25

Answer 2
1 accepted 2019-11-28 13:24:02

Answer 3
0 2019-11-28 13:20:35
