
Shift values in a pandas DataFrame to the left if the column name is a year and the value is NaN

I have a dataframe that looks like this:

                              0    1   2018       3   2017       5
0                    Population    3    NaN  418980    NaN  501433
1                       British    4  31514     NaN  96797     NaN
2                        French  NaN   3089     NaN    201     NaN
3                           NaN  NaN  34603     NaN  96998     NaN

I want to end up with a dataframe that looks like this:

                              0    1   2018       3   2017       5
0                    Population    3  418980    NaN  501433    NaN
1                       British    4  31514     NaN  96797     NaN
2                        French  NaN   3089     NaN    201     NaN
3                           NaN  NaN  34603     NaN  96998     NaN

The logic is: if a year column has a NaN value, take the numeric value from the column to its right, put it in the year column, and leave NaN in its place.

I believe I need to find the index of each year column and check df['2018'].isnull(); where it is null, add one to the index and take the corresponding value from that column. I am unsure whether this is the best method.
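A minimal sketch of that index-based idea, assuming the column labels are the integers shown above (the year check converts labels with str(), so string labels also work) and that "move" means the donor cell should become NaN, as in the desired output. The names year_positions and missing are only illustrative.

```python
import re

import numpy as np
import pandas as pd

# Rebuild the example frame (assumption: integer column labels).
df = pd.DataFrame(
    {
        0: ["Population", "British", "French", np.nan],
        1: [3, 4, np.nan, np.nan],
        2018: [np.nan, 31514, 3089, 34603],
        3: [418980, np.nan, np.nan, np.nan],
        2017: [np.nan, 96797, 201, 96998],
        5: [501433, np.nan, np.nan, np.nan],
    }
)

# Positions of columns whose label is a four-digit year.
year_positions = [
    i for i, col in enumerate(df.columns) if re.fullmatch(r"\d{4}", str(col))
]

for pos in year_positions:
    if pos + 1 >= df.shape[1]:
        continue  # a year column at the far right has no neighbour to borrow from
    missing = df.iloc[:, pos].isna().to_numpy()
    # Copy the right-hand neighbour into the gaps of the year column...
    df.iloc[missing, pos] = df.iloc[missing, pos + 1].to_numpy()
    # ...then blank out the donor cells so the value is moved, not duplicated.
    df.iloc[missing, pos + 1] = np.nan

print(df)
```

Working by position (iloc) sidesteps the question of whether the labels are ints or strings; the cost is that it trusts the column order, so the donor must really sit immediately to the right.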

pandas has a built-in method, Series.combine_first, that fills the NA values of one column with the values of another:

df[2018] = df[2018].combine_first(df[3])

If you have many columns like that, loop over the columns, using each year column's name together with the name of the column to its right. (Or I can help you with that.)
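A sketch of such a loop, assuming a year column is any column whose label is four digits and that its donor is always the column immediately to its right (if that neighbour is itself a year, it would still be used as the donor). It uses combine_first as in the answer, plus one extra step that the answer does not include: blanking the donor cells so the result matches the desired output.

```python
import re

import numpy as np
import pandas as pd

# Rebuild the example frame (assumption: integer column labels).
df = pd.DataFrame(
    {
        0: ["Population", "British", "French", np.nan],
        1: [3, 4, np.nan, np.nan],
        2018: [np.nan, 31514, 3089, 34603],
        3: [418980, np.nan, np.nan, np.nan],
        2017: [np.nan, 96797, 201, 96998],
        5: [501433, np.nan, np.nan, np.nan],
    }
)

cols = list(df.columns)
for left, right in zip(cols, cols[1:]):
    if not re.fullmatch(r"\d{4}", str(left)):
        continue  # only year columns receive values
    # Rows where the year is missing but its right-hand neighbour has a value.
    moved = df[left].isna() & df[right].notna()
    df[left] = df[left].combine_first(df[right])
    # Extra step beyond the answer: clear the donor so the value is shifted, not copied.
    df.loc[moved, right] = np.nan

print(df)
```

Pairing each label with its right-hand neighbour via zip(cols, cols[1:]) keeps the loop name-based, so nothing breaks if the columns are later reordered as a block.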

The idea is to give the column after each year the same year label, by forward filling the missing (non-year) labels, and then use DataFrame.groupby with axis=1 to group the columns by label and take the first non-missing value in each group with GroupBy.first:

import pandas as pd

s = df.columns.astype(str).to_series()
a = s.where(s.str.contains(r'\d{4}')).ffill().fillna(s)
print (a)
0          0
1          1
2018    2018
3       2018
2017    2017
5       2017
dtype: object

df1 = df.groupby(pd.Index(a), axis=1).first()
print (df1)
         0     1         2017      2018
0  Population   3.0  501433.0  418980.0
1     British   4.0   96797.0   31514.0
2      French   NaN     201.0    3089.0
3         NaN   NaN   96998.0   34603.0
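DataFrame.groupby with axis=1 is deprecated as of pandas 2.1, and the pandas docs suggest frame.T.groupby(...) instead. A sketch of the same idea with a transpose, assuming a four-digit label marks a year column. Here sort=False keeps the groups in their original left-to-right order (the output above is sorted, so 2017 comes before 2018), and infer_objects() restores the numeric dtypes that the transpose turns into object.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame (assumption: integer column labels).
df = pd.DataFrame(
    {
        0: ["Population", "British", "French", np.nan],
        1: [3, 4, np.nan, np.nan],
        2018: [np.nan, 31514, 3089, 34603],
        3: [418980, np.nan, np.nan, np.nan],
        2017: [np.nan, 96797, 201, 96998],
        5: [501433, np.nan, np.nan, np.nan],
    }
)

labels = df.columns.astype(str).to_series()
# Keep only year-like labels, forward-fill them onto the columns that follow,
# and fall back to the original label for the columns before the first year.
groups = labels.where(labels.str.fullmatch(r"\d{4}")).ffill().fillna(labels)

# Rows of df.T are the original columns: group them and keep the first non-null value.
out = df.T.groupby(groups.to_numpy(), sort=False).first().T.infer_objects()
print(out)
```

Like the answer's version, this collapses each year with its helper column into one column rather than keeping the emptied helper columns from the desired output.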

Using @Aryerez's answer, I came up with this:

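import re

columns_list = [str(c) for c in df.columns]  # labels may be ints; re.search needs strings
year_column_indexes = [i for i, item in enumerate(columns_list) if re.search('201[0-9]', item)]
# Walk right-to-left: dropping a column shifts every position after it,
# so handling the rightmost year first keeps the remaining indexes valid.
for _index in reversed(year_column_indexes):
    df.iloc[:, _index] = df.iloc[:, _index].combine_first(df.iloc[:, _index + 1])
    df = df.drop(df.columns[_index + 1], axis=1)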

But it needs a bit of editing.

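Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.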
