Shift values in dataframe to the left if column name == Year and value is NaN pandas
I have a dataframe that looks like this:
            0    1   2018       3   2017       5
0  Population    3    NaN  418980    NaN  501433
1     British    4  31514     NaN  96797     NaN
2      French  NaN   3089     NaN    201     NaN
3         NaN  NaN  34603     NaN  96998     NaN
I want to end up with a dataframe that looks like this:
            0    1    2018    3    2017    5
0  Population    3  418980  NaN  501433  NaN
1     British    4   31514  NaN   96797  NaN
2      French  NaN    3089  NaN     201  NaN
3         NaN  NaN   34603  NaN   96998  NaN
The logic is: if a year column has a NaN value, look at the column immediately to its right for a numeric value and use it to replace the NaN.
I believe I need to find the index of each year column, check df['2018'].isnull(), and where it is null, add one to the column index and look up the corresponding value, but I am unsure whether this is the best method.
pandas has a built-in method, combine_first, for filling the NA values of one column with values from another column:
df[2018] = df[2018].combine_first(df[3])
If you have many columns like that, think about how to loop over the columns, using each column's name and the name of the column to its right (or I can help you with that).
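One way to loop over the columns as suggested is sketched below. It is only an illustration under some assumptions: the column labels are strings, a label made of exactly four digits counts as a year, and fill_years_from_right is a hypothetical helper name, not something from pandas. It also blanks the right-hand cell after moving its value, so the result matches the layout asked for in the question.

```python
import re

import numpy as np
import pandas as pd

# Sample frame modelled on the question; string column labels are an assumption.
df = pd.DataFrame(
    [
        ["Population", 3, np.nan, 418980, np.nan, 501433],
        ["British", 4, 31514, np.nan, 96797, np.nan],
        ["French", np.nan, 3089, np.nan, 201, np.nan],
        [np.nan, np.nan, 34603, np.nan, 96998, np.nan],
    ],
    columns=["0", "1", "2018", "3", "2017", "5"],
)


def fill_years_from_right(frame):
    """Hypothetical helper: fill NaNs in each year column from its right-hand neighbour."""
    out = frame.copy()
    cols = list(out.columns)
    # Pair every column with the one directly to its right.
    for left, right in zip(cols, cols[1:]):
        # Assumption: a label consisting of exactly four digits is a year.
        if re.fullmatch(r"\d{4}", str(left)):
            # Rows where the year cell is empty but the neighbour has a value.
            moved = out[left].isna() & out[right].notna()
            out[left] = out[left].combine_first(out[right])
            # Clear the cells whose values were shifted to the left.
            out.loc[moved, right] = np.nan
    return out


result = fill_years_from_right(df)
print(result)
```

The original df is left untouched because the helper works on a copy. If the now empty neighbour columns are not needed, they could be dropped afterwards with result.dropna(axis=1, how="all").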
The idea is to rename each column that follows a year column to that year (by forward filling the missing values), then use DataFrame.groupby with axis=1 to group by column name and take the first non-missing value in each group, if one exists, with GroupBy.first:
s = df.columns.astype(str).to_series()
a = s.where(s.str.contains(r'\d{4}')).ffill().fillna(s)
print (a)
0 0
1 1
2018 2018
3 2018
2017 2017
5 2017
dtype: object
df1 = df.groupby(pd.Index(a), axis=1).first()
print (df1)
            0    1      2017      2018
0  Population  3.0  501433.0  418980.0
1     British  4.0   96797.0   31514.0
2      French  NaN     201.0    3089.0
3         NaN  NaN   96998.0   34603.0
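As far as I know, passing axis=1 to DataFrame.groupby is deprecated in recent pandas releases (since 2.1). A variant of the same idea that avoids it is to transpose, group the rows, and transpose back. This is a sketch on a sample frame modelled on the question, assuming string column labels; note that the transposed result comes back with object dtype.

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the question; string column labels are an assumption.
df = pd.DataFrame(
    [
        ["Population", 3, np.nan, 418980, np.nan, 501433],
        ["British", 4, 31514, np.nan, 96797, np.nan],
        ["French", np.nan, 3089, np.nan, 201, np.nan],
        [np.nan, np.nan, 34603, np.nan, 96998, np.nan],
    ],
    columns=["0", "1", "2018", "3", "2017", "5"],
)

# Same grouping key as above: each column after a year column takes that year's name.
s = df.columns.astype(str).to_series()
a = s.where(s.str.contains(r"\d{4}")).ffill().fillna(s)

# Transpose so the column labels become the row index, group those rows by the key,
# take the first non-missing value per group, then transpose back.
df1 = df.T.groupby(a.to_numpy()).first().T
print(df1)
```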
Building on @Aryerez's answer, I came up with this:
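import re

columns_list = list(df.columns)
# str(item) guards against integer column labels such as 2018
year_column_indexes = [i for i, item in enumerate(columns_list) if re.search(r'201[0-9]', str(item))]
# Work right to left so that dropping a column does not shift the positions still to be processed
for _index in reversed(year_column_indexes):
    df.iloc[:, _index] = df.iloc[:, _index].combine_first(df.iloc[:, _index + 1])
    df = df.drop(df.columns[_index + 1], axis=1)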
But it needs a bit of editing.