Columns issue with Python, Pandas and Excel

Question

I'm trying to work with an Excel file using Python and pandas. The file has a huge number of columns and rows, but I will try to simplify it with this example:

Name    Age  Nationality  Name1  Age1  Nationality1  Name2  Age2  Nationality2
Jane    32   Canada
Pedro   25   Spain
                          Lucas  30    Italy
                          Ana    23    Germany
                                                     Pedro  43    Brazil
                                                     Lucas  32    Mexico

My excel example

So, in this example, I have the columns Name, Age and Nationality, but I also have Name1, Age1 and Nationality1, and so on. I want to filter by value, but that doesn't work directly because I would have to filter each column separately: Name, Name1 and Name2.

I thought one option would be to convert the data to separate dictionaries and filter those, but considering the number of columns and rows I guess it would take much longer.

I also thought about renaming the columns, but I searched and saw that column names have to be unique. Please correct me if I'm wrong.

Does anyone have a solution for this? It would be very helpful. Thanks in advance.

Answer 1

You can use bfill(axis=1) to backfill each row, which copies the first non-null value to its right into every empty column before it. After the first iteration of the loop below, the Name column is fully populated. If you then set that column as the index and replace every other occurrence of those names in the df with NaN, you can repeat the process for the remaining columns and end up with what you want.

import pandas as pd
import numpy as np

df = pd.read_csv('name_age_nationality.csv')

    Name   Age Nationality  Name1  Age1 Nationality1  Name2  Age2 Nationality2
0   Jane  32.0      Canada    NaN   NaN          NaN    NaN   NaN          NaN
1  Pedro  25.0       Spain    NaN   NaN          NaN    NaN   NaN          NaN
2    NaN   NaN         NaN  Lucas  30.0        Italy    NaN   NaN          NaN
3    NaN   NaN         NaN    Ana  23.0      Germany    NaN   NaN          NaN
4    NaN   NaN         NaN    NaN   NaN          NaN  Pedro  43.0       Brazil
5    NaN   NaN         NaN    NaN   NaN          NaN  Lucas  32.0       Mexico

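# Each pass fills one target column, then clears the values it used.
for x in ['Name', 'Age', 'Nationality']:
    # Pull the first non-null value to the right into column x, then make x the index.
    df = df.bfill(axis=1).set_index(x)
    # Blank out every remaining copy of those values so the next pass skips them.
    df = df.replace(df.index.values, np.nan).reset_index()

df[['Name', 'Age', 'Nationality']]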

Output

    Name Age Nationality
0   Jane  32      Canada
1  Pedro  25       Spain
2  Lucas  30       Italy
3    Ana  23     Germany
4  Pedro  43      Brazil
5  Lucas  32      Mexico
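
For comparison, here is a minimal sketch of a different technique from the bfill loop above: slice out each group of suffixed columns, rename it to the base names, and stack the groups with pd.concat. The example table is rebuilt inline so no CSV is needed. The helper name stack_column_groups is hypothetical, and the sketch assumes every suffix ('', '1', '2', ...) has a complete set of base columns.

```python
import re
import pandas as pd

# Rebuild the example table from the question inline.
df = pd.DataFrame({
    "Name": ["Jane", "Pedro", None, None, None, None],
    "Age": [32, 25, None, None, None, None],
    "Nationality": ["Canada", "Spain", None, None, None, None],
    "Name1": [None, None, "Lucas", "Ana", None, None],
    "Age1": [None, None, 30, 23, None, None],
    "Nationality1": [None, None, "Italy", "Germany", None, None],
    "Name2": [None, None, None, None, "Pedro", "Lucas"],
    "Age2": [None, None, None, None, 43, 32],
    "Nationality2": [None, None, None, None, "Brazil", "Mexico"],
})


def stack_column_groups(frame, base_names):
    """Hypothetical helper: stack the 'Name', 'Name1', 'Name2', ... blocks into rows."""
    # Collect the distinct numeric suffixes ('', '1', '2', ...) in column order.
    suffixes = []
    for col in frame.columns:
        match = re.fullmatch(r"(\D+)(\d*)", col)
        if match and match.group(1) in base_names and match.group(2) not in suffixes:
            suffixes.append(match.group(2))

    blocks = []
    for suffix in suffixes:
        cols = [base + suffix for base in base_names]  # assumes all of these exist
        block = frame[cols].dropna(how="all")          # keep only rows this group fills
        blocks.append(block.rename(columns=dict(zip(cols, base_names))))

    # Restore the original row order, then renumber the rows.
    return pd.concat(blocks).sort_index(kind="mergesort").reset_index(drop=True)


tidy = stack_column_groups(df, ["Name", "Age", "Nationality"])

# With a single Name column, filtering by value is one comparison.
pedro = tidy[tidy["Name"] == "Pedro"]
print(pedro)
```

Because the groups are stacked rather than backfilled, the columns are not mixed together along the way; Age comes out as a float column here because the inline data contains missing values.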

Answer 2

You can get all column header titles into a list. Can you be more specific about the final result you want?

list(my_dataframe.columns.values)
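
Once you have the header list, one possible next step is to group the headers that differ only by a trailing number. This is only a sketch: my_dataframe here is an empty frame with the column layout from the question, and the assumption is that repeated columns are always named as base name plus a numeric suffix.

```python
import re
from collections import defaultdict

import pandas as pd

# Empty frame with the question's column layout; only the headers matter here.
my_dataframe = pd.DataFrame(columns=[
    "Name", "Age", "Nationality",
    "Name1", "Age1", "Nationality1",
    "Name2", "Age2", "Nationality2",
])

# Map each base name to the columns sharing it, e.g. 'Name' -> ['Name', 'Name1', 'Name2'].
groups = defaultdict(list)
for col in my_dataframe.columns:
    base = re.sub(r"\d+$", "", col)  # strip a trailing numeric suffix, if any
    groups[base].append(col)

print(dict(groups))
```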

2 answers

solution1
2 2020-06-17 16:41:48

solution2
0 2020-06-17 16:29:04
