I have this DataFrame, in which some columns contain null values that were not populated correctly:
Unidad Precio Combustible Año_del_vehiculo Caballos \
49 1 1000 Gasolina 1998.0 50.0
63 1 800 Gasolina 1998.0 50.0
88 1 600 Gasolina 1999.0 54.0
107 1 3100 Diésel 2008.0 54.0
244 1 2000 Diésel 1995.0 60.0
... ... ... ... ... ...
46609 1 47795 Gasolina 2016.0 420.0
46770 1 26900 Gasolina 2011.0 450.0
46936 1 19900 Gasolina 2007.0 510.0
46941 1 24500 Gasolina 2006.0 514.0
47128 1 79600 Gasolina 2017.0 612.0
Comunidad_autonoma Marca_y_Modelo Año_Venta Año_Comunidad \
49 Islas Baleares CITROEN AX 2020 2020Islas Baleares
63 Islas Baleares SEAT Arosa 2021 2021Islas Baleares
88 Islas Baleares FIAT Seicento 2020 2020Islas Baleares
107 La Rioja TOYOTA Aygo 2020 2020La Rioja
244 Aragón PEUGEOT 205 2019 2019Aragón
... ... ... ... ...
46609 La Rioja PORSCHE Cayenne 2020 2020La Rioja
46770 Cataluña AUDI RS5 2020 2020Cataluña
46936 Islas Baleares MERCEDES-BENZ Clase M 2020 2020Islas Baleares
46941 La Rioja MERCEDES-BENZ Clase E 2020 2020La Rioja
47128 Islas Baleares MERCEDES-BENZ Clase E 2021 2021Islas Baleares
Fecha Año Super_95 Diesel Comunidad Salario en euros anuales
49 2020-12-01 NaN NaN NaN NaN NaN
63 2021-01-01 NaN NaN NaN NaN NaN
88 2020-12-01 NaN NaN NaN NaN NaN
107 2020-12-01 NaN NaN NaN NaN NaN
244 2019-03-01 NaN NaN NaN NaN NaN
... ... ... ... ... ... ...
46609 2020-12-01 NaN NaN NaN NaN NaN
46770 2020-07-01 NaN NaN NaN NaN NaN
46936 2020-10-01 NaN NaN NaN NaN NaN
46941 2020-11-01 NaN NaN NaN NaN NaN
47128 2021-01-01 NaN NaN NaN NaN NaN
I need to fill the gasoline (Super_95), diesel and salary columns with the values from the following DataFrame:
Año Super_95 Diesel Comunidad Año_Comunidad Fecha \
0 2020 1.321750 1.246000 Navarra 2020Navarra 2020-01-01
1 2020 1.301000 1.207250 Navarra 2020Navarra 2020-02-01
2 2020 1.224800 1.126200 Navarra 2020Navarra 2020-03-01
3 2020 1.106667 1.020000 Navarra 2020Navarra 2020-04-01
4 2020 1.078750 0.986250 Navarra 2020Navarra 2020-05-01
.. ... ... ... ... ... ...
386 2021 1.416600 1.265000 La rioja 2021La rioja 2021-08-01
387 2021 1.431000 1.277000 La rioja 2021La rioja 2021-09-01
388 2021 1.474000 1.344000 La rioja 2021La rioja 2021-10-01
389 2021 1.510200 1.382000 La rioja 2021La rioja 2021-11-01
390 2021 1.481333 1.348667 La rioja 2021La rioja 2021-12-01
Salario en euros anuales
0 27.995,96
1 27.995,96
2 27.995,96
3 27.995,96
4 27.995,96
.. ...
386 21.535,29
387 21.535,29
388 21.535,29
389 21.535,29
390 21.535,29
The idea is to fill the empty columns of the first DataFrame with the values from the second one wherever the Año_Comunidad column matches. For example, where a row with NaN has 2020Islas Baleares, it should be filled with the gasoline price from the row of the other table that also has 2020Islas Baleares. For 2020Aragón it would use the 2020Aragón row, and so on. I had thought of something like this:
analisis['Super_95'].fillna(analisis2['Super_95'].apply(lambda x: x if x=='2020Islas Baleares' else np.nan), inplace=True)
The second DataFrame is the result of a merge, and those null values were not filled in by it.
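First merge the two DataFrames on the shared key. merge returns a new DataFrame, so assign the result back, and use how='left' to keep every row of df1:
df1 = df1.merge(df2, on='Año_Comunidad', how='left')
As a result you'll have one DataFrame in which columns that share a name get the suffix _x for the first DataFrame and _y for the second one.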
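If the merge still leaves NaN everywhere, the keys may simply not match exactly. The printouts above show "La Rioja" in one frame and "La rioja" in the other, which pandas treats as different strings. The following is a minimal sketch on made-up miniature frames (the values and the helper names `normalize_key` and `key` are hypothetical) that uses `indicator=True` to count unmatched rows before and after normalizing the key:

```python
import pandas as pd

# Hypothetical miniature versions of the two frames; values are made up.
df1 = pd.DataFrame({
    "Año_Comunidad": ["2020Islas Baleares", "2020La Rioja", "2019Aragón"],
    "Super_95": [float("nan")] * 3,
})
df2 = pd.DataFrame({
    "Año_Comunidad": ["2020Islas Baleares", "2020La rioja", "2019Aragón"],
    "Super_95": [1.20, 1.30, 1.25],
})

# Merge on the raw key; indicator=True adds a "_merge" column that marks
# rows of df1 without a partner in df2 as "left_only".
raw = df1.merge(df2, on="Año_Comunidad", how="left", indicator=True)
unmatched_raw = int((raw["_merge"] == "left_only").sum())


def normalize_key(s: pd.Series) -> pd.Series:
    """Strip surrounding whitespace and ignore case differences."""
    return s.str.strip().str.casefold()


# Merge again on a normalized helper column (hypothetical name "key").
df1["key"] = normalize_key(df1["Año_Comunidad"])
df2["key"] = normalize_key(df2["Año_Comunidad"])
clean = df1.merge(
    df2.drop(columns="Año_Comunidad"), on="key", how="left", indicator=True
)
unmatched_clean = int((clean["_merge"] == "left_only").sum())

print(unmatched_raw, unmatched_clean)
```

Whether case is the only difference in the real data is an assumption; inspecting the "left_only" rows shows which keys actually fail to match.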
Now to fill in the blanks you can do this for each column:
df1.loc[df1["Año_x"].isnull(),'Año_x'] = df1["Año_y"]
If a row in Año_x is empty, it is filled with the value from the second table (Año_y) that was merged in earlier.
You can do this in a loop for all the columns:
cols = ['Año', 'Super_95', 'Diesel', 'Comunidad', 'Salario en euros anuales']
for col in cols:
    df1.loc[df1[col+"_x"].isnull(), col+'_x'] = df1[col+'_y']
And finally you can drop the merged columns:
for col in cols:
    df1 = df1.drop(col+'_y', axis=1)
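The steps above can be sketched end to end on toy data. One thing to watch for: the second table in the question has one row per month for each Año_Comunidad, so merging on Año_Comunidad alone would repeat each row of df1 once per matching month. The sketch below therefore merges on both Año_Comunidad and Fecha, which assumes Fecha has the same dtype in both frames. The data is made up, and `validate="many_to_one"` is an optional safety check that raises if the keys are not unique in df2.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names follow the question, values are made up.
df1 = pd.DataFrame({
    "Año_Comunidad": ["2020Navarra", "2020Navarra", "2021Navarra"],
    "Fecha": pd.to_datetime(["2020-01-01", "2020-02-01", "2021-01-01"]),
    "Super_95": [np.nan, 1.5, np.nan],
    "Diesel": [np.nan, np.nan, np.nan],
})
df2 = pd.DataFrame({
    "Año_Comunidad": ["2020Navarra", "2020Navarra"],
    "Fecha": pd.to_datetime(["2020-01-01", "2020-02-01"]),
    "Super_95": [1.32, 1.30],
    "Diesel": [1.25, 1.21],
})

cols = ["Super_95", "Diesel"]

# Left merge keeps every row of df1; overlapping columns get _x / _y suffixes.
merged = df1.merge(
    df2, on=["Año_Comunidad", "Fecha"], how="left", validate="many_to_one"
)

# Keep the existing value where present, otherwise take the one from df2.
for col in cols:
    merged[col] = merged[col + "_x"].fillna(merged[col + "_y"])

# Drop the suffixed helper columns so only the filled columns remain.
merged = merged.drop(columns=[c + s for c in cols for s in ("_x", "_y")])

print(merged)
```

Rows of df1 with no partner in df2 (the 2021 row here) stay NaN, and values already present in df1 are not overwritten.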