I am trying to impute the NaNs in a column with the values present in the same column, but I cannot figure out how to map them using another column.
I have two pandas DataFrames. The first one (df) has all the values and looks like this:
|Sr. No| Fares| Route |
|------|------|------------|
| 1 | 123 |ABE-PGD-ABE |
| 2 | 456 |ABQ-SLC-ABQ |
| 3 | 789 |ALB-SJU-ALB |
The second DataFrame (df_1) looks like this:
|Sr. No| Fares| Route |
|------|------|------------|
| 130 | NaN |ABE-PGD-ABE |
| 297 | NaN |ABQ-SLC-ABQ |
| 345 | NaN |ALB-SJU-ALB |
Now I want to impute the NaNs in the Fares column for all the Routes that match. The second DataFrame is just a subset of the first one, because I wanted to isolate all the NaNs in the Fares column.
Here is my code:
```python
for i in df_1:
    df[Fare] = df[Fare].map({'Nan': ''})
```
Please let me know what I am doing wrong. I don't know what to map it with, so I have left the value for 'Nan' blank.
You have a few things going on here.
Firstly, when you iterate a DataFrame like `for i in df`, you are actually iterating the columns (i.e. the Series), not the rows as you might expect. You can get a row iterator from `df.iterrows()`, which looks like:

```python
for row_index, row in df.iterrows():
    # row is a pd.Series, which is like a vector / array / tuple
```
Within the loop you need to "pull out" the route, then use that route to "look up" the fare in the other DataFrame:
```python
for row_index, row in df_1.iterrows():
    route = row["Route"]
    # find rows in the full DataFrame that match this Route
    matches = df[df["Route"] == route]
    # if there isn't exactly one matching row, skip
    if len(matches) != 1:
        continue
    # .loc is how we set a single value in a DataFrame
    df_1.loc[row_index, "Fares"] = matches.iloc[0]["Fares"]
```
Having said all this, we wouldn't normally treat a DataFrame as a list of rows to iterate over. Think of it as a database table and prefer set-based operations.
Here's how I would do this:
```python
# index both frames by Route so corresponding rows line up
df = df.set_index("Route")
df_1 = df_1.set_index("Route")
# combine_first performs an "if null" kind of coalescing:
# NaNs in df_1 are filled with the values from df
combined = df_1.combine_first(df)
# the shared index ensures we are updating the right rows
df_1["Fares"] = combined["Fares"]
```
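For this particular shape of problem, `Series.map` plus `fillna` also works without re-indexing the frames. A minimal sketch using the sample data from the question (the `fare_by_route` name is my own, not from the original post):

```python
import pandas as pd

df = pd.DataFrame({"Sr. No": [1, 2, 3],
                   "Fares": [123, 456, 789],
                   "Route": ["ABE-PGD-ABE", "ABQ-SLC-ABQ", "ALB-SJU-ALB"]})
df_1 = pd.DataFrame({"Sr. No": [130, 297, 345],
                     "Fares": [float("nan")] * 3,
                     "Route": ["ABE-PGD-ABE", "ABQ-SLC-ABQ", "ALB-SJU-ALB"]})

# build a Route -> Fares lookup from the full DataFrame
fare_by_route = df.set_index("Route")["Fares"]
# map each Route in df_1 to its fare, keeping any existing values
df_1["Fares"] = df_1["Fares"].fillna(df_1["Route"].map(fare_by_route))
```

Note that this assumes each Route appears at most once in `df`; if Routes repeat, de-duplicate the lookup first (e.g. with `drop_duplicates`).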