I have a DataFrame of the format:
ID Theme Operation Volume
100 Jungle S3 Full
200 Desert S3 Full
302 Cavern S1 Empty
303 Swamp nan Full
400 Jungle S3 nan
600 Desert nan Empty
I would like to write a script that iterates through the empty cells and replaces each 'nan' with a value NA_, where the _ is a running count of the missing values found so far. So my desired output would be:
ID Theme Operation Volume
100 Jungle S3 Full
200 Desert S3 Full
302 Cavern S1 Empty
303 Swamp NA1 Full
400 Jungle S3 NA3
600 Desert NA2 Empty
When I try to iterate over the df and identify the nan values, for some reason the following does not work.
count = 0
for col in df.columns:
    for row in df[col]:
        if row == float('nan'):
            row = 'NA{}'.format(count)
            count += 1
Any ideas why? Or is there a better way to do this that I'm struggling to see?
Thanks:)
Melt your columns into one long column, replace each NaN with NA_ (where _ is replaced by num), then pivot back into separate columns. Finally, overwrite the modified columns in your original dataframe:
tmp = df.reset_index().melt(id_vars='index', value_vars=['Operation', 'Volume'])
num = (tmp['value'].isna().cumsum()).astype(int)
tmp['value'] = tmp['value'].fillna('NA' + num.astype(str))
tmp = tmp.pivot(index='index', columns='variable', values='value')
df[tmp.columns] = tmp
>>> df
ID Theme Operation Volume
0 100 Jungle S3 Full
1 200 Desert S3 Full
2 302 Cavern S1 Empty
3 303 Swamp NA1 Full
4 400 Jungle S3 NA3
5 600 Desert NA2 Empty
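If you want to run the snippet above end to end, here is a minimal sketch that first rebuilds the sample frame from the question. It assumes the missing cells are real NaN values (not the string 'nan'); the DataFrame construction is mine, the rest follows the steps above. The reason the numbering comes out column by column is that melt stacks all Operation rows above all Volume rows, so the cumulative sum walks Operation first.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: gaps are real NaN values)
df = pd.DataFrame({
    "ID": [100, 200, 302, 303, 400, 600],
    "Theme": ["Jungle", "Desert", "Cavern", "Swamp", "Jungle", "Desert"],
    "Operation": ["S3", "S3", "S1", np.nan, "S3", np.nan],
    "Volume": ["Full", "Full", "Empty", "Full", np.nan, "Empty"],
})

# Long format: every Operation row first, then every Volume row
tmp = df.reset_index().melt(id_vars="index", value_vars=["Operation", "Volume"])

# Running count of missing values in that order
num = tmp["value"].isna().cumsum()

# Fill each gap with its own label, then go back to wide format
tmp["value"] = tmp["value"].fillna("NA" + num.astype(str))
tmp = tmp.pivot(index="index", columns="variable", values="value")

# Write the filled columns back; assignment aligns on the row index
df[tmp.columns] = tmp
print(df)
```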
It's a little tricky, but not impossible.
What's important is to create a hierarchy when sorting (column --> index), so that the cumulative sum of NA flags runs column by column. Basically, you don't want Volume NA values to be counted before Operation ones.
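import pandas as pd
# Long format, keeping NaN cells: one row per (row label, column name) pair
s = df.stack(dropna=False).reset_index()
# Categorical so that sorting follows the DataFrame's column order
s['level_1'] = pd.Categorical(s['level_1'], categories=df.columns.tolist())
# Sort by column, then row, and count NaNs cumulatively in that order
s1 = s.sort_values(by=['level_1', 'level_0']).set_index(['level_0', 'level_1']
    ).isna().cumsum().unstack(1).droplevel(0, 1)
df = df.fillna('NA_' + s1.astype(str))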
ID Theme Operation Volume
0 100 Jungle S3 Full
1 200 Desert S3 Full
2 302 Cavern S1 Empty
3 303 Swamp NA_1 Full
4 400 Jungle S3 NA_3
5 600 Desert NA_2 Empty
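On the "why" part of the question, two things go wrong in the original loop. First, NaN never compares equal to anything, including itself, so row == float('nan') is always False. Second, assigning to the loop variable row only rebinds a local name and never writes back into the DataFrame. Below is a minimal sketch of a loop that does work. It assumes the gaps are real NaN values, that the columns containing them hold strings, and that counting should go column by column as in the desired output; the sample frame is rebuilt by hand for the example.

```python
import numpy as np
import pandas as pd

print(float("nan") == float("nan"))  # prints False

# Sample data from the question (assumption: gaps are real NaN values)
df = pd.DataFrame({
    "ID": [100, 200, 302, 303, 400, 600],
    "Theme": ["Jungle", "Desert", "Cavern", "Swamp", "Jungle", "Desert"],
    "Operation": ["S3", "S3", "S1", np.nan, "S3", np.nan],
    "Volume": ["Full", "Full", "Empty", "Full", np.nan, "Empty"],
})

count = 0
for col in df.columns:
    # Row labels of the missing cells in this column, found with isna()
    missing = df.index[df[col].isna()]
    for idx in missing:
        count += 1
        # Write through the DataFrame itself, not through a loop variable
        df.at[idx, col] = f"NA{count}"

print(df)
```

This is slower than the vectorized answers on large frames, but it stays close to the original attempt and makes the counting order explicit.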