简体   繁体   中英

pandas: fill nans given a condition

I'm struggling with something that seemed to be trivial but apparently isn't. General picture: data - pandas dataframe - contains(among others) TOTAL_VISITS and NUM_PRINTS columns.

Objective: given num_prints parameter, find rows where NUM_prints = num_prints and fill nan s with a given number.

Where I stopped and it didn't make sense anymore:

indices= data['NUM_PRINTS'] == num_prints

data.loc[indices,'TOTAL_VISITS'].fillna(5,inplace=True)

This should work as much as I know and read. didn't fill nans with anything in practice, seemed like it worked with a copy or something as it didn't change anything in the original object.

What works:

data.loc[indices,'TOTAL_VISITS'] = 2

this does fill the column with 2's on the num_print condition, but does not consider nans.

data['TOTAL_VISITS'].fillna(0, inplace=True)

this does fill the nans in total visits with 0, but does not consider the num_prints condition.

I'm kinda hopeless as a regular for loop with .iloc and conditions takes way too long than I can handle.

I think need filter in both sides and apply fillna only for filtered rows:

np.random.seed(1213)

c = ['TOTAL_VISITS', 'A', 'NUM_PRINTS']
data = pd.DataFrame(np.random.choice([1,np.nan,3,4], size=(10,3)), columns=c)
print (data)
   TOTAL_VISITS    A  NUM_PRINTS
0           1.0  4.0         4.0
1           NaN  3.0         1.0
2           1.0  1.0         1.0
3           4.0  3.0         3.0
4           1.0  3.0         4.0
5           4.0  4.0         3.0
6           4.0  1.0         4.0
7           NaN  4.0         3.0
8           NaN  NaN         3.0
9           3.0  NaN         1.0


num_prints = 1
indices= data['NUM_PRINTS'] == num_prints
data.loc[indices,'TOTAL_VISITS'] = data.loc[indices,'TOTAL_VISITS'].fillna(100)
#alternative
#data.loc[indices,'TOTAL_VISITS'] = data['TOTAL_VISITS'].fillna(100)
print (data)
   TOTAL_VISITS    A  NUM_PRINTS
0           1.0  4.0         4.0
1         100.0  3.0         1.0
2           1.0  1.0         1.0
3           4.0  3.0         3.0
4           1.0  3.0         4.0
5           4.0  4.0         3.0
6           4.0  1.0         4.0
7           NaN  4.0         3.0
8           NaN  NaN         3.0
9           3.0  NaN         1.0

我认为这很好

data['TOTAL_VISITS'] = np.where(data['NUM_PRINTS'] == 1, 100, data['TOTAL_VISITS'])

One way to apply your logic, without using fillna is to define a mask including your criteria. Then update a series via .loc using this mask.

The below example works with data from @jezrael.

num_prints = 1

mask = (data['NUM_PRINTS'] == num_prints) & data['TOTAL_VISITS'].isnull()

data.loc[mask, 'TOTAL_VISITS'] = 100

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM