
Remap values with a dictionary and give everything else a default value

I have a table whose Territory values I need to map: NY and CA should become Domestic, WT should become OUTSIDE, and anything else should become OVERSEAS.

di = {"NY": "Domestic","CA": "Domestic","WT":"OUTSIDE"}

df.replace({'Territory': di})

How can I get OVERSEAS into the code above, so that every value with no entry in the dictionary defaults to OVERSEAS?

Use Series.map, which returns missing values (NaN) for values with no match in the dictionary, then add Series.fillna to replace them with the default value:

import pandas as pd

df = pd.DataFrame({'Territory':['NY','CA','WT','SK','DE']})
di = {"NY": "Domestic","CA": "Domestic","WT":"OUTSIDE"}
print (df)
  Territory
0        NY
1        CA
2        WT
3        SK
4        DE

df['Territory'] = df['Territory'].map(di).fillna('OVERSEAS')
print (df)
  Territory
0  Domestic
1  Domestic
2   OUTSIDE
3  OVERSEAS
4  OVERSEAS
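To see why the question's replace call never produces OVERSEAS, and why map is used instead, here is a small sketch on the same example data: replace leaves values without a matching key unchanged, whereas map turns them into NaN, which fillna can then fill with the default.

```python
import pandas as pd

# Same example data as above; 'SK' and 'DE' have no entry in the dict
df = pd.DataFrame({'Territory': ['NY', 'CA', 'WT', 'SK', 'DE']})
di = {"NY": "Domestic", "CA": "Domestic", "WT": "OUTSIDE"}

# replace leaves values without a matching key as they are
replaced = df['Territory'].replace(di)

# map turns values without a matching key into NaN ...
mapped = df['Territory'].map(di)

# ... which fillna can then replace with the default label
filled = mapped.fillna('OVERSEAS')

print(replaced.tolist())  # ['Domestic', 'Domestic', 'OUTSIDE', 'SK', 'DE']
print(filled.tolist())    # ['Domestic', 'Domestic', 'OUTSIDE', 'OVERSEAS', 'OVERSEAS']
```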

While jezrael's answer works, it is slower than needed, because it first does the mapping and then goes back over the result to fill in the missing elements. If we instead take advantage of Python's built-in dictionaries, we can significantly improve performance.

There are a couple of approaches that use the flexibility of Python's dictionary objects to create a default. One is to use the get method on the mapping dictionary; the other is to use the defaultdict object from collections. As mentioned above, the upside of the get and defaultdict approaches is that they avoid a second pass over the whole Series after the mapping to replace the NAs, and instead handle the default within the mapping step itself.

So, in short, I would suggest:

import pandas as pd

df = pd.DataFrame({'Territory':['NY','CA','WT','SK','DE']})
di = {"NY": "Domestic","CA": "Domestic","WT":"OUTSIDE"}
df['Territory'] = df['Territory'].map(lambda x: di.get(x, 'OVERSEAS'))
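The defaultdict option mentioned above only appears in the timings below, so here is a minimal sketch of it on the same example data. It relies on pandas' documented behaviour that, when the mapping passed to Series.map is a dict subclass defining __missing__, that default is used instead of NaN, so no fillna step is needed.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'Territory': ['NY', 'CA', 'WT', 'SK', 'DE']})
di = {"NY": "Domestic", "CA": "Domestic", "WT": "OUTSIDE"}

# defaultdict(factory, mapping) copies di and returns 'OVERSEAS' for any missing key
dd = defaultdict(lambda: 'OVERSEAS', di)

# Series.map uses the __missing__ default of dict subclasses instead of NaN
via_defaultdict = df['Territory'].map(dd)

# The dict.get version from above, for comparison
via_get = df['Territory'].map(lambda x: di.get(x, 'OVERSEAS'))

print(via_defaultdict.tolist())
# ['Domestic', 'Domestic', 'OUTSIDE', 'OVERSEAS', 'OVERSEAS']
```

Because a defaultdict stores the default for every missing key it is asked about, dd may grow as new unmatched codes are looked up; di itself is not modified, since the defaultdict holds its own copy of the entries.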

Some timings that back up the performance of this approach:

df = pd.DataFrame({'Territory':['NY','CA','WT','SK','DE']})
di = {"NY": "Domestic","CA": "Domestic","WT":"OUTSIDE"}

%timeit df['Territory'].map(lambda x: di.get(x, 'OVERSEAS'))
>>> 138 µs ± 1.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

from collections import defaultdict
dd = defaultdict(lambda:'OVERSEAS')
dd.update(di)
%timeit df['Territory'].map(dd)
>>> 143 µs ± 2.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit df['Territory'] = df['Territory'].map(di).fillna('OVERSEAS')
>>> 657 µs ± 33.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The difference in performance becomes even more obvious for larger dictionaries.

It is also interesting to note that mapping with a dict that lacks some of the keys seems to be slow in pandas if you don't provide a default:

%timeit df['Territory'].map(di)
>>> 372 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
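The timings above use a 5-row Series, so they are likely dominated by fixed per-call overhead. To check how the approaches behave on larger inputs, here is a timing sketch using the standard timeit module; the Series size N and the randomly generated codes are assumptions for illustration, and no particular result is implied. It first checks that all three approaches produce the same labels.

```python
import random
import timeit
from collections import defaultdict

import pandas as pd

# Hypothetical benchmark data: N random codes, some of which are not in di
N = 10_000
random.seed(0)
codes = random.choices(['NY', 'CA', 'WT', 'SK', 'DE'], k=N)
s = pd.Series(codes, name='Territory')

di = {"NY": "Domestic", "CA": "Domestic", "WT": "OUTSIDE"}
dd = defaultdict(lambda: 'OVERSEAS', di)

candidates = {
    'map + fillna': lambda: s.map(di).fillna('OVERSEAS'),
    'map + dict.get': lambda: s.map(lambda x: di.get(x, 'OVERSEAS')),
    'map + defaultdict': lambda: s.map(dd),
}

# Check that all approaches give the same labels before timing them
reference = candidates['map + fillna']()
for name, fn in candidates.items():
    assert fn().tolist() == reference.tolist(), name

# Best of 3 repeats, 5 calls each; increase N to see how the gap changes
for name, fn in candidates.items():
    per_call = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f'{name}: {per_call * 1e3:.2f} ms per call')
```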

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM