I have a dataframe where I want to replace values in a column, but the dict describing the replacement is based on values in another column. A sample dataframe would look like this:
   Map me strings        date
0       1   test1  2020-01-01
1       2   test2  2020-02-10
2       3   test3  2020-01-01
3       4   test2  2020-03-15
I have a dictionary that looks like this:
map_dict = {'2020-01-01': {1: 4, 2: 3, 3: 1, 4: 2},
            '2020-02-10': {1: 3, 2: 4, 3: 1, 4: 2},
            '2020-03-15': {1: 3, 2: 2, 3: 1, 4: 4}}
The outer keys are dates and the inner dicts map old values to new ones, so the mapping applied to a row depends on that row's date.
In this example, the expected output would be:
   Map me strings        date
0       4   test1  2020-01-01
1       4   test2  2020-02-10
2       1   test3  2020-01-01
3       4   test2  2020-03-15
I have a massive dataframe (100M+ rows), so I really want to avoid any looping solutions if at all possible.
I have tried to think of a way to use either map or replace, but have been unsuccessful.
Use DataFrame.join with a MultiIndex Series created by the DataFrame constructor and DataFrame.stack:
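# pd.DataFrame(map_dict) has the old values as rows and the dates as columns;
# stack() turns it into a Series indexed by (old value, date).
df = df.join(pd.DataFrame(map_dict).stack().rename('new'), on=['Map me', 'date'])
print(df)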
   Map me strings        date  new
0       1   test1  2020-01-01    4
1       2   test2  2020-02-10    4
2       3   test3  2020-01-01    1
3       4   test2  2020-03-15    4
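The join above adds a new column, while the question asks for the values in 'Map me' to be replaced. A minimal sketch of that last step, assuming that rows without a matching (value, date) pair should keep their original value (that fallback policy is an assumption, not something the question states; the names lookup, joined and result are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Map me": [1, 2, 3, 4],
    "strings": ["test1", "test2", "test3", "test2"],
    "date": ["2020-01-01", "2020-02-10", "2020-01-01", "2020-03-15"],
})
map_dict = {
    "2020-01-01": {1: 4, 2: 3, 3: 1, 4: 2},
    "2020-02-10": {1: 3, 2: 4, 3: 1, 4: 2},
    "2020-03-15": {1: 3, 2: 2, 3: 1, 4: 4},
}

# Series indexed by (old value, date), as in the answer above.
lookup = pd.DataFrame(map_dict).stack().rename("new")
joined = df.join(lookup, on=["Map me", "date"])

# Assumed policy: keep the original value where no mapping was found,
# then write the result back over the original column.
result = df.copy()
result["Map me"] = joined["new"].fillna(joined["Map me"]).astype(df["Map me"].dtype)
print(result)
```

If unmapped rows should instead be flagged, dropping the fillna call leaves them as NaN in the joined column.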
Maybe try something like this?
df['mapped'] = df.apply(lambda x: map_dict[x['date']][x['Map me']], axis=1)
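The apply call above runs a Python function once per row, which is likely to be slow at 100M+ rows. As a rough sketch of the same lookup without a per-row call, the nested dict can be flattened into a Series keyed by (date, old value) and queried for all rows at once with reindex. The names flat and keys are illustrative, and rows with no matching key would come back as NaN:

```python
import pandas as pd

df = pd.DataFrame({
    "Map me": [1, 2, 3, 4],
    "strings": ["test1", "test2", "test3", "test2"],
    "date": ["2020-01-01", "2020-02-10", "2020-01-01", "2020-03-15"],
})
map_dict = {
    "2020-01-01": {1: 4, 2: 3, 3: 1, 4: 2},
    "2020-02-10": {1: 3, 2: 4, 3: 1, 4: 2},
    "2020-03-15": {1: 3, 2: 2, 3: 1, 4: 4},
}

# Flatten the nested dict into one Series keyed by (date, old value) tuples.
flat = pd.Series({
    (date, old): new
    for date, mapping in map_dict.items()
    for old, new in mapping.items()
})

# Build one (date, old value) key per row and look them all up in one call.
keys = pd.MultiIndex.from_frame(df[["date", "Map me"]])
df["mapped"] = flat.reindex(keys).to_numpy()
print(df)
```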
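Try np.where, which is often faster than row-wise pandas operations (the loop below runs once per date, not once per row):
import numpy as np

df["Mapped"] = ""
for key in map_dict.keys():
    # Fill only the rows for this date that have not been mapped yet
    df["Mapped"] = np.where((df["date"] == key) & (df["Mapped"] == ""), df["Map me"].apply(lambda x: map_dict[key][x]), df["Mapped"])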
Result:
   Map me strings        date Mapped
0       1   test1  2020-01-01      4
1       2   test2  2020-02-10      4
2       3   test3  2020-01-01      1
3       4   test2  2020-03-15      4
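A variation on the same per-date loop, sketched here as an alternative rather than as the answer's own method: select the rows for each date with a boolean mask and apply that date's dict with Series.map, so each value is only looked up in the mapping for its own date. The name parts is illustrative; rows whose date is missing from map_dict would end up as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Map me": [1, 2, 3, 4],
    "strings": ["test1", "test2", "test3", "test2"],
    "date": ["2020-01-01", "2020-02-10", "2020-01-01", "2020-03-15"],
})
map_dict = {
    "2020-01-01": {1: 4, 2: 3, 3: 1, 4: 2},
    "2020-02-10": {1: 3, 2: 4, 3: 1, 4: 2},
    "2020-03-15": {1: 3, 2: 2, 3: 1, 4: 4},
}

# One mapped piece per date; the loop runs once per date, not once per row.
parts = [
    df.loc[df["date"] == date, "Map me"].map(mapping)
    for date, mapping in map_dict.items()
]

# concat keeps the original row labels, so the assignment aligns on the index.
df["Mapped"] = pd.concat(parts)
print(df)
```

With only a handful of distinct dates this costs one pass over the date column per date, which should stay manageable even for a large frame.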
A more pandas-like way to do this would be to convert map_dict to a DataFrame and join it to your sample frame. For example:
# Create the original dataframe
>>> df = pd.DataFrame([(1, 'test1', '2020-01-01'), (2, 'test2', '2020-02-10'), (3, 'test3', '2020-01-01'), (4, 'test2', '2020-03-15')], columns=['Map me', 'strings', 'date'])
>>> df
   Map me strings        date
0       1   test1  2020-01-01
1       2   test2  2020-02-10
2       3   test3  2020-01-01
3       4   test2  2020-03-15
# Convert the map dict to a dataframe
>>> map_df = pd.DataFrame([(k, j, l) for k, v in map_dict.items() for j,l in v.items()], columns=['date', 'Map me', 'Map to'])
>>> map_df
          date  Map me  Map to
0   2020-01-01       1       4
1   2020-01-01       2       3
2   2020-01-01       3       1
3   2020-01-01       4       2
4   2020-02-10       1       3
5   2020-02-10       2       4
6   2020-02-10       3       1
7   2020-02-10       4       2
8   2020-03-15       1       3
9   2020-03-15       2       2
10  2020-03-15       3       1
11  2020-03-15       4       4
# Perform the join
>>> mapped_df = pd.merge(df, map_df, left_on=['date', 'Map me'], right_on=['date', 'Map me'])
>>> mapped_df
   Map me strings        date  Map to
0       1   test1  2020-01-01       4
1       2   test2  2020-02-10       4
2       3   test3  2020-01-01       1
3       4   test2  2020-03-15       4
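One thing to keep in mind with the merge above: pd.merge defaults to an inner join, so any row whose (date, 'Map me') pair is missing from map_df is silently dropped. A hedged sketch of a more defensive variant follows, using how="left", validate and indicator, all of which are documented pd.merge parameters. The fifth row of the sample frame is hypothetical and was added only to show an unmatched key.

```python
import pandas as pd

map_dict = {
    "2020-01-01": {1: 4, 2: 3, 3: 1, 4: 2},
    "2020-02-10": {1: 3, 2: 4, 3: 1, 4: 2},
    "2020-03-15": {1: 3, 2: 2, 3: 1, 4: 4},
}
df = pd.DataFrame(
    [(1, "test1", "2020-01-01"), (2, "test2", "2020-02-10"),
     (3, "test3", "2020-01-01"), (4, "test2", "2020-03-15"),
     (5, "test4", "2020-04-01")],  # hypothetical row with no mapping
    columns=["Map me", "strings", "date"],
)

map_df = pd.DataFrame(
    [(date, old, new) for date, m in map_dict.items() for old, new in m.items()],
    columns=["date", "Map me", "Map to"],
)

mapped_df = df.merge(
    map_df,
    on=["date", "Map me"],
    how="left",              # keep every row of df, matched or not
    validate="many_to_one",  # raise if map_df has duplicate (date, Map me) keys
    indicator=True,          # adds a "_merge" column describing each row's match
)

# Rows that found no mapping have NaN in "Map to" and "left_only" in "_merge".
unmatched = mapped_df[mapped_df["_merge"] == "left_only"]
print(mapped_df)
```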