I want to create a new column labeled Region that maps ranges of integers (postal codes) to regions, as defined in a dictionary. However, there is a condition: the mapping should only apply where the Sub-Segment is Australia.
import pandas as pd
import numpy as np
df = pd.read_excel(r'/Users/Desktop/dictionary.xlsx')
mydict = {"NSW": range(1000,1209)}
if df['Sub-Segment'] == "Australia":
    df['Region'] = df['Postal Code'].map(mydict)
The data frame looks like this:
Sub-Segment  Postal Code
Australia    1001
Australia    1002
Australia    1209
Mexico       1004
The desired data frame is this:
Sub-Segment  Postal Code  Region
Australia    1001         NSW
Australia    1002         NSW
Australia    1209         NSW
Mexico       1004         Other
I tried the above and got the following error message:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
mydict = {
    "NSW": range(1000, 1210),
    "NHL": range(2000, 2099),
}

def region(df):
    if df['Sub-Segment'] == 'Australia':
        result = [key for (key, value) in mydict.items() if df['Postal Code'] in value]
        if result:
            return result[0]
    return 'Other'

df['Region'] = df.apply(lambda row: region(row), axis=1)
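For anyone who wants to try this without the Excel file, here is a minimal self-contained sketch of the same dictionary-of-ranges lookup, run on the four sample rows from the question. The in-memory data frame and the helper name `lookup_region` are assumptions made for illustration; the ranges are copied from the code above.

```python
import pandas as pd

# Sample rows copied from the question (a stand-in for the Excel file)
df = pd.DataFrame({
    "Sub-Segment": ["Australia", "Australia", "Australia", "Mexico"],
    "Postal Code": [1001, 1002, 1209, 1004],
})

mydict = {
    "NSW": range(1000, 1210),  # the stop value is exclusive, so 1209 is included
    "NHL": range(2000, 2099),
}

def lookup_region(row):
    # Hypothetical helper: return the first key whose range contains the code
    if row["Sub-Segment"] == "Australia":
        for key, codes in mydict.items():
            if row["Postal Code"] in codes:
                return key
    # Non-Australian rows and unmatched codes fall through to here
    return "Other"

df["Region"] = df.apply(lookup_region, axis=1)
print(df)
```

Note that `range(1000, 1209)` from the question would stop at 1208, which is why the upper bound here is 1210.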
I'm not sure where you're going with the dictionary. If you map postal codes through a dictionary you'll need a huge number of keys, so I would prefer a function instead.
You can use pandas.DataFrame.loc to select the rows you want and pandas.Series.apply to apply the function to the postal code column.
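import pandas as pd
import numpy as np

df = pd.read_excel(r'/Users/Desktop/dictionary.xlsx')

def func(x):
    # Postal codes 1000 to 1209 (inclusive) belong to NSW
    if 1000 <= x <= 1209:
        return 'NSW'
    # Any other code returns None and is filled with 'Other' below
    return None

# Apply func to the Australian rows only; all other rows get NaN in 'Region'
mask = df['Sub-Segment'] == "Australia"
df.loc[mask, 'Region'] = df.loc[mask, 'Postal Code'].apply(func)
# Fill only the new column so missing values elsewhere are left untouched
df['Region'] = df['Region'].fillna('Other')
print(df)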
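If there is only one inclusive range to check, the same result can probably be reached without `apply` at all. The sketch below is a different technique from the function above: it combines two boolean masks and assigns through `.loc`. It also shows why the original `if` failed: comparing a column to a value yields a boolean Series with one entry per row, which `if` cannot reduce to a single True or False. The in-memory data frame is an assumption that stands in for the Excel file.

```python
import pandas as pd

# Sample rows copied from the question (a stand-in for the Excel file)
df = pd.DataFrame({
    "Sub-Segment": ["Australia", "Australia", "Australia", "Mexico"],
    "Postal Code": [1001, 1002, 1209, 1004],
})

# Each comparison returns a boolean Series, one value per row
is_australia = df["Sub-Segment"] == "Australia"
in_nsw = df["Postal Code"].between(1000, 1209)  # inclusive on both ends by default

# Start with the fallback value, then overwrite the rows matching both masks
df["Region"] = "Other"
df.loc[is_australia & in_nsw, "Region"] = "NSW"
print(df)
```

With several regions, the function-based or dictionary-based versions above may be easier to extend than chaining one mask per region.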