Pandas gives unexpected output when I use a dictionary to replace values in a DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Course
English 21st Century
Maths in the Golden Age of History
Science is cool
Mapped_Items = ['Math', 'English', 'Science', 'History']
pat = '|'.join(r"\b{}\b".format(x) for x in Mapped_Items)
df['Interest'] = df['Course'].str.findall('(' + pat + ')').str.join(', ')
mapped_dict = {'English' : 'Eng', 'Science' : 'Sci', 'Math' : 'Mat', 'History' : 'Hist'}
df['Interest'] = df['Interest'].replace(mapped_dict, inplace=False)
What I get:
print(df)
Course                               Interest
English 21st Century                 Engg
Maths in the Golden Age of History   MatttHistt
Science is cool                      Scii
What I'm after is something close to the following:
Course                               Interest
English 21st Century                 Eng
Maths in the Golden Age of History   Mat, Hist
Science is cool                      Sci
Your logic seems overcomplicated. You don't need regex here, and pd.Series.replace with a dictionary is inefficient, even if it could be made to work on a series of lists. Here's an alternative that tests each dictionary key against the course name with a plain substring check:
import pandas as pd
from io import StringIO
mystr = StringIO("""Course
English 21st Century
Maths in the Golden Age of History
Science is cool""")
df = pd.read_csv(mystr)
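# Map each keyword to its abbreviation
d = {'English' : 'Eng', 'Science' : 'Sci', 'Math' : 'Mat', 'History' : 'Hist'}
# For every course, join the abbreviations whose keyword occurs as a substring
df['Interest'] = df['Course'].apply(lambda x: ', '.join([d[i] for i in d if i in x]))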
print(df)
                               Course   Interest
0                English 21st Century        Eng
1  Maths in the Golden Age of History  Mat, Hist
2                     Science is cool        Sci
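One thing to keep in mind with the substring check: it matches 'Math' inside 'Maths' (a `\bMath\b` pattern would not), and the abbreviations come out in dictionary order rather than in the order they appear in the course name. If order of appearance matters, something like the following might work. It is only a sketch: it does use a regex, it assumes none of the keys overlap with one another, and the names `pattern` and `abbreviate` are made up for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({'Course': ['English 21st Century',
                              'Maths in the Golden Age of History',
                              'Science is cool']})
d = {'English': 'Eng', 'Science': 'Sci', 'Math': 'Mat', 'History': 'Hist'}

# One alternation of the escaped keys; no \b, so 'Math' still matches inside 'Maths'
pattern = re.compile('|'.join(re.escape(k) for k in d))

def abbreviate(course):
    # findall returns the matched keys left to right, i.e. in order of appearance
    return ', '.join(d[k] for k in pattern.findall(course))

df['Interest'] = df['Course'].apply(abbreviate)
print(df)
```

For the three sample rows this gives the same `Interest` values as the substring version; the difference only shows up when keywords appear in a different order than in the dictionary, e.g. `abbreviate('History of Math')` yields `'Hist, Mat'`.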