I'm new to Python, and I am having a lot of trouble joining two pandas data frames, because the merge should be based on a partial string match. More specifically:
I have a dataframe called df that looks like this:
{ "writtenAt":"2015-01-01T18:31:01+00:00", "content":" India’s banks will ramp up sales of bonds that act as capital buffers in 2015" }
It has about 10,000 rows like the one above.
Now, I have another dataframe called compNames, which looks like this:
{ "ticker":"A", "name":"Agilent Technologies Inc.", "keyword":"Agilent" }
I have about 500 rows in the compNames dataframe.
I am trying to assign a ticker value from compNames to the matching entry of df by the following mechanism:
1. Check whether any item from the entire column compNames['keyword'] is contained in an entry of df['content'].
2. If there is a match, return the matching word as a separate column of the df dataframe (e.g. df['matchedName']).
3. If there are multiple matches, create a list of the matching words for the corresponding entry of df['content'].
4. Finally, join df and compNames, using df['matchedName'] and compNames['keyword'] as the key variables.
What I have so far is:
import pandas as pd

# Load select company names
compNames = pd.read_csv("compNameList_LARA.txt")
compList = '|'.join(compNames['keyword'].tolist())
df['compMatch'] = df.content.str.contains(compList)
# drop unmatched articles
df = df[df['compMatch']==True]
# assign firm names
df['matchedName'] = df['content'].apply(lambda x: [x for x in compNames['keyword'].tolist() if x in df['content']])
However, when I do this, every entry of df['matchedName'] is an empty list.
Could you help me figure out what went wrong? Many many thanks!!
-Jin
Figured it out. I just needed to do:
df['content'] = df['content'].str.lower().str.split()
df['matchedName'] = df['content'].apply(lambda x: [item for item in x if item in compNames['keyword'].tolist()])
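For anyone landing here later, here is a rough sketch of the whole pipeline, from matching to the final join. It is not the exact code I ran. The two small frames are made-up stand-ins for df and compNames, and the names pattern, matched and result are only for illustration. As far as I can tell, the original attempt returned empty lists because `x in df['content']` tests the Series index labels rather than the text of the row. Also note that the snippet above compares lowercased tokens, so it only finds matches if the keywords are stored in lowercase too. The sketch below keeps the original case and uses substring matching instead.

```python
import re
import pandas as pd

# Toy stand-ins for the two frames in the question (values are illustrative).
df = pd.DataFrame({
    "writtenAt": [
        "2015-01-01T18:31:01+00:00",
        "2015-01-02T09:00:00+00:00",
        "2015-01-03T12:00:00+00:00",
    ],
    "content": [
        "Agilent and Apple both reported earnings",
        "India's banks will ramp up sales of bonds",
        "Apple unveils a new phone",
    ],
})
compNames = pd.DataFrame({
    "ticker": ["A", "AAPL"],
    "name": ["Agilent Technologies Inc.", "Apple Inc."],
    "keyword": ["Agilent", "Apple"],
})

keywords = compNames["keyword"].tolist()

# One alternation pattern for all keywords; re.escape guards keywords that
# contain regex metacharacters such as "." or "&".
pattern = "|".join(re.escape(k) for k in keywords)

# For each row, collect the list of keywords that occur in the text.
df["matchedName"] = df["content"].str.findall(pattern)

# Drop rows without any match, then give each matched keyword its own row.
matched = df[df["matchedName"].str.len() > 0].explode("matchedName")

# Join on the keyword to attach the ticker and company name.
result = matched.merge(
    compNames, left_on="matchedName", right_on="keyword", how="left"
)
print(result[["content", "matchedName", "ticker"]])
```

This is plain substring matching, so a keyword like "Apple" would also hit "Pineapple". If that matters, wrapping each escaped keyword in `\b` word boundaries is one option. Exploding before the merge means an article that mentions two companies ends up as two rows, one per ticker.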