I have a dataset that contains a name and a date. I need to compare it against other datasets that also have a name and a date, and call another function if the name is found in them. In the example I just mocked a return value, which would be assigned to a new column in the dataframe, but I couldn't find out how to do it. Here's what I've done so far (I need to use NumPy vectorization):
import pandas as pd

def getName(name, date, df1, df2):
    if name == df1['NAME'].values:
        return name
    if name == df2['NAME'].values:
        return 'HEY'

df = pd.DataFrame({
    "NAME": ["JOE", "CHRIS", "AARON"],
    "DATE": [10, 20, 30]
})
df1 = pd.DataFrame({
    "NAME": ["JOE", "JASON", "GUS"],
    "DATE": [10, 20, 30]
})
df2 = pd.DataFrame({
    "NAME": ["STEPHEN", "CHRIS", "AARON"],
    "DATE": [10, 20, 30]
})
df['NAME_'] = getName(df['NAME'].values, df['DATE'].values, df1, df2)
The output should be:
df =
NAME DATE NAME_
JOE 10 JOE
CHRIS 20 HEY
AARON 30 HEY
So you are testing equality with the == operator. Comparing name with df1['NAME'].values this way is done element by element and produces an array of booleans rather than a single True or False, so it can't be used directly as an if condition. I think you want to test if name is in a column. You can do this with a construct like if name in df1['NAME'].values.
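To make the difference concrete, here is a minimal sketch using df1 from the question. The variable names elementwise, is_member and in_series_index are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"NAME": ["JOE", "JASON", "GUS"], "DATE": [10, 20, 30]})
name = "JOE"

# == compares element by element: one boolean per row of df1.
elementwise = name == df1["NAME"].values

# in against the underlying values gives a single boolean.
is_member = name in df1["NAME"].values

# in against the Series itself checks the index labels (0, 1, 2), not the values.
in_series_index = name in df1["NAME"]
```

Note the last line: testing membership on the Series directly looks at the index, which is why the .values is needed in the condition.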
But even if you fix the function, you can't call getName just once and get the result you are looking for. Typically, you could use apply so that the function is called for every row of df, for example df.apply(lambda row: getName(row['NAME'], row['DATE'], df1, df2), axis=1). But this isn't using vectorization, as apply is a loop behind the scenes.
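For reference, the row-wise apply approach might look like the sketch below. get_name is a hypothetical corrected version of the function from the question, and returning None when the name is in neither frame is my assumption, since the question doesn't say what should happen in that case.

```python
import pandas as pd

def get_name(name, date, df1, df2):
    # Hypothetical fixed version of getName; date is unused, as in the question.
    if name in df1["NAME"].values:
        return name
    if name in df2["NAME"].values:
        return "HEY"
    return None  # assumption: no match in either frame

df = pd.DataFrame({"NAME": ["JOE", "CHRIS", "AARON"], "DATE": [10, 20, 30]})
df1 = pd.DataFrame({"NAME": ["JOE", "JASON", "GUS"], "DATE": [10, 20, 30]})
df2 = pd.DataFrame({"NAME": ["STEPHEN", "CHRIS", "AARON"], "DATE": [10, 20, 30]})

# axis=1 calls the lambda once per row, so this is a Python-level loop.
df["NAME_"] = df.apply(
    lambda row: get_name(row["NAME"], row["DATE"], df1, df2), axis=1
)
```

This gives the expected column, but it does one function call per row.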
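So perhaps you could use join:
# Give each lookup frame the value that NAME_ should take on a match
df1['NAME_'] = df1['NAME']
df2['NAME_'] = 'HEY'
# Stack both lookup frames and index them by NAME so they can be joined on it
df3 = pd.concat([df1, df2]).set_index('NAME')
df.join(df3['NAME_'], on='NAME', how='left')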
Output
NAME DATE NAME_
0 JOE 10 JOE
1 CHRIS 20 HEY
2 AARON 30 HEY
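Since the question asks for NumPy vectorization specifically, another option could be boolean masks combined with np.select. This is only a sketch: it assumes that matching on NAME alone is enough (DATE is ignored, as in the mocked function), that df1 takes priority over df2, and that names found in neither frame should get None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"NAME": ["JOE", "CHRIS", "AARON"], "DATE": [10, 20, 30]})
df1 = pd.DataFrame({"NAME": ["JOE", "JASON", "GUS"], "DATE": [10, 20, 30]})
df2 = pd.DataFrame({"NAME": ["STEPHEN", "CHRIS", "AARON"], "DATE": [10, 20, 30]})

# One boolean mask per lookup frame, computed for the whole column at once.
in_df1 = df["NAME"].isin(df1["NAME"]).to_numpy()
in_df2 = df["NAME"].isin(df2["NAME"]).to_numpy()

# np.select uses the first condition that is True for each row,
# so df1 wins over df2 when a name is in both.
df["NAME_"] = np.select(
    [in_df1, in_df2],
    [df["NAME"].to_numpy(dtype=object), "HEY"],
    default=None,
)
```

Unlike the join, this doesn't add a NAME_ column to df1 and df2, and the order of the conditions makes the priority between the two frames explicit.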