How to use numpy vectorization for multiple datasets, and then call a function?
I have a dataset that contains a name and a date. I need to compare it against other datasets that also have a name and a date, and call another function if the name appears in them. In the example I just mocked the return value, which would be assigned to a new column in the dataframe. I couldn't work out how to do this. Here's what I have so far (I need to use numpy vectorization):
import pandas as pd

def getName(name, date, df1, df2):
    if name == df1['NAME'].values:
        return name
    if name == df2['NAME'].values:
        return 'HEY'

df = pd.DataFrame({
    "NAME": ["JOE", "CHRIS", "AARON"],
    "DATE": [10, 20, 30]
})

df1 = pd.DataFrame({
    "NAME": ["JOE", "JASON", "GUS"],
    "DATE": [10, 20, 30]
})

df2 = pd.DataFrame({
    "NAME": ["STEPHEN", "CHRIS", "AARON"],
    "DATE": [10, 20, 30]
})

df['NAME_'] = getName(df['NAME'].values, df['DATE'].values, df1, df2)
The output should be:
df =
NAME DATE NAME_
JOE 10 JOE
CHRIS 20 HEY
AARON 30 HEY
So you are testing equality with the == operator, which does not do what you want here. Comparing a value against df1['NAME'].values with == is an elementwise comparison that returns a boolean array rather than a single True or False, and using an array with more than one element as an if condition raises a ValueError. I think you want to test whether name is in a column. You can do this with a construct like if name in df1['NAME'].values.
But, even if you fix the function, you can't call getName just once and get the result you are looking for, because it handles one name at a time. Typically, you could use apply so the function is called for every row of df, for example df.apply(lambda row: getName(row['NAME'], row['DATE'], df1, df2), axis=1). But this isn't using vectorization, as apply is a loop behind the scenes.
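A sketch of the row-by-row apply approach, assuming the function is first corrected to use a membership test. The name get_name and its explicit return None are illustrative choices, not part of the original code, and date is unused just as in the mocked original:

```python
import pandas as pd

def get_name(name, date, df1, df2):
    # Corrected variant of the question's function: membership test instead of ==
    if name in df1["NAME"].values:
        return name
    if name in df2["NAME"].values:
        return "HEY"
    return None  # name found in neither dataset

df = pd.DataFrame({"NAME": ["JOE", "CHRIS", "AARON"], "DATE": [10, 20, 30]})
df1 = pd.DataFrame({"NAME": ["JOE", "JASON", "GUS"], "DATE": [10, 20, 30]})
df2 = pd.DataFrame({"NAME": ["STEPHEN", "CHRIS", "AARON"], "DATE": [10, 20, 30]})

# axis=1 passes each row to the lambda, so this is still a Python-level loop
df["NAME_"] = df.apply(
    lambda row: get_name(row["NAME"], row["DATE"], df1, df2), axis=1
)
print(df)
```

This gives the requested column, but the function runs once per row, so it will be slow on large frames.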
So perhaps you could use join:
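# Label every name with the value it should map to
df1['NAME_'] = df1['NAME']
df2['NAME_'] = 'HEY'
# Stack both lookup tables and index them by NAME
df3 = pd.concat([df1, df2]).set_index('NAME')
# Left join keeps every row of df and pulls in the matching NAME_
df = df.join(df3['NAME_'], on='NAME', how='left')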
Output:
NAME DATE NAME_
0 JOE 10 JOE
1 CHRIS 20 HEY
2 AARON 30 HEY
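Since the question asks specifically for numpy vectorization, here is one possible sketch using np.isin and np.select instead of a join. It assumes, as in the mocked example, that only NAME matters and that df1 takes priority over df2. The variable names names, in_df1 and in_df2 are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"NAME": ["JOE", "CHRIS", "AARON"], "DATE": [10, 20, 30]})
df1 = pd.DataFrame({"NAME": ["JOE", "JASON", "GUS"], "DATE": [10, 20, 30]})
df2 = pd.DataFrame({"NAME": ["STEPHEN", "CHRIS", "AARON"], "DATE": [10, 20, 30]})

names = df["NAME"].to_numpy(dtype=object)

# Boolean masks computed for the whole column at once
in_df1 = np.isin(names, df1["NAME"].to_numpy(dtype=object))
in_df2 = np.isin(names, df2["NAME"].to_numpy(dtype=object))

# np.select takes the first matching condition, so df1 wins over df2;
# rows matching neither dataset get None
df["NAME_"] = np.select([in_df1, in_df2], [names, "HEY"], default=None)
print(df)
```

If the real function does more than return a constant, this only works when that logic can itself be expressed as array operations. Otherwise some form of per-row call is still needed.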