Create a new column if one dataframe's row value is in another data frame's column and get that index
I may be overcomplicating this problem, but I can't seem to find a simple solution.
I have two DataFrames. To keep things simple, let's call them df1 and df2. Let's say df1 has one column called "Some Data" and df2 has two columns called "some data" and "other data".
Example:
df1
Some Data
"Lebron James 123"
"Lebron James 234"
df2
some data                          other data
"Lebron James 123 + other text"    "I want this in df1["New?"]"
"Michael Jordan"                   "Doesn't Matter"
So basically I want to create a new column in df1 called "New?". If a row's df1["Some Data"] value is found in df2["some data"], this new column (in df1) should hold that matching row's value from df2["other data"]. However, if there is no matching instance in df2["some data"], then df1["New?"] should say "New".
Desired result after running:
df1
Some Data             New?
"Lebron James 123"    "I want this in df1["New?"]"
"Lebron James 234"    "New"
So as you can see, the "New?" column would include that specific row's value from the "other data" column. "Lebron James 234" isn't anywhere in "some data" in df2, so it says "New".
I am able to get it to say True or False using the .isin() method, but I don't know how to grab the index of the matching row in the other DataFrame and get the value from its "other data" column.
Thank you
EDIT:
From what I know, this will work:
df1["New?"] = df1["Some Data"].isin(df2["some data"])
It would render:
df1["New?"]
True
False
So I want True to be "I want this in df1["New?"]" and False to be "New".
First create a regular expression by joining your df1 series:
rgx = '|'.join(df1['Some Data'])
Now using np.where:
df1.assign(data=np.where(df2['some data'].str.match(rgx), df2['other data'], 'New'))
          Some Data                        data
0  Lebron James 123  I want this in df1["New?"]
1  Lebron James 234                         New
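For anyone who wants to run the regex and np.where idea end to end, here is a minimal sketch using the data from the question. The use of re.escape and the variable names starts_with_known and result are my own additions, not part of the answer. Note that the comparison is positional: row i of df2 decides row i of df1, so this only makes sense when both frames have the same length and row order.

```python
import re

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Some Data": ["Lebron James 123", "Lebron James 234"]})
df2 = pd.DataFrame({
    "some data": ["Lebron James 123 + other text", "Michael Jordan"],
    "other data": ['I want this in df1["New?"]', "Doesn't Matter"],
})

# Escape each value so characters such as "+" or "." are matched literally
# (an assumption on my side; the answer joins the raw strings).
rgx = "|".join(re.escape(s) for s in df1["Some Data"])

# str.match anchors at the start of the string, so this is True where the
# df2 text begins with any of the df1 values.
starts_with_known = df2["some data"].str.match(rgx)

# Row-by-row choice: take df2's "other data" where the flag is True, else "New".
result = df1.assign(**{"New?": np.where(starts_with_known, df2["other data"], "New")})
print(result)
```

If the rows of the two frames are not aligned, the flags end up attached to the wrong rows, which is what the mismatching-shapes variant is meant to address.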
An example with mismatching shapes:
df1 = pd.DataFrame({'a': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'a': ['aaaaa', 'bbbb', 'ffff', 'gggg', 'hhhh']})
rgx = '({})'.format('|'.join(df1.a))
m = df2.assign(flag=df2.a.str.extract(rgx))
df1.set_index('a').join(m.set_index('flag')).fillna('New').reset_index()
index a
0 a aaaaa
1 b bbbb
2 c New
3 d New
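A variation on the same idea that may be easier to follow is sketched below. It swaps the index join for an explicit left merge, which is my substitution rather than the answer's code; the names keys, matches, out, and the column names key and match are hypothetical.

```python
import re

import pandas as pd

df1 = pd.DataFrame({"a": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"a": ["aaaaa", "bbbb", "ffff", "gggg", "hhhh"]})

# One capture group containing every df1 value as an alternative.
rgx = "({})".format("|".join(re.escape(s) for s in df1["a"]))

# expand=False returns a Series: the df1 value found in each df2 string, or NaN.
keys = df2["a"].str.extract(rgx, expand=False)

# Keep only the df2 rows where some df1 value was found.
matches = pd.DataFrame({"key": keys, "match": df2["a"]}).dropna(subset=["key"])

# A left merge keeps every df1 row; rows without a partner get NaN, then "New".
out = df1.merge(matches, how="left", left_on="a", right_on="key").drop(columns="key")
out["match"] = out["match"].fillna("New")
print(out)
```

If several df2 rows contain the same df1 value, the merge produces one output row per match, so deduplicate matches on key first if only a single result per df1 row is wanted.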
Based on your info, it seems you need only a simple np.where (if the DataFrames have the same length):
df1['New?'] = np.where(df1["Some Data"].isin(df2["some data"]), df2['other data'], 'New')
                       Some Data                        New?
0  Lebron James 123 + other text  I want this in df1["New?"]
1               Lebron James 234                         New
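One thing worth checking before relying on .isin() is that it tests for exact equality, not substring containment. The short sketch below uses the question's original data to show the difference; the names exact and contained are mine, and the apply-based containment check is just one possible way to do it.

```python
import pandas as pd

df1 = pd.DataFrame({"Some Data": ["Lebron James 123", "Lebron James 234"]})
df2 = pd.DataFrame({
    "some data": ["Lebron James 123 + other text", "Michael Jordan"],
    "other data": ['I want this in df1["New?"]', "Doesn't Matter"],
})

# Exact membership: "Lebron James 123" is not equal to any df2 value.
exact = df1["Some Data"].isin(df2["some data"])

# Substring containment: is each df1 value found inside any df2 string?
# regex=False treats the value as literal text.
contained = df1["Some Data"].apply(
    lambda s: df2["some data"].str.contains(s, regex=False).any()
)

print(exact.tolist())
print(contained.tolist())
```

So the np.where line above gives the desired result only when the values in the two columns are identical, as in the output shown.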
For different lengths:
mask = df1["Some Data"].isin(df2["some data"])
lookup = df2.drop_duplicates("some data").set_index("some data")["other data"]
df1.loc[mask, "New?"] = df1.loc[mask, "Some Data"].map(lookup)
df1["New?"] = df1["New?"].fillna("New")
Explanation
Basically, you build a mask that marks the rows of df1 whose "Some Data" value appears in df2["some data"]. For those rows, you look up the "other data" value of the matching row in df2 and assign it to "New?" in df1. The rows that the mask leaves out stay empty and are then filled with "New".
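Since the question also asks how to get the index of the matching row, here is a rough sketch that works with frames of different lengths and with substring matches. The helper first_match_index and the extra "Kobe Bryant" row are hypothetical, added only for illustration, and the loop is written for clarity rather than speed.

```python
import pandas as pd

df1 = pd.DataFrame({"Some Data": ["Lebron James 123", "Lebron James 234"]})
df2 = pd.DataFrame({
    "some data": ["Michael Jordan", "Lebron James 123 + other text", "Kobe Bryant"],
    "other data": ["Doesn't Matter", 'I want this in df1["New?"]', "Also ignored"],
})


def first_match_index(value, haystack):
    """Return the index label of the first string in haystack containing value, or None."""
    hits = haystack.index[haystack.str.contains(value, regex=False).to_numpy()]
    return hits[0] if len(hits) else None


# Index label in df2 for each df1 row (None where nothing matches).
match_idx = [first_match_index(v, df2["some data"]) for v in df1["Some Data"]]

# Pull "other data" from the matched row, or fall back to "New".
df1["New?"] = [
    df2.at[i, "other data"] if i is not None else "New" for i in match_idx
]
print(match_idx)
print(df1)
```

This scans df2 once per df1 row, so it is fine for small frames; for large ones the regex extract and merge approach is likely to scale better.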