If a row value of one DataFrame is in a column of another DataFrame, create a new column and get that index

Question

I may be overcomplicating this problem, but I can't seem to find a simple solution.

I have two DataFrames. Let's call them df1 and df2. To keep things simple, let's say df1 has one column called "Some Data" and df2 has two columns called "some data" and "other data".

Example:

df1

Some Data
"Lebron James 123"
"Lebron James 234"

df2

some data                        other data
"Lebron James 123 + other text"  "I want this in df1["New?"]"
"Michael Jordan"                 "Doesn't Matter"

So basically I want to create a new column in df1 called "New?". This new column (in df1) will say "New" if df1["Some Data"] is not found in df2["some data"]. However, if there is a match in df2["some data"], then I set df1["New?"] to that specific row's value in df2["other data"].

Desired result after running:

df1

Some Data           New?
"Lebron James 123"  "I want this in df1["New?"]"
"Lebron James 234"  "New"

So as you can see, the New? column would include that specific row's value from the other data column. Lebron James 234 isn't anywhere in some data in df2, so it says New.

I am able to get it to say True or False using the .isin() method, but I don't know how to grab the index of the other df and get the value from the other data column.

Thank you

EDIT:

From what I know, this will work:

df1["New?"] = df1["Some Data"].isin(df2["some data"])

Would render

df1["New?"]

True
False

So I want True to be the "I want this in df1["New?"]" and False to be "New".

Answer 1

First create a regular expression by joining your df1 series:

rgx = '|'.join(df1['some data'])

Now using np.where:

df1.assign(data=np.where(df2['some data'].str.match(rgx), df2['other data'], 'New'))

          some data                        data
0  Lebron James 123  I want this in df1["New?"]
1  Lebron James 234                         New
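A self-contained version of the two steps above might look like the following sketch. The sample frames are rebuilt from the question, and the call to re.escape is an addition that is not in the answer: it assumes the df1 values should be matched literally rather than interpreted as patterns. The comparison is positional, so it also assumes both frames have the same number of rows.

```python
import re

import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df1 = pd.DataFrame({"some data": ["Lebron James 123", "Lebron James 234"]})
df2 = pd.DataFrame({
    "some data": ["Lebron James 123 + other text", "Michael Jordan"],
    "other data": ['I want this in df1["New?"]', "Doesn't Matter"],
})

# Escape each value so regex metacharacters in the data are matched literally
# (assumption: not part of the original answer)
rgx = "|".join(re.escape(s) for s in df1["some data"])

# str.match only matches at the start of each df2 string
matched = df2["some data"].str.match(rgx)

# Rows are paired by position, so both frames need the same length here
result = df1.assign(data=np.where(matched, df2["other data"], "New"))
print(result)
```

Without the escaping step, a value such as "C++" or "a.b" in df1 would change the meaning of the pattern.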

An example with mismatching shapes:

df1 = pd.DataFrame({'a': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'a': ['aaaaa', 'bbbb', 'ffff', 'gggg', 'hhhh']})

rgx = '({})'.format('|'.join(df1.a))
m = df2.assign(flag=df2.a.str.extract(rgx))

df1.set_index('a').join(m.set_index('flag')).fillna('New').reset_index()

  index      a
0     a  aaaaa
1     b   bbbb
2     c    New
3     d    New
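One possible variation on the mismatching-shapes example replaces the join with a lookup Series, which avoids depending on how the joined index ends up being named. This is only a sketch: the names flag, lookup and match are illustrative, re.escape is an added precaution, and keeping the first df2 row per key is an assumption about how duplicates should be handled.

```python
import re

import pandas as pd

df1 = pd.DataFrame({"a": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"a": ["aaaaa", "bbbb", "ffff", "gggg", "hhhh"]})

# One capture group containing every df1 value as an alternative
rgx = "({})".format("|".join(re.escape(s) for s in df1["a"]))

# expand=False returns a Series: the first captured key, or NaN when nothing matches
flag = df2["a"].str.extract(rgx, expand=False)

# Build a key -> df2 value table, keeping the first df2 row for each key
lookup = (
    df2.assign(flag=flag)
    .dropna(subset=["flag"])
    .drop_duplicates("flag")
    .set_index("flag")["a"]
)

# Look up every df1 value; keys with no match become "New"
result = df1.assign(match=df1["a"].map(lookup).fillna("New"))
print(result)
```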

Answer 2

Based on your info, it seems like you only need a simple np.where (if the DataFrames have the same length):

df1['New?'] = np.where(df1["Some Data"].isin(df2["some data"]), df2['other data'], 'New')

    Some Data                       New?
0   Lebron James 123 + other text   I want this in df1["New?"]
1   Lebron James 234                New
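As a runnable sketch of the same-length case: .isin() tests for exact equality, so the example below assumes the df2 entry is exactly "Lebron James 123", without the "+ other text" suffix from the question. The variable name is_known is illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Some Data": ["Lebron James 123", "Lebron James 234"]})
df2 = pd.DataFrame({
    # Assumption: the df2 value equals the df1 value exactly
    "some data": ["Lebron James 123", "Michael Jordan"],
    "other data": ['I want this in df1["New?"]', "Doesn't Matter"],
})

# True where the df1 value appears anywhere in df2["some data"]
is_known = df1["Some Data"].isin(df2["some data"])

# np.where picks df2["other data"] by position, so the frames must have equal length
# and matching rows must sit at the same position in both frames
df1["New?"] = np.where(is_known, df2["other data"], "New")
print(df1)
```

If the df2 strings only start with or contain the df1 value, .isin() returns False for every row, and a string-matching approach is needed instead.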

For different lengths:

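mask1 = df1["Some Data"].isin(df2["some data"])  # df1 rows that appear in df2
mask2 = df2["some data"].isin(df1["Some Data"])  # df2 rows that appear in df1
# assumes each matching value occurs once per frame and in the same order
df1.loc[mask1, 'New?'] = df2.loc[mask2, 'other data'].values

df1 = df1.fillna('New')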

Explanation

Basically, you build a boolean mask of the matching rows and use it to filter both DataFrames. Given the description, this yields the same number of rows on both sides, and you assign the filtered rows' "other data" values from df2 to the matching rows of "Some Data" in df1.
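When the two frames differ in length or row order, a left merge is one alternative that does not rely on positions lining up. The sketch below is not the answerer's method: it assumes exact matches between the two columns, keeps only the first df2 row per key, and extends df1 with a third row purely for illustration.

```python
import pandas as pd

# df1 has three rows, df2 has two, and the matching rows are in a different order
df1 = pd.DataFrame({"Some Data": ["Lebron James 123", "Lebron James 234", "Michael Jordan"]})
df2 = pd.DataFrame({
    "some data": ["Michael Jordan", "Lebron James 123"],
    "other data": ["Doesn't Matter", 'I want this in df1["New?"]'],
})

# Drop duplicate keys so the left merge cannot add rows
unique_df2 = df2.drop_duplicates(subset="some data")

# how="left" keeps every df1 row, in df1's order; unmatched rows get NaN
merged = df1.merge(unique_df2, how="left", left_on="Some Data", right_on="some data")

# to_numpy() assigns by position and sidesteps index alignment
df1["New?"] = merged["other data"].fillna("New").to_numpy()
print(df1)
```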


2 answers

Answer 1
1 2018-09-07 17:52:30

Answer 2
1 2018-09-07 17:54:40
