Create a new column if one dataframe's row value is in another data frame's column and get that index

I may be overcomplicating this problem, but I can't seem to find a simple solution.

I have two DataFrames. Let's call them df1 and df2. To keep things simple, let's say df1 has one column called "Some Data" and df2 has two columns called "some data" and "other data".

Example:

df1

Some Data
"Lebron James 123"
"Lebron James 234"

df2

some data                        other data
"Lebron James 123 + other text"  "I want this in df1["New?"]"
"Michael Jordan"                 "Doesn't Matter"

So basically I want to create a new column in df1 called "New?". This new column (in df1) will say "New" if the df1["Some Data"] value does not appear anywhere in df2["some data"]. However, if there is a match in df2["some data"], then I want to set df1["New?"] to that specific row's value in df2["other data"].

Desired result after running:

df1

Some Data           New?
"Lebron James 123"  "I want this in df1["New?"]"
"Lebron James 234"  "New"

As you can see, the New? column would include that specific row's value from the other data column. Lebron James 234 isn't anywhere in some data in df2, so it says New.

I am able to get it to say True or False using the .isin() method, but I don't know how to grab the index of the matching row in the other df and get the value from its other data column.

Thank you

EDIT:

From what I know, this will work:

df1["New?"] = df1["Some Data"].isin(df2["some data"])

It would render:

df1["New?"]

True
False

So I want True to become "I want this in df1["New?"]" and False to become "New".

First create a regular expression by joining your df1 series:

rgx = '|'.join(df1['some data'])

Now using np.where:

df1.assign(data=np.where(df2['some data'].str.match(rgx), df2['other data'], 'New'))

          some data                        data
0  Lebron James 123  I want this in df1["New?"]
1  Lebron James 234                         New
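A minimal, self-contained sketch of the regex plus np.where idea, using the question's sample data. It assumes both frames have the same number of rows and that row i of df2 corresponds to row i of df1, since np.where works position by position. The re.escape call is an addition not in the answer above; it guards against values that contain regex metacharacters.

```python
import re

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Some Data": ["Lebron James 123", "Lebron James 234"]})
df2 = pd.DataFrame({
    "some data": ["Lebron James 123 + other text", "Michael Jordan"],
    "other data": ['I want this in df1["New?"]', "Doesn't Matter"],
})

# Build one alternation pattern from all df1 values (escaped, an assumption
# added here so that characters like "+" or "?" are treated literally).
rgx = "|".join(re.escape(s) for s in df1["Some Data"])

# True where a df2 value starts with any df1 value (str.match anchors at the start).
matched = df2["some data"].str.match(rgx).to_numpy()

# Positional choice: take "other data" where matched, otherwise "New".
result = df1.assign(
    **{"New?": np.where(matched, df2["other data"].to_numpy(), "New")}
)
print(result)
```

Because the selection is positional, this only gives the intended answer when the matching rows line up across the two frames.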

An example with mismatching shapes:

df1 = pd.DataFrame({'a': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'a': ['aaaaa', 'bbbb', 'ffff', 'gggg', 'hhhh']})

rgx = '({})'.format('|'.join(df1.a))
m = df2.assign(flag=df2.a.str.extract(rgx))

df1.set_index('a').join(m.set_index('flag')).fillna('New').reset_index()

  index      a
0     a  aaaaa
1     b   bbbb
2     c    New
3     d    New
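The same mismatched-shape example can also be sketched with str.extract followed by Series.map instead of a join. This is a swapped-in variant, not the code above: the column names key and match are hypothetical, and duplicates are resolved by keeping the first df2 row per key, which is an assumption.

```python
import re

import pandas as pd

df1 = pd.DataFrame({"a": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"a": ["aaaaa", "bbbb", "ffff", "gggg", "hhhh"]})

# One capture group containing every df1 value as an alternative.
pattern = "({})".format("|".join(re.escape(s) for s in df1["a"]))

# expand=False returns a Series: the df1 value found in each df2 row, or NaN.
keys = df2["a"].str.extract(pattern, expand=False)

# Keep only df2 rows that matched, one row per key (first occurrence wins).
matches = df2.assign(key=keys).dropna(subset=["key"]).drop_duplicates("key")
lookup = matches.set_index("key")["a"]

# Map each df1 value to its matching df2 value; unmatched rows become "New".
out = df1.assign(match=df1["a"].map(lookup).fillna("New"))
print(out)
```

Mapping through a Series with a unique index avoids any dependence on row order or on the two frames having equal length.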

Based on your info, it seems like you need only a simple np.where (if the dfs have the same length):

df1['New?'] = np.where(df1["Some Data"].isin(df2["some data"]), df2['other data'], 'New')

    Some Data                       New?
0   Lebron James 123 + other text   I want this in df1["New?"]
1   Lebron James 234                New
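Note that .isin() tests for exact equality, which is why the output above shows the full string "Lebron James 123 + other text" in Some Data. With the question's original data, where df1 holds only a prefix, .isin() would return False for every row. A small sketch of the difference, assuming equal-length frames whose rows line up, with a plain Python substring test swapped in for the comparison:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Some Data": ["Lebron James 123", "Lebron James 234"]})
df2 = pd.DataFrame({
    "some data": ["Lebron James 123 + other text", "Michael Jordan"],
    "other data": ['I want this in df1["New?"]', "Doesn't Matter"],
})

# Exact-match test: no df1 value equals any df2 value here.
exact = df1["Some Data"].isin(df2["some data"])

# Row-by-row substring test (positional pairing of the two columns).
contains = np.array(
    [a in b for a, b in zip(df1["Some Data"], df2["some data"])]
)

df1["New?"] = np.where(contains, df2["other data"].to_numpy(), "New")
print(exact.tolist())
print(df1)
```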

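For different lengths:

# Boolean mask with one entry per df2 row
mask = df2["some data"].isin(df1["Some Data"]).values
# Note: df1.loc[mask, ...] requires len(mask) == len(df1), and the
# right-hand Series is aligned to df1 by index label
df1.loc[mask, 'New?'] = df2.loc[mask, 'other data']

df1.fillna('New')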

Explanation

Basically you have a mask, and you use the same mask to filter both data frames. Given the descriptions, this yields the same number of rows on both dfs, and you assign the filtered rows' "other data" values from df2 to the matching rows in df1.
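One caveat with a shared mask: a boolean mask built from df2 has one entry per df2 row, so it can only index df1 when both frames have the same number of rows. For frames of different lengths with exact matches, a lookup through Series.map is one alternative. This is a sketch with a different technique than the mask approach, the sample rows are made up for illustration, and keeping the first df2 row per value is an assumption.

```python
import pandas as pd

# Hypothetical data: df1 has three rows, df2 has two, in a different order.
df1 = pd.DataFrame(
    {"Some Data": ["Lebron James 123", "Lebron James 234", "Michael Jordan"]}
)
df2 = pd.DataFrame({
    "some data": ["Michael Jordan", "Lebron James 123"],
    "other data": ["Doesn't Matter", 'I want this in df1["New?"]'],
})

# Series indexed by "some data" so each df1 value can be looked up directly.
lookup = df2.drop_duplicates("some data").set_index("some data")["other data"]

# Values without an exact match in df2 come back as NaN and are filled with "New".
df1["New?"] = df1["Some Data"].map(lookup).fillna("New")
print(df1)
```

The lookup does not depend on row order or frame length, only on the values matching exactly.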
