Find strings in a column of one dataframe from another column in a different dataframe

I'm trying to create a mapping file. The main task is to compare two dataframes on one column, then return all strings that match in both dataframes, alongside some other columns from each dataframe.

Example data:

import pandas as pd

df1 = pd.DataFrame({
    'Artist':
    ['50 Cent', 'Ed Sheeran', 'Celine Dion', '2 Chainz', 'Kendrick Lamar'],
    'album':
    ['Get Rich or Die Tryin', '+', 'Courage', 'So Help Me God!', 'DAMN'],
    'album_id': ['sdf34', '34tge', '34tgr', '34erg', '779uyj']
})

df2 = pd.DataFrame({
    'Artist': ['Beyonce', 'Ed Sheeran', '2 Chainz', 'Kendrick Lamar', 'Jay-Z'],
    'Artist_ID': ['frd345', '3te43', '32fh5', '235he', '345fgrt6']
})

So the main idea is to create a function that produces a mapping file: it takes each item in the Artist column of df1, checks the Artist column of df2 to see whether the same name appears there, and builds a mapping dataframe containing the matching Artist, the album_id, and the Artist_ID.

I tried the code below, but I'm new to Python and got lost in the function. I would appreciate help with a new function or with fixing what I was trying to do. Thanks!

Code I failed to build:

def get_mapping_file(df1, df2):
# I don't know what I'm doing :'D
    for i in df2['Artist']:
        if i == df1['Artist'].any():
            name = i
            df1_id = df1.loc[df1['Artist'] == name, ['album_id']]
            id_to_use = df1_id.album_id[0]
            df2.loc[df2['Artist'] == i, 'Artist_ID'] = id_to_use
    return df2

The desired output is:

Artist          Artist_ID  album_id
Ed Sheeran      3te43      34tge
2 Chainz        32fh5      34erg
Kendrick Lamar  235he      779uyj

I am not sure if this is actually what you need, but your desired output is an inner join between the two dataframes:

pd.merge(df1, df2, on='Artist', how='inner')

This will give you the rows for the artists present in both dataframes.
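To get only the three columns from the desired output, the merged frame can be trimmed after the join. What follows is a minimal sketch, assuming the example data from the question; the function name get_mapping_file is borrowed from the question's attempt, and the column order is an assumption taken from the desired output:

```python
import pandas as pd

# Example data copied from the question.
df1 = pd.DataFrame({
    'Artist': ['50 Cent', 'Ed Sheeran', 'Celine Dion', '2 Chainz', 'Kendrick Lamar'],
    'album': ['Get Rich or Die Tryin', '+', 'Courage', 'So Help Me God!', 'DAMN'],
    'album_id': ['sdf34', '34tge', '34tgr', '34erg', '779uyj'],
})
df2 = pd.DataFrame({
    'Artist': ['Beyonce', 'Ed Sheeran', '2 Chainz', 'Kendrick Lamar', 'Jay-Z'],
    'Artist_ID': ['frd345', '3te43', '32fh5', '235he', '345fgrt6'],
})


def get_mapping_file(df1, df2):
    # The inner join keeps only artists that appear in both frames.
    merged = pd.merge(df1, df2, on='Artist', how='inner')
    # Keep only the columns shown in the desired output (assumed order).
    return merged[['Artist', 'Artist_ID', 'album_id']]


mapping = get_mapping_file(df1, df2)
print(mapping)
```

Note that the join compares the strings exactly, so names that differ in case or spacing would not match unless they are normalized first.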

This result is straightforward to get. You can do this:

frame = df1.merge(df2, how='inner')

frame = frame.drop('album', axis=1)

and then you'll have your result. Thanks!
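When merge is called without `on`, pandas joins on every column name the two frames share, which here is only 'Artist'. The sketch below names the key explicitly and, since the question asks for a mapping file, also renders the result as CSV text. It assumes the example data from the question; the variable name csv_text is only illustrative:

```python
import pandas as pd

# Example data copied from the question.
df1 = pd.DataFrame({
    'Artist': ['50 Cent', 'Ed Sheeran', 'Celine Dion', '2 Chainz', 'Kendrick Lamar'],
    'album': ['Get Rich or Die Tryin', '+', 'Courage', 'So Help Me God!', 'DAMN'],
    'album_id': ['sdf34', '34tge', '34tgr', '34erg', '779uyj'],
})
df2 = pd.DataFrame({
    'Artist': ['Beyonce', 'Ed Sheeran', '2 Chainz', 'Kendrick Lamar', 'Jay-Z'],
    'Artist_ID': ['frd345', '3te43', '32fh5', '235he', '345fgrt6'],
})

# Name the join key explicitly so a shared column added later
# cannot silently change which columns are joined on.
frame = df1.merge(df2, on='Artist', how='inner').drop(columns='album')

# With no path argument, to_csv returns the CSV text as a string.
csv_text = frame.to_csv(index=False)
print(csv_text)
```

Passing a file path as the first argument to to_csv would write the same content to disk instead of returning it.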
