
How to create a pandas Series (column) based on a match with a value in another DataFrame?

My question is the following. I am not very familiar with all the pandas methods, and I suspect there is a more efficient way to do this. I have to load two tables from .csv files into a Postgres database. The tables are related to each other through an id that comes from the source data and serves as a foreign key; however, I must relate them through a different id controlled by my own logic.

The following image illustrates this:

[image: enter image description here]

I am trying to create a new Series from the "another_id" column that I have, by applying a function that loops through a Series of the other DataFrame, checks whether it contains the same code, and returns the corresponding id:

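def check_foreign_key(another_id, df_ppal):
  # Linear scan: return the id of the first row whose another_id matches.
  # Falsy values (None, 0, '') are skipped and yield None.
  if another_id:
    for i in df_ppal.index:
      # .loc looks rows up by index label, which is what df_ppal.index yields
      if another_id == df_ppal.loc[i, 'another_id']:
        return df_ppal.loc[i, 'id']

# Runs one full scan of df_ppal for every row of dfs
dfs['id_fk'] = dfs['another_id'].apply(lambda another_id: check_foreign_key(another_id, df_ppal))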

At this point I think it is not efficient, because for every row I have to loop over the whole column to match the another_id and get the correct id, which is the one highlighted in yellow in the picture.

So I could look into search algorithms to make the task more efficient, but I wonder whether pandas has a method that does this faster, in case there are many records.
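One way to avoid the per-row scan is a hash-based lookup instead of a search loop. The sketch below is only an illustration under assumptions: the sample values are hypothetical, the column names follow the code above, and it assumes each another_id appears only once in df_ppal.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df_ppal = pd.DataFrame({"id": [12, 13, 14], "another_id": [54, 55, 56]})
dfs = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                    "another_id": [54, 54, 55, 56, 56, 56]})

# Lookup Series: index = another_id, values = id.
# Assumption: another_id is unique in df_ppal (map needs a unique index).
lookup = df_ppal.set_index("another_id")["id"]

# Look up every row at once; codes without a match would become NaN.
dfs["id_fk"] = dfs["another_id"].map(lookup)
print(dfs)
```

Series.map replaces the inner loop with an index lookup, so the cost no longer grows with the product of the two table sizes.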

I need a DataFrame like the table below, with a new column "ID Principal" obtained by matching Another_code against a column of another DataFrame.

ID   ID Principal   Another_code
1    12             54
2    12             54
3    13             55
4    14             56
5    14             56
6    14             56

Indeed, I did not understand all the pandas functions very well. I was able to solve my problem using merge; I did not know that pandas has a good implementation of the typical SQL join.
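To make the SQL analogy concrete, here is a small sketch of how the `how` parameter of merge corresponds to join types. The data and the id_principal column name are hypothetical; code 57 is deliberately left without a match.

```python
import pandas as pd

# Hypothetical data: code 57 has no match in principal.
principal = pd.DataFrame({"id_principal": [12, 13], "another_id": [54, 55]})
secondary = pd.DataFrame({"id": [1, 2, 3], "another_id": [54, 55, 57]})

# Inner join (the default): rows without a match are dropped.
inner = secondary.merge(principal, on="another_id", how="inner")

# Left join: every secondary row is kept; the unmatched key yields NaN.
# indicator=True adds a "_merge" column telling where each row came from.
left = secondary.merge(principal, on="another_id", how="left", indicator=True)

print(inner)
print(left)
```

If rows without a matching foreign key must be kept, for example to report them before loading into the database, a left join is probably the safer choice.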

This documentation helped me a lot:

  1. https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging

  2. Pandas Merging 101

Finally, my answer:

new_df = principal.merge(secondary, on='another_id')
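As a hedged illustration of how this could reproduce the expected table from the question: the frames below are reconstructed from that table, and the assumption that both frames carry an "id" column (as well as the id_principal name) is mine, not stated in the post.

```python
import pandas as pd

# Sample data reconstructed from the expected table in the question.
principal = pd.DataFrame({"id": [12, 13, 14], "another_id": [54, 55, 56]})
secondary = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                          "another_id": [54, 54, 55, 56, 56, 56]})

# Both frames have an "id" column, so rename the principal one before
# merging; otherwise pandas would name them "id_x" and "id_y".
new_df = secondary.merge(
    principal.rename(columns={"id": "id_principal"}),
    on="another_id",
    how="left",
)

# Reorder the columns to match the expected table.
new_df = new_df[["id", "id_principal", "another_id"]]
print(new_df)
```

Merging from the secondary frame with how="left" keeps one output row per secondary record, in the original order.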

Thank you all!


