Rewriting a SQL query with pandas DataFrames: how to mix different sources from SELECT

I need to rewrite a SQL query with several joins in Python, using pandas DataFrames. The joins themselves are no problem, but it gets complicated when I want to refer to data that does not come strictly from a join.

SQL to rewrite:

SELECT 
    table1.id,
    field1,
    field2,
    count(DISTINCT x.apple, x.grape) AS fruits,
    min(x.time) min_value
FROM table1
    JOIN table2 x using(id)
    LEFT JOIN table3 using(id)
GROUP BY 1,2,3

My current code:

mydf, df2, df3 = ...
mydf = mydf.merge(df2, on=['id'], how='inner')  # merge returns a new frame, so keep the result
mydf = mydf.merge(df3, on=['id'], how='left')
mydf = mydf[['id', 'field1', 'field2']] # problem, missing fruits, min_value and id should be table1.id
mydf.groupby(['id', 'field1', 'field2'])  # groupby takes the keys as a single list

I know how to join them, but I don't know how to then build a DataFrame that contains the required elements from the SELECT, e.g. table1.id or min(x.time) min_value.

Tables

Table1 columns: [id, field1, field2, field1_2, field1_3]
Table2 columns: [id, field1, field2, apple, grape, time, field2_1, field2_2]
Table3 columns: [id, field1, field2, field3_1, field3_2]

EDIT: I added sample tables, corrected the error in line 4 of my code, and added a summary.

Try this:

    mydf = mydf[['id', 'table1', 'table2']]
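
For the aggregate columns, a fuller sketch of how the whole SELECT might translate is given here. It rests on several assumptions that are not stated in the question: that field1 and field2 are meant to come from table1, that the query is MySQL (where count(DISTINCT a, b) counts distinct pairs and skips rows in which either value is NULL), and that the frames are named df1, df2, df3. The sample data is made up purely for illustration; only the column names come from the question. Selecting just the needed columns from each frame before merging avoids the _x/_y suffixes pandas adds when field1 and field2 exist in more than one table.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({
    "id": [1, 2, 3],
    "field1": ["a", "b", "c"],
    "field2": ["x", "y", "z"],
})
df2 = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "apple": [1, 1, 2, 5, 5, None],
    "grape": [1, 1, 3, None, 7, None],
    "time": [30, 10, 20, 50, 40, 60],
})
df3 = pd.DataFrame({"id": [1, 1, 3]})

# GROUP BY 1,2,3 -> table1.id, field1, field2 (assumed to be table1's columns).
keys = ["id", "field1", "field2"]

# FROM table1 JOIN table2 x USING(id) LEFT JOIN table3 USING(id).
# Only the columns the SELECT needs are taken from each frame.
joined = (
    df1[keys]
    .merge(df2[["id", "apple", "grape", "time"]], on="id", how="inner")
    .merge(df3[["id"]], on="id", how="left")
)

# count(DISTINCT x.apple, x.grape): drop rows where either value is missing,
# keep one row per distinct (apple, grape) pair in each group, then count.
fruits = (
    joined.dropna(subset=["apple", "grape"])
    .drop_duplicates(subset=keys + ["apple", "grape"])
    .groupby(keys)
    .size()
    .rename("fruits")
)

# min(x.time) AS min_value via named aggregation, then attach the pair counts.
# Groups with no complete pair get 0 instead of NaN.
result = (
    joined.groupby(keys)
    .agg(min_value=("time", "min"))
    .join(fruits)
    .fillna({"fruits": 0})
    .astype({"fruits": "int64"})
    .reset_index()
)
result = result[keys + ["fruits", "min_value"]]
print(result)
```

Note that the LEFT JOIN to table3 can only duplicate rows here, which affects neither the distinct count nor the minimum, so under these assumptions it could be dropped without changing the result.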
