
How do I replicate a pandas function in MySQL?

I'm new to SQL and trying to unlearn what I know in Python. I have a script that connects over ODBC to the SQL Server database I use in SSMS, so I can work with the data in Python:

import pyodbc
import pandas as pd
#odbc
# Raw strings keep the backslash in the server name literal ('\R' is not a valid escape)
conn = pyodbc.connect(r'Driver={SQL Server};'
                      r'Server=PMZZ315\RION;'
                      r'Database=Warehouse;'
                      r'Trusted_Connection=yes;')

cursor = conn.cursor()

df = pd.read_sql_query("SELECT [LetId],[StreetAddressLine1],[CompanyName] FROM Dim.Let", conn)
df

df.head()
#print(df.columns)


# Select duplicate rows, except the first occurrence, based on CompanyName and StreetAddressLine1
duplicateRowsDF = df[df.duplicated(['CompanyName','StreetAddressLine1'])]

#print("Duplicate Rows except first occurrence based on all columns are :")
print(duplicateRowsDF)
duplicateRowsDF.to_csv("duplicateRowsDFodbc.csv")

What function in SQL can substitute for df.duplicated? All I am trying to do is detect duplicate records, ignoring the first instance, when the company name and street address are repeated.

Reprex of output dataset:

LetId   StreetAddressLine1             CompanyName
32      1451 West Brimson View Court   Palmer
405     1808 North Lonion Ave          Ozark
465     4223 Monty Hwy                 Alabama

SQL tables represent unordered sets. Ordering is only provided by columns in the data. There is no "first" without an ordering. Let me assume that letid defines the ordering.

The canonical way in SQL uses row_number():

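select t.*
from (select t.*,
             row_number() over (partition by CompanyName, StreetAddressLine1 order by letid) as seqnum
      from t  -- t stands for your table (Dim.Let in the question)
     ) t
where seqnum > 1;  -- seqnum = 1 is the first occurrence; > 1 keeps only the later duplicates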
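
If you want to check the row_number() approach without touching the real database, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory table. The table name Let and the sample rows are made up for illustration, and it assumes the SQLite bundled with your Python is 3.25 or newer (window functions were added in that release). The query filters on seqnum > 1 so that the first row of each (CompanyName, StreetAddressLine1) group is skipped, which is what df.duplicated does by default.

```python
import sqlite3

# Hypothetical sample rows (LetId, StreetAddressLine1, CompanyName);
# these are NOT taken from the real Dim.Let table.
rows = [
    (1, "10 Main St", "Acme"),
    (2, "10 Main St", "Acme"),    # repeat of LetId 1
    (3, "22 Oak Ave", "Birch"),
    (4, "10 Main St", "Acme"),    # second repeat of LetId 1
    (5, "22 Oak Ave", "Cedar"),   # same address, different company: not a duplicate
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Let (LetId INTEGER, StreetAddressLine1 TEXT, CompanyName TEXT)"
)
conn.executemany("INSERT INTO Let VALUES (?, ?, ?)", rows)

# Number the rows inside each (CompanyName, StreetAddressLine1) group by LetId,
# then keep everything after the first row of each group.
query = """
SELECT LetId, StreetAddressLine1, CompanyName
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY CompanyName, StreetAddressLine1
                                ORDER BY LetId) AS seqnum
      FROM Let t) AS numbered
WHERE seqnum > 1
ORDER BY LetId
"""
duplicate_rows = conn.execute(query).fetchall()
print(duplicate_rows)
```

With these sample rows only LetId 2 and 4 come back. LetId 5 shares an address with LetId 3 but has a different company name, so it lands in its own partition and is not flagged.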


 