
Python Pandas non-equi join

I have a table:

import pandas as pd
import numpy as np

list_1=[['Steven',np.nan,'C1'],
        ['Michael',np.nan,'C2'],
        ['Robert',np.nan,'C3'],
        ['Buchanan',np.nan,'C1'],
        ['Suyama',np.nan,'C2'],
        ['King',np.nan,'C3']]
labels=['first_name','last_name','class']
df=pd.DataFrame(list_1,columns=labels)
df

Output:

    first_name  last_name   class
0   Steven       NaN         C1
1   Michael      NaN         C2
2   Robert       NaN         C3
3   Buchanan     NaN         C1
4   Suyama       NaN         C2
5   King         NaN         C3

Needed:

first_name  last_name
Steven       Buchanan
Michael      Suyama
Robert       King

So I need to do a non-equi join, equivalent to this SQL query:

;with cte as
(
SELECT first_name,
        class,
        ROW_NUMBER() OVER (partition by class ORDER BY first_name) as rn
FROM students
)
select c_fn.first_name,
        c_ln.first_name
from cte c_fn join cte c_ln on c_fn.class=c_ln.class and c_ln.rn< c_fn.rn
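A minimal pandas sketch of the same idea, under stated assumptions: groupby('class').cumcount() stands in for ROW_NUMBER() OVER (PARTITION BY class), the equality part of the ON clause (class = class) goes through an ordinary merge, and the inequality part is applied as a boolean filter afterwards. One deliberate difference from the SQL: rows are numbered in their existing order rather than ORDER BY first_name, and the filter keeps pairs where the partner comes later. Alphabetical numbering would put Michael before Suyama in C2 and return that pair reversed. The names cte, pairs and result are illustrative.

```python
import numpy as np
import pandas as pd

# Same input table as in the question
df = pd.DataFrame(
    [['Steven', np.nan, 'C1'], ['Michael', np.nan, 'C2'], ['Robert', np.nan, 'C3'],
     ['Buchanan', np.nan, 'C1'], ['Suyama', np.nan, 'C2'], ['King', np.nan, 'C3']],
    columns=['first_name', 'last_name', 'class'],
)

# ROW_NUMBER() OVER (PARTITION BY class), 0-based.
# Assumption: existing row order is used instead of ORDER BY first_name.
cte = df[['first_name', 'class']].assign(rn=df.groupby('class').cumcount())

# Equality part of the join condition: c_fn.class = c_ln.class
pairs = cte.merge(cte, on='class', suffixes=('_fn', '_ln'))

# Non-equality part, applied as a filter after the merge
result = (
    pairs.loc[pairs['rn_fn'] < pairs['rn_ln'], ['first_name_fn', 'first_name_ln']]
    .rename(columns={'first_name_fn': 'first_name', 'first_name_ln': 'last_name'})
    .reset_index(drop=True)
)
print(result)
```

The merge produces every same-class pair (including each row paired with itself), so this is only practical while the groups stay small; the filter then discards everything except the pair you want.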

Or as this SQL query:

;with cte as
(
SELECT first_name,
        last_name,
        ROW_NUMBER() OVER ( ORDER BY (select null)) as rn
FROM students
)
select fn.first_name,
        ln.first_name as last_name
from cte fn join cte ln on ln.rn=fn.rn+3
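The second query is really an equality join on a computed expression (ln.rn = fn.rn + 3), so a minimal pandas sketch can compute the shifted key on one side and use a plain merge. This assumes, as the SQL does, that the first three rows hold the first names and the next three the matching last names; the offset 3 is specific to this six-row table, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same input table as in the question
df = pd.DataFrame(
    [['Steven', np.nan, 'C1'], ['Michael', np.nan, 'C2'], ['Robert', np.nan, 'C3'],
     ['Buchanan', np.nan, 'C1'], ['Suyama', np.nan, 'C2'], ['King', np.nan, 'C3']],
    columns=['first_name', 'last_name', 'class'],
)

# ROW_NUMBER() OVER (ORDER BY (SELECT NULL)): here simply the current row position
cte = df[['first_name']].assign(rn=np.arange(len(df)))

# ON ln.rn = fn.rn + 3: build the shifted key on the left, then an ordinary merge
fn = cte.assign(key=cte['rn'] + 3)
pairs = fn.merge(cte, left_on='key', right_on='rn', suffixes=('_fn', '_ln'))

result = pairs[['first_name_fn', 'first_name_ln']].rename(
    columns={'first_name_fn': 'first_name', 'first_name_ln': 'last_name'}
)
print(result)
```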

The problem in pandas is that a non-equi self-join cannot be done with merge, and I can't find any other way...

We can solve this in a smarter way in pandas by using groupby with agg to join the strings, and then split them into columns:

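# Join each class's first names into one space-separated string,
# then str.split(expand=True) spreads the names into separate columns
dfn = df.groupby('class')['first_name'].agg(' '.join).str.split(' ', expand=True)
dfn.columns = df.columns[:2]
dfn = dfn.reset_index(drop=True)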

  first_name last_name
0     Steven  Buchanan
1    Michael    Suyama
2     Robert      King
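A variant of the same reshaping that avoids joining and re-splitting strings, so first names containing spaces would survive intact. It is a sketch assuming every class has exactly two rows, in the order first name then last name; a class with more rows would produce extra columns and break the two-column rename. groupby(...).cumcount() numbers the rows inside each class, and pivot turns those positions into columns.

```python
import numpy as np
import pandas as pd

# Same input table as in the question
df = pd.DataFrame(
    [['Steven', np.nan, 'C1'], ['Michael', np.nan, 'C2'], ['Robert', np.nan, 'C3'],
     ['Buchanan', np.nan, 'C1'], ['Suyama', np.nan, 'C2'], ['King', np.nan, 'C3']],
    columns=['first_name', 'last_name', 'class'],
)

# Position of each row within its class: 0 for the first name, 1 for the second
pos = df.groupby('class').cumcount()

# One row per class, one column per position
dfn = (
    df.assign(pos=pos)
    .pivot(index='class', columns='pos', values='first_name')
    .reset_index(drop=True)
)
dfn.columns = ['first_name', 'last_name']
print(dfn)
```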

You can set the index to 'class' and select the individual names:

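df = df.set_index('class')
# "C1" labels two rows, so .loc returns both first names in row order
first_name = df.loc["C1", "first_name"].values[0]  # 'Steven'
last_name = df.loc["C1", "first_name"].values[1]   # 'Buchanan' (the last_name column is all NaN)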
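The lookup above handles one class at a time. A minimal sketch that applies the same set_index/loc idea to every class and collects the pairs into a frame, assuming each class has exactly two rows (with a single row, .loc would return a scalar and tolist() would fail). The names indexed, rows and result are illustrative.

```python
import numpy as np
import pandas as pd

# Same input table as in the question
df = pd.DataFrame(
    [['Steven', np.nan, 'C1'], ['Michael', np.nan, 'C2'], ['Robert', np.nan, 'C3'],
     ['Buchanan', np.nan, 'C1'], ['Suyama', np.nan, 'C2'], ['King', np.nan, 'C3']],
    columns=['first_name', 'last_name', 'class'],
)

indexed = df.set_index('class')

# For each class label (in order of first appearance), .loc returns that
# class's first names in row order
rows = [indexed.loc[c, 'first_name'].tolist() for c in indexed.index.unique()]

result = pd.DataFrame(rows, columns=['first_name', 'last_name'])
print(result)
```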
