
How to join two dataframes with different key column names in pydatatable?

I have a dataframe X:

import datatable as dt
from datatable import f, join

DT_X = dt.Frame({
    'date': ['2020-09-01', '2020-09-02', '2020-09-03'],
    'temp': [35.3, 32.9, 43.2]
})
Out[4]: 
   | date        temp
-- + ----------  ----
 0 | 2020-09-01  35.3
 1 | 2020-09-02  32.9
 2 | 2020-09-03  43.2

[3 rows x 2 columns]

and another dataframe Y:

DT_Y = dt.Frame({
    'stop_date': ['2020-08-01', '2020-09-01', '2020-09-03', '2020-09-07'],
    'is_arrested': [True, False, False, True]
})
Out[6]: 
   | stop_date   is_arrested
-- + ----------  -----------
 0 | 2020-08-01            1
 1 | 2020-09-01            0
 2 | 2020-09-03            0
 3 | 2020-09-07            1

[4 rows x 2 columns]

Now I would like to perform a JOIN operation on X and Y. For that, I need to set a key on the X dataframe:

DT_X.key='date'
Out[8]: 
date       | temp
---------- + ----
2020-09-01 | 35.3
2020-09-02 | 32.9
2020-09-03 | 43.2

[3 rows x 2 columns]

Next, I join X and Y:

DT_Y[:,:,join(DT_X)]

This throws an error:

In [9]: DT_Y[:,:,join(DT_X)]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-a3bc1690fb98> in <module>
----> 1 DT_Y[:,:,join(DT_X)]

ValueError: Key column `date` does not exist in the left Frame

Of course, date does not exist in DT_Y; the corresponding column there is named stop_date.

How can I perform a join in this scenario, i.e. when the key column names do not match?

Note: a workaround is to rename the column in DT_Y:

DT_Y.names = {'stop_date':'date'}

DT_Y[:,:,join(DT_X)]

The joined frame then looks like this:

Out[11]: 
   | date        is_arrested  temp
-- + ----------  -----------  ----
 0 | 2020-08-01            1  NA  
 1 | 2020-09-01            0  35.3
 2 | 2020-09-03            0  43.2
 3 | 2020-09-07            1  NA  

[4 rows x 3 columns]

Here is the expected output:

Out[13]: 
   | stop_date   is_arrested  temp
-- + ----------  -----------  ----
 0 | 2020-08-01            1  NA  
 1 | 2020-09-01            0  35.3
 2 | 2020-09-03            0  43.2
 3 | 2020-09-07            1  NA  

[4 rows x 3 columns]
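
To make the expected semantics concrete, here is a minimal plain-Python sketch of a left join in which the two sides use different key names. It is not datatable's API: the helper left_join and its parameters left_key and right_key are hypothetical names introduced only for illustration. The keyed frame X acts as a lookup table, each row of Y is looked up by its stop_date, and rows with no match get a missing value.

```python
# Plain-Python illustration only; this does not use datatable.
def left_join(left, right, left_key, right_key):
    """Left-join two lists of row dicts whose key columns have different names."""
    # The right side must have unique keys, like a keyed frame.
    lookup = {row[right_key]: row for row in right}
    # Columns brought over from the right side (everything except its key).
    extra_cols = [col for col in right[0] if col != right_key]

    joined = []
    for row in left:
        match = lookup.get(row[left_key])
        merged = dict(row)  # keep the left side's own column names
        for col in extra_cols:
            merged[col] = match[col] if match is not None else None
        joined.append(merged)
    return joined


x_rows = [
    {"date": "2020-09-01", "temp": 35.3},
    {"date": "2020-09-02", "temp": 32.9},
    {"date": "2020-09-03", "temp": 43.2},
]
y_rows = [
    {"stop_date": "2020-08-01", "is_arrested": True},
    {"stop_date": "2020-09-01", "is_arrested": False},
    {"stop_date": "2020-09-03", "is_arrested": False},
    {"stop_date": "2020-09-07", "is_arrested": True},
]

result = left_join(y_rows, x_rows, left_key="stop_date", right_key="date")
for row in result:
    print(row)
```

The result keeps the stop_date column name from Y, which is what the expected output above asks for.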

Right now, join() only supports joining on columns that have the same name in both frames; please refer to the documentation for more details. However, there is an open issue to improve the join functionality/API.

Meanwhile, if you prefer not to rename the columns, you can do the following:

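# Select Y's columns under new names: f[0] is stop_date, f[1] is is_arrested
DT_Y_date = DT_Y[:, {"date": f[0], "is_arrested": f[1]}]
# Join onto the keyed frame DT_X through the now-shared column name "date"
DT_YX_joined = DT_Y_date[:, :, join(DT_X)]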

Then DT_YX_joined will have the data you are looking for:

   | date        is_arrested  temp
-- + ----------  -----------  ----
 0 | 2020-08-01            1  NA  
 1 | 2020-09-01            0  35.3
 2 | 2020-09-03            0  43.2
 3 | 2020-09-07            1  NA 

You can even do it as a one-liner:

DT_YX_joined = DT_Y[:, {"date":f[0], "is_arrested":f[1]}][:, :, join(DT_X)]

but it may not be readable enough. Also note that no data copies are created here; only the column name changes.
