
Join two Pandas DataFrames

We have two tables:

Table 1: EventLog

class EventLog(Base):
    """"""

    __tablename__   = 'event_logs'

    id = Column(Integer, primary_key=True, autoincrement=True)

    # Keys
    event_id        = Column(Integer)
    data            = Column(String)
    signature       = Column(String)

    # Unique constraint
    __table_args__  = (UniqueConstraint('event_id', 'signature'),)

Table 2: Machine_Event_Logs

class Machine_Event_Logs(Base):
    """"""

    __tablename__   = 'machine_event_logs'

    id = Column(Integer, primary_key=True, autoincrement=True)

    # Keys
    machine_id      = Column(String, ForeignKey("machines.id"))
    event_log_id    = Column(String, ForeignKey("event_logs.id"))
    event_record_id = Column(Integer)
    time_created    = Column(String)

    # Unique constraint
    __table_args__ = (UniqueConstraint('machine_id', 'event_log_id', 'event_record_id', 'time_created'),)

    # Relationships
    event_logs      = relationship("EventLog")

The relationship between EventLog and Machine_Event_Logs is one-to-many.

We register each unique event log once in the EventLog table, and then register millions of entries in Machine_Event_Logs, one for every time we encounter that event.

Goal: We're trying to join both tables to display the entire timeline of captured event logs.

We've tried multiple combinations of the merge() function on the pandas DataFrames, but it only returns a bunch of NaN values or an empty result. For example:

pd.merge(event_logs, machine_event_logs, how='left', left_on='id', right_on='event_log_id')

Any ideas on how to solve this?

Thanks in advance for your assistance.

According to your data schema, you have incompatible key types: id in event_logs is an Integer column, while event_log_id in machine_event_logs is a String column. In Python, comparing a string with its equivalent numeric value yields False:

print('0'==0)
# False
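The same mismatch can be checked on the DataFrames themselves by inspecting the dtypes of the two key columns. A minimal sketch, assuming hypothetical toy data in place of the real tables (only the column names come from the schema above):

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical toy frames that mirror only the key columns of the two tables.
event_logs = pd.DataFrame({"id": [1, 2]})
machine_event_logs = pd.DataFrame({"event_log_id": ["1", "1", "2"]})

# Show the dtype of each key column side by side.
print(event_logs["id"].dtype, machine_event_logs["event_log_id"].dtype)

# True when the column holds integers, False when it holds strings.
id_is_int = is_integer_dtype(event_logs["id"])
fk_is_int = is_integer_dtype(machine_event_logs["event_log_id"])
print(id_is_int, fk_is_int)
```

If the two checks disagree, the keys cannot match row by row until one side is converted.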

Therefore your pandas left-join merge returns all NaN on the right-hand side, since no matches are found. Consider converting one of the key columns so that the types align for proper merging:

event_logs['id'] = event_logs['id'].astype(str)

OR

machine_event_logs['event_log_id'] = machine_event_logs['event_log_id'].astype(int)
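Putting it together, the conversion and the merge might look like the sketch below. The data is hypothetical, and the merge starts from machine_event_logs (the "many" side) rather than from event_logs as in the question; that ordering is a choice of this sketch so that each captured entry appears exactly once in the timeline.

```python
import pandas as pd

# Hypothetical toy data; column names follow the schema in the question.
event_logs = pd.DataFrame({
    "id": [1, 2],
    "event_id": [4624, 4625],
    "signature": ["sig-a", "sig-b"],
})
machine_event_logs = pd.DataFrame({
    "machine_id": ["m1", "m2", "m1"],
    "event_log_id": ["1", "1", "2"],  # stored as strings, as in the schema
    "time_created": [
        "2020-01-01T00:00:02",
        "2020-01-01T00:00:01",
        "2020-01-01T00:00:03",
    ],
})

# Align the key types: convert the string foreign key to integers.
machine_event_logs["event_log_id"] = machine_event_logs["event_log_id"].astype(int)

# Attach the matching event log row to every machine entry,
# then order the result by creation time to get a timeline.
timeline = (
    machine_event_logs
    .merge(event_logs, how="left", left_on="event_log_id", right_on="id")
    .sort_values("time_created")
    .reset_index(drop=True)
)
print(timeline)
```

If event_log_id can contain values that are not valid integers, astype(int) will raise; pd.to_numeric(..., errors="coerce") is one alternative that turns such values into NaN instead.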
