How to insert relational (many-to-many) data in SQLAlchemy from API queries in Python

EDIT: I have posted a shorter version of this question because I think this one is too long, sorry.

First of all, I am a newcomer to databases, programming languages and so on, so sorry if this question is not very precise or specific. Any help or guidance would be much appreciated.

The context I am working with is the following: I am querying an existing database through its API in order to retrieve certain information with which to design my own database.

The purpose of this database is, for example, to let the user enter a gene and find out in which parts of the organism it is over-expressed (UP) or under-expressed (DOWN), and in which experiments this type of expression has been observed.

For the time being, all I'm doing is querying the existing database and parsing the JSON result to obtain, for each organism part, all the genes that are over- or under-expressed (and, for each gene, the experiments in which that type of expression has been reported):

(In brain)

GENE1

Experiment1     UP
Experiment2     UP
Experiment3     UP
Experiment4     DOWN

GENE2

Experiment5     DOWN
Experiment2     DOWN
Experiment3     DOWN
Experiment8     UP
Experiment9     DOWN
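The listing above is really a set of flat observations, each tying one organ, one gene and one experiment to an expression type. As a minimal sketch in plain Python (using only the sample values shown above, no database yet), storing them as tuples makes both the many-to-many structure and the lookups explicit:

```python
from collections import defaultdict

# One record per observation: (organ, gene, experiment, expression).
# Values are copied from the sample brain listing above.
records = [
    ("Brain", "GENE1", "Experiment1", "UP"),
    ("Brain", "GENE1", "Experiment2", "UP"),
    ("Brain", "GENE1", "Experiment3", "UP"),
    ("Brain", "GENE1", "Experiment4", "DOWN"),
    ("Brain", "GENE2", "Experiment5", "DOWN"),
    ("Brain", "GENE2", "Experiment2", "DOWN"),
    ("Brain", "GENE2", "Experiment3", "DOWN"),
    ("Brain", "GENE2", "Experiment8", "UP"),
    ("Brain", "GENE2", "Experiment9", "DOWN"),
]

# "In which experiments was this gene UP/DOWN?"
by_gene = defaultdict(list)
for organ, gene, experiment, expression in records:
    by_gene[(gene, expression)].append(experiment)

# The same experiment appears under several genes (many-to-many).
genes_per_experiment = defaultdict(set)
for organ, gene, experiment, expression in records:
    genes_per_experiment[experiment].add(gene)

print(by_gene[("GENE2", "UP")])                     # ['Experiment8']
print(sorted(genes_per_experiment["Experiment2"]))  # ['GENE1', 'GENE2']
```

Each tuple corresponds to one row of the "genes2experiments2organs" idea, with the expression type as a fourth column.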

The different tables I think I will need are: "genes", "organs", "experiments" and "type of expression" (plus "genes2experiments2organs").

Take into account that one gene can be expressed in more than one organism part and can have different types of expression related to more than one experiment, and one experiment can comprise more than one gene (many-to-many relationships).
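Because each fact involves three entities at once (gene, experiment, organ) plus the expression type, the link table needs one row per combination. A minimal sketch, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database; the table names mirror the ones listed above, while the expression column and the composite primary key are illustrative additions:

```python
from sqlalchemy import (Column, ForeignKey, MetaData, String, Table,
                        create_engine, func, select)
from sqlalchemy.exc import IntegrityError

metadata = MetaData()
genes = Table("genes", metadata, Column("id", String(45), primary_key=True))
experiments = Table("experiments", metadata, Column("id", String(45), primary_key=True))
organs = Table("organs", metadata, Column("id", String(45), primary_key=True))

# One row per (gene, experiment, organ) triple; the composite primary key
# stops the same triple from being stored twice (illustrative addition).
links = Table(
    "genes2experiments2organs", metadata,
    Column("gene_id", String(45), ForeignKey("genes.id"), primary_key=True),
    Column("experiment_id", String(45), ForeignKey("experiments.id"), primary_key=True),
    Column("organ_id", String(45), ForeignKey("organs.id"), primary_key=True),
    Column("expression", String(4), nullable=False),  # 'UP' or 'DOWN'
)

engine = create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:  # commits when the block exits
    conn.execute(genes.insert(), [{"id": "GENE1"}, {"id": "GENE2"}])
    conn.execute(experiments.insert(), [{"id": "Experiment2"}, {"id": "Experiment4"}])
    conn.execute(organs.insert(), [{"id": "Brain"}])
    conn.execute(links.insert(), [
        {"gene_id": "GENE1", "experiment_id": "Experiment2", "organ_id": "Brain", "expression": "UP"},
        {"gene_id": "GENE1", "experiment_id": "Experiment4", "organ_id": "Brain", "expression": "DOWN"},
        {"gene_id": "GENE2", "experiment_id": "Experiment2", "organ_id": "Brain", "expression": "DOWN"},
    ])

# Inserting an existing triple again is rejected by the database.
duplicate_rejected = False
try:
    with engine.begin() as conn:
        conn.execute(links.insert(), {"gene_id": "GENE1", "experiment_id": "Experiment2",
                                      "organ_id": "Brain", "expression": "UP"})
except IntegrityError:
    duplicate_rejected = True

with engine.connect() as conn:
    n_links = conn.execute(select(func.count()).select_from(links)).scalar()
print(n_links, duplicate_rejected)  # 3 True
```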

What I would like to know first is how to add relational data, and whether my attempt is heading in the right direction or whether I should change the schema/idea of the database...

My first attempt is this:

###########################################
# DATABASE DEFINITION
###########################################

from sqlalchemy import create_engine, Column, Integer, String, Date, ForeignKey, Table, Float
from sqlalchemy.orm import sessionmaker, relationship, backref
from sqlalchemy.ext.declarative import declarative_base
import requests

Base = declarative_base()

Genes2experiments2organs = Table('genes2experiments2organs', Base.metadata,
  Column('gene_id', String, ForeignKey('genes.id')),
  Column('experiment_id', String, ForeignKey('experiments.id')),
  Column('organ_id', String, ForeignKey('organs.id'))
)

class Genes(Base):
    __tablename__ = 'genes'
    id = Column(String(45), primary_key=True)
    # Many-to-many links through the association table (only defined on Genes so far)
    experiments = relationship("Experiments", secondary=Genes2experiments2organs, backref="genes")
    organs = relationship("Organs", secondary=Genes2experiments2organs, backref="genes")
    def __init__(self, id=""):
        self.id = id
    def __repr__(self):
        return "<genes(id:'%s')>" % (self.id)

class Experiments(Base):
    __tablename__ = 'experiments'
    id = Column(String(45), primary_key=True)
    def __init__(self, id=""):
        self.id = id
    def __repr__(self):
        return "<experiments(id:'%s')>" % (self.id)

class Organs(Base):
    __tablename__ = 'organs'
    id = Column(String(45), primary_key=True)
    def __init__(self, id=""):
        self.id= id
    def __repr__(self):
        return "<organs(id:'%s')>" % (self.id)

class Expression_type(Base):
    __tablename__ = 'expression_type'
    id = Column(String(45), primary_key=True)
    def __init__(self, id=""):
        self.id= id
    def __repr__(self):
        return "<expression_type(id:'%s')>" % (self.id)

#####################################################
# INSERTING DATA
#####################################################

def setUp():
    global Session
    engine=create_engine('mysql://root:password@localhost/db_name?charset=utf8', pool_recycle=3600,echo=False)
    Base.metadata.create_all(engine)  # create the tables if they do not exist yet
    Session=sessionmaker(bind=engine)

def add_data():   ## I am just adding genes without taking into account the other related data to these genes.....
    session=Session()
    for i in range(0,1000,200):
        request= requests.get('http://www.ebi.ac.uk/gxa/api/v1',params={"updownInOrganism_part":"brain","rows":200,"start":i})
        result = request.json()  # .json() is a method in current versions of requests
        for item in result['results']:
            gene_to_add = item['gene']['ensemblGeneId']
            session.add(Genes(gene_to_add))  # duplicate ids will make the commit fail
    session.commit()
    session.close()


setUp()
add_data()
session=Session()
genes=session.query(Genes).all()
print("List of genes introduced:")
for gene in genes:
    print(gene.id)
session.close()

So, with this code I only populate the "genes" table, without taking into account the relationships with the other data I will have to include in the database. What is the procedure for adding relational data? And is there a way to avoid inserting duplicate genes, for instance when populating the table through API queries?
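One common way to avoid duplicate genes is a "get or create" helper that looks the gene up before adding it. A minimal sketch, assuming SQLAlchemy 1.4+ and a simplified Genes model with a string primary key as in the question; get_or_create is a hypothetical helper (not a SQLAlchemy API) and the gene ids are made-up samples:

```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Genes(Base):
    __tablename__ = "genes"
    id = Column(String(45), primary_key=True)

def get_or_create(session, model, **keys):
    """Hypothetical helper: return the row matching `keys`, or add a new one.

    The query autoflushes pending objects first, so a gene added earlier
    in the same session is found instead of being added twice.
    """
    instance = session.query(model).filter_by(**keys).one_or_none()
    if instance is None:
        instance = model(**keys)
        session.add(instance)
    return instance

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Simulate API pages that return the same gene more than once (made-up ids).
api_gene_ids = ["ENSG01", "ENSG02", "ENSG01", "ENSG03", "ENSG02"]

with Session(engine) as session:
    for gene_id in api_gene_ids:
        get_or_create(session, Genes, id=gene_id)
    session.commit()
    stored = sorted(g.id for g in session.query(Genes))

print(stored)  # ['ENSG01', 'ENSG02', 'ENSG03']
```

When the lookup key is the primary key, session.merge() is an alternative: it loads an existing row with that key or treats the object as new.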

By the way, as you can see, I didn't set up all the many-to-many relationships (secondary), only the ones in the "genes" table, because I am not sure whether I am on the right track or completely wrong... thank you.
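For a link between only two tables, relationship(secondary=...) inserts the association rows for you when you append to the collection; you never write to the link table directly. A minimal sketch, assuming SQLAlchemy 1.4+ and a hypothetical two-column genes2organs table that is not part of the question's schema:

```python
from sqlalchemy import Column, ForeignKey, String, Table, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# Hypothetical two-column association table linking genes and organs only.
genes2organs = Table(
    "genes2organs", Base.metadata,
    Column("gene_id", String(45), ForeignKey("genes.id"), primary_key=True),
    Column("organ_id", String(45), ForeignKey("organs.id"), primary_key=True),
)

class Genes(Base):
    __tablename__ = "genes"
    id = Column(String(45), primary_key=True)
    # Appending to this collection inserts rows into genes2organs on flush.
    organs = relationship("Organs", secondary=genes2organs, backref="genes")

class Organs(Base):
    __tablename__ = "organs"
    id = Column(String(45), primary_key=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    brain, heart = Organs(id="Brain"), Organs(id="Heart")
    gene1 = Genes(id="GENE1")
    gene1.organs.extend([brain, heart])   # no manual INSERT into genes2organs
    gene2 = Genes(id="GENE2", organs=[brain])
    session.add_all([gene1, gene2])       # organs are cascaded in via the relationship
    session.commit()

    n_links = session.execute(select(func.count()).select_from(genes2organs)).scalar()
    brain_genes = sorted(g.id for g in brain.genes)

print(n_links, brain_genes)  # 3 ['GENE1', 'GENE2']
```

This only works cleanly for two-way links; a row that must tie a gene, an experiment and an organ together (plus an expression type) needs its own mapped class.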

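This should do what you are looking for. Each (experiment, gene, organ) observation becomes one row of a Measurement table, which also stores the expression type: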

from sqlalchemy import (Column, create_engine, Integer, ForeignKey, Unicode,
                        Enum)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship

Base = declarative_base()

class Gene(Base):
    __tablename__ = 'gene'

    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)

    def __init__(self, name):
        self.name = name

class Experiment(Base):
    __tablename__ = 'experiment'

    id = Column(Integer, primary_key=True)

class Organ(Base):
    __tablename__ = 'organ'

    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)

    def __init__(self, name):
        self.name = name

class Measurement(Base):
    __tablename__ = 'measurement'

    id = Column(Integer, primary_key=True)
    experiment_id = Column(Integer, ForeignKey(Experiment.id))
    gene_id = Column(Integer, ForeignKey(Gene.id))
    organ_id = Column(Integer, ForeignKey(Organ.id))

    # Add your measured values here
    expression = Column(Enum('UP', 'DOWN'))
    # ...

    experiment = relationship(Experiment, backref='measurements')
    gene = relationship(Gene, backref='measurements')
    organ = relationship(Organ, backref='measurements')

    def __repr__(self):
        return 'Experiment %d: %s, %s, %s' % (self.experiment.id,
                         self.gene.name, self.organ.name, self.expression)

if __name__ == '__main__':
    engine = create_engine('sqlite://')
    session = sessionmaker(engine)()
    Base.metadata.create_all(engine)

    #
    # Creating the data
    #

    x = Gene('Gene X')
    y = Gene('Gene Y')
    z = Gene('Gene Z')

    heart = Organ('Heart')
    lungs = Organ('Lungs')
    brain = Organ('Brain')

    session.add_all([x, y, z, heart, lungs, brain])
    session.commit()

    experiment_1 = Experiment()
    experiment_1.measurements.extend(
            [Measurement(gene_id=x.id, organ_id=heart.id, expression='UP'),
             Measurement(gene_id=x.id, organ_id=lungs.id, expression='UP'),
             Measurement(gene_id=x.id, organ_id=brain.id, expression='DOWN'),
             Measurement(gene_id=y.id, organ_id=brain.id, expression='UP'),
             Measurement(gene_id=z.id, organ_id=brain.id, expression='DOWN')])

    experiment_2 = Experiment()
    experiment_2.measurements.extend(
            [Measurement(gene_id=y.id, organ_id=lungs.id, expression='UP'),
             Measurement(gene_id=y.id, organ_id=lungs.id, expression='UP'),
             Measurement(gene_id=y.id, organ_id=brain.id, expression='UP'),
             Measurement(gene_id=x.id, organ_id=brain.id, expression='UP'),
             Measurement(gene_id=z.id, organ_id=heart.id, expression='UP')])

    session.add_all([experiment_1, experiment_2])
    session.commit()

    #
    # Querying the data
    #

    print('All measurements in the first experiment')
    experiment = session.query(Experiment).filter(Experiment.id == 1).one()
    for measurement in experiment.measurements:
        print(measurement)
    print('')

    print('All measurements of Gene X')
    gene_x = session.query(Gene).filter(Gene.name == 'Gene X').one()
    for measurement in gene_x.measurements:
        print(measurement)
    print('')

    print('All measurements of the brain')
    the_brain = session.query(Organ).filter(Organ.name == 'Brain').one()
    for measurement in the_brain.measurements:
        print(measurement)
    print('')
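To fill this schema from parsed API results, each (gene, organ, experiment, expression) record can become one Measurement that reuses existing Gene, Organ and Experiment rows. A minimal sketch, assuming SQLAlchemy 1.4+, a trimmed copy of the models above with an added name column on Experiment (an assumption, so experiments can be looked up by the identifier the API returns), and made-up sample records; get_or_create and load_records are hypothetical helpers:

```python
from sqlalchemy import Column, Enum, ForeignKey, Integer, Unicode, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Gene(Base):
    __tablename__ = "gene"
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)

class Organ(Base):
    __tablename__ = "organ"
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)

class Experiment(Base):
    __tablename__ = "experiment"
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)  # assumption: not in the model above

class Measurement(Base):
    __tablename__ = "measurement"
    id = Column(Integer, primary_key=True)
    experiment_id = Column(Integer, ForeignKey(Experiment.id))
    gene_id = Column(Integer, ForeignKey(Gene.id))
    organ_id = Column(Integer, ForeignKey(Organ.id))
    expression = Column(Enum("UP", "DOWN"))
    experiment = relationship(Experiment, backref="measurements")
    gene = relationship(Gene, backref="measurements")
    organ = relationship(Organ, backref="measurements")

def get_or_create(session, model, name):
    """Hypothetical helper: return the row with this name, creating it if needed."""
    obj = session.query(model).filter_by(name=name).one_or_none()
    if obj is None:
        obj = model(name=name)
        session.add(obj)
    return obj

def load_records(session, records):
    """Hypothetical loader: one Measurement per (gene, organ, experiment, expression)."""
    for gene, organ, experiment, expression in records:
        session.add(Measurement(gene=get_or_create(session, Gene, gene),
                                organ=get_or_create(session, Organ, organ),
                                experiment=get_or_create(session, Experiment, experiment),
                                expression=expression))
    session.commit()

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

records = [("GENE1", "Brain", "Experiment1", "UP"),
           ("GENE1", "Brain", "Experiment4", "DOWN"),
           ("GENE2", "Brain", "Experiment2", "DOWN")]

with Session(engine) as session:
    load_records(session, records)
    # Where, and in which experiments, is GENE1 over-expressed?
    up_in = sorted((m.organ.name, m.experiment.name)
                   for m in session.query(Measurement).join(Gene)
                   .filter(Gene.name == "GENE1", Measurement.expression == "UP"))
    n_genes = session.query(Gene).count()

print(up_in, n_genes)  # [('Brain', 'Experiment1')] 2
```

The final query is the kind of lookup the question describes: given a gene, list the organs and experiments where it is UP.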
