

What are the possible ways for JSON data processing using SQL, Elasticsearch, or preprocessing using Python?

I have a case study where I need to take data from a REST API, do some analysis on it using aggregate functions, joins, etc., and use the response data (in JSON format) to plot some retail graphs.

Approaches followed so far:

  1. Read the data from the JSON, store it in Python variables, and run an INSERT for each record. This is obviously a costly operation because every JSON line read results in its own database insert; for 33k rows it takes more than 20 minutes, which is inefficient.

  2. This could be handled in Elasticsearch for faster processing, but complex operations such as joins do not exist in Elasticsearch, so any joining would have to happen before indexing (see the sketch after this list).
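For illustration, here is a minimal sketch of that pre-joining idea, assuming pandas is available; the transactions and stores tables and the Region column are invented for the example and do not come from the API:

import pandas as pd

# Hypothetical inputs: a table of transactions and a small lookup table of stores.
transactions = pd.DataFrame([
    {"AccountNo": "41469300", "ZIP": "60098", "Totalsales": 1.59},
    {"AccountNo": "41469378", "ZIP": "60098", "Totalsales": 2.09},
])
stores = pd.DataFrame([
    {"ZIP": "60098", "Region": "Midwest"},   # Region is an invented column for the example
])

# Do the join in pandas before the data reaches Elasticsearch, so every
# document you index is already denormalised and no join is needed there.
joined = transactions.merge(stores, on="ZIP", how="left")
print(joined.to_dict(orient="records"))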

If anybody can suggest the best approach to follow for handling such scenarios (for example, preprocessing or post-processing in Python), it would be helpful.

Thanks in advance

SQL script

import json
import MySQLdb

# HOST, USER, PASSWD and DATABASE are assumed to be defined elsewhere in the script.

def store_data(AccountNo):
    # Open a connection and insert a single account number.
    db = MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset="utf8")
    cursor = db.cursor()
    insert_query = "INSERT INTO cstore (AccountNo) VALUES (%s)"
    cursor.execute(insert_query, (AccountNo,))
    db.commit()
    cursor.close()
    db.close()
    return

def on_data(file_path):
    # Load the JSON file and store every account number, one INSERT per record.
    try:
        with open(file_path) as testFile:
            datajson = json.load(testFile)

        # Grab the wanted field from each record.
        for i in range(len(datajson)):
            for cosponsor in datajson[i]:
                AccountNo = cosponsor['AccountNo']
                store_data(AccountNo)
    except Exception as e:
        print(e)
Edit 1: JSON added

{
    "StartDate": "1/1/18",
    "EndDate": "3/30/18",
    "Transactions": [
        {
            "CSPAccountNo": "41469300",
            "ZIP": "60098",
            "ReportDate": "2018-03-08T00:00:00",
            "POSCode": "00980030003",
            "POSCodeModifier": "0",
            "Description": "TIC TAC GUM WATERMEL",
            "ActualSalesPrice": 1.59,
            "TotalCount": 1,
            "Totalsales": 1.59,
            "DiscountAmount": 0,
            "DiscountCount": 0,
            "PromotionAmount": 0,
            "PromotionCount": 0,
            "RefundAmount": 0,
            "RefundCount": 0
        },
        {
            "CSPAccountNo": "41469378",
            "ZIP": "60098",
            "ReportDate": "2018-03-08T00:00:00",
            "POSCode": "01070080727",
            "POSCodeModifier": "0",
            "Description": "PAYDAY KS",
            "ActualSalesPrice": 2.09,
            "TotalCount": 1,
            "Totalsales": 2.09,
            "DiscountAmount": 0,
            "DiscountCount": 0,
            "PromotionAmount": 0,
            "PromotionCount": 0,
            "RefundAmount": 0,
            "RefundCount": 0

}
]
}
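As a rough sketch of the preprocessing route asked about above (assuming the sample is saved as transactions.json and a pandas version that provides json_normalize), the nested Transactions array can be flattened and aggregated in Python before any bulk insert:

import json
import pandas as pd

with open("transactions.json") as f:   # the sample JSON shown above
    payload = json.load(f)

# Flatten the nested Transactions array into one flat table.
df = pd.json_normalize(payload, record_path="Transactions",
                       meta=["StartDate", "EndDate"])

# Example aggregation: total sales and item count per account.
summary = df.groupby("CSPAccountNo").agg(
    total_sales=("Totalsales", "sum"),
    items=("TotalCount", "sum"),
).reset_index()
print(summary)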

I do not have your JSON file, so I do not know whether this is runnable, but I would try something like the code below: read just the account info into a list and then write it to the database in one go with executemany. I expect it to take far less than 20 minutes.

# Imports and the HOST, USER, PASSWD, DATABASE settings are the same as in the question.

def store_data(rows):
    # rows is a list of dicts; executemany writes the whole batch in one call.
    db = MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset="utf8")
    cursor = db.cursor()
    insert_query = ("INSERT INTO cstore (AccountNo, ZIP, ReportDate) "
                    "VALUES (%(AccountNo)s, %(ZIP)s, %(ReportDate)s)")
    cursor.executemany(insert_query, rows)
    db.commit()
    cursor.close()
    db.close()
    return

def on_data(file_path):
    # Collect every row first, then hand the whole list to store_data at the end.
    accountno_list = list()
    try:
        with open(file_path) as testFile:
            datajson = json.load(testFile)

        # Grab the wanted fields from every transaction.
        for row in datajson['Transactions']:
            values = dict()
            values['AccountNo'] = row['CSPAccountNo']
            values['ZIP'] = row['ZIP']
            values['ReportDate'] = row['ReportDate']
            # From here on you can populate the attributes you need in a similar way.
            accountno_list.append(values)
    except Exception as e:
        print(e)
    store_data(accountno_list)
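Assuming the sample JSON from the question is saved next to the script as transactions.json, a quick test would be:

on_data("transactions.json")

executemany sends the whole batch to MySQL instead of issuing one INSERT per row, which is why it should be much faster than the 20-minute run.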
