
What are the possible ways to process JSON data using SQL, Elasticsearch, or preprocessing in Python?

I have a case study where I need to take data from a REST API, do some analysis on it using aggregate functions, joins, etc., and use the JSON response data to plot some retail graphs.

Approaches followed so far:

  1. Read the data from the JSON, store it in Python variables, and insert it into SQL row by row. This is obviously a costly operation because every JSON record read triggers its own database insert; for 33k rows it takes more than 20 minutes, which is inefficient.

  2. This could be handled in Elasticsearch for faster processing, but complex operations like joins are not available in Elasticsearch.

If anybody can suggest the best approach (such as preprocessing or post-processing in Python, for example along the lines of the sketch below) for handling such scenarios, it would be helpful.
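One preprocessing option, shown here as a minimal sketch that assumes pandas is available, the whole API response fits in memory, and the ZIP grouping and aggregates are only illustrative, is to flatten the Transactions array into a DataFrame and run the aggregations in memory instead of inserting row by row:

import json
import pandas as pd

def aggregate_transactions(file_path):
    # Flatten the "Transactions" array of the API response into a DataFrame
    with open(file_path) as f:
        payload = json.load(f)
    df = pd.DataFrame(payload["Transactions"])

    # Illustrative aggregation: total sales and item count per ZIP code
    summary = (df.groupby("ZIP")
                 .agg(total_sales=("Totalsales", "sum"),
                      total_count=("TotalCount", "sum"))
                 .reset_index())
    return summary

Joins against other datasets can be done the same way with pd.merge(), and the result can still be written to MySQL in a single batch if it needs to persist.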

Thanks in advance

SQL script

import json
import MySQLdb

def store_data(AccountNo):
    db = MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset="utf8")
    cursor = db.cursor()
    insert_query = "INSERT INTO cstore (AccountNo) VALUES (%s)"
    cursor.execute(insert_query, (AccountNo,))
    db.commit()
    cursor.close()
    db.close()

def on_data(file_path):
    # Read the JSON file and insert one row per record into MySQL
    try:
        with open(file_path) as testFile:
            datajson = json.load(testFile)

        # grab the wanted data from each record
        for record in datajson:
            for cosponsor in record:
                AccountNo = cosponsor['AccountNo']
                store_data(AccountNo)
    except Exception as e:
        print("Failed to process %s: %s" % (file_path, e))

Edit 1: JSON added

{
    "StartDate": "1/1/18",
    "EndDate": "3/30/18",
    "Transactions": [
        {
            "CSPAccountNo": "41469300",
            "ZIP": "60098",
            "ReportDate": "2018-03-08T00:00:00",
            "POSCode": "00980030003",
            "POSCodeModifier": "0",
            "Description": "TIC TAC GUM WATERMEL",
            "ActualSalesPrice": 1.59,
            "TotalCount": 1,
            "Totalsales": 1.59,
            "DiscountAmount": 0,
            "DiscountCount": 0,
            "PromotionAmount": 0,
            "PromotionCount": 0,
            "RefundAmount": 0,
            "RefundCount": 0
        },
        {
            "CSPAccountNo": "41469378",
            "ZIP": "60098",
            "ReportDate": "2018-03-08T00:00:00",
            "POSCode": "01070080727",
            "POSCodeModifier": "0",
            "Description": "PAYDAY KS",
            "ActualSalesPrice": 2.09,
            "TotalCount": 1,
            "Totalsales": 2.09,
            "DiscountAmount": 0,
            "DiscountCount": 0,
            "PromotionAmount": 0,
            "PromotionCount": 0,
            "RefundAmount": 0,
            "RefundCount": 0

}
]
}

I do not have your JSON file, so I don't know if this is runnable, but I would try something like the code below: read just the account info into a list and then write it to the database in one go with executemany. I expect that to take far less than 20 minutes.

import json
import MySQLdb

def store_data(rows):
    db = MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset="utf8")
    cursor = db.cursor()
    # executemany sends all rows in one batch, followed by a single commit
    insert_query = ("INSERT INTO cstore (AccountNo, ZIP, ReportDate) "
                    "VALUES (%(AccountNo)s, %(ZIP)s, %(ReportDate)s)")
    cursor.executemany(insert_query, rows)
    db.commit()
    cursor.close()
    db.close()

def on_data(file_path):
    # Parse the JSON file and collect every transaction before touching the database
    try:
        # declare an empty list for all the rows to insert
        accountno_list = list()

        with open(file_path) as testFile:
            datajson = json.load(testFile)

        # grab the wanted fields from each transaction
        for row in datajson['Transactions']:
            values = dict()
            values['AccountNo'] = row['CSPAccountNo']
            values['ZIP'] = row['ZIP']
            values['ReportDate'] = row['ReportDate']
            # from here on you can populate the attributes you need in a similar way
            accountno_list.append(values)
    except Exception as e:
        print("Failed to parse %s: %s" % (file_path, e))
        return

    store_data(accountno_list)
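Once the rows are loaded in one batch, the aggregate functions and joins from the question can be pushed down to MySQL instead of being done in Python. A minimal sketch, assuming the cstore table only holds the three inserted columns and that grouping by ZIP is just an example:

def transactions_per_zip():
    # Aggregate in the database rather than in Python
    db = MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset="utf8")
    cursor = db.cursor()
    cursor.execute(
        "SELECT ZIP, COUNT(*) AS transactions "
        "FROM cstore "
        "GROUP BY ZIP"
    )
    rows = cursor.fetchall()
    cursor.close()
    db.close()
    return rows

Joins against other tables work the same way; the key point is that the data only crosses the network once on insert and once when the aggregated result is fetched.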
