Imagine I have a list containing many dictionaries.
Sample dict:
{'city': None, 'bot-origin': None, 'campaign-source': 'attendance bot', 'lastState': 'productAvailabilityCpfValidationTrue', 'main-installation-date': None, 'userid': '00377a70-fc79-424e-80c3-1f0324094378@tunnel.msging.net', 'full-name': None, 'alternative-installation-date': None, 'chosen-product': 'Internet', 'bank': None, 'postalcode': '82100690', 'due-date': None, 'cpf': '07670115971', 'origin-link': '', 'payment': None, 'state': None, 'api-orders-hash-id': None, 'email': None, 'api-orders-error': None, 'plan-name': None, 'userphone': '41 9893-6613', 'plan-offer': None, 'completed-address': None, 'type-of-person': 'CPF', 'onboarding-simplified': None, 'type-of-product': 'Residencial', 'main-installation-period-day': None, 'plan-value': None, 'alternative-installation-period-day': None, 'data-change': 'false'}
The list contains around 9,000,000 events like the one shown above.
What I want to do is basically break them apart into some kind of DataFrame format, such as pd.DataFrame() (I don't insist on it). Unfortunately, the commands I tried, such as pd.json_normalize(), read_json, from_records and so on, seem to consume all my memory. My approach is to work in chunks: split the list/Series into chunks, load each chunk into a variable, convert it to a DataFrame, save it, free the memory, and concatenate everything afterwards. That way my PC doesn't crash while trying to load everything at once.
Here is my attempt:
def forma_extras(extras):
    # extras = serialized JSON, as a Series object
    for i in range(0, extras.size, 100):
        # Having a little trouble here
        pass
My solution was something like this. At least my computer doesn't crash when I run it. Is this the most efficient way? I am not sure; maybe saving each chunk and releasing it from memory would be better. That is what I will be doing in the next steps.
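import pandas as pd

def forma_extras(extras):
    # extras: list of dicts, one per event
    chunk = 100000  # number of records per chunk
    l_extra = []
    for i in range(0, len(extras), chunk):
        chunks = i + chunk
        # Build one DataFrame per slice of the list
        l_extra.append(pd.DataFrame.from_records(extras[i:chunks]))
    # Return the per-chunk DataFrames so they can be saved or concatenated later
    return l_extra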
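The idea of saving each chunk and releasing it from memory could be sketched roughly as below. This is only a minimal sketch that swaps in the standard-library csv module instead of building DataFrames, so nothing large stays in memory besides the original list. The helper names iter_chunks and write_records_in_chunks, the file name extras.csv, and the three-row sample are all made up for illustration, and the code assumes every dict has the same keys as the first one.

```python
import csv
import os
import tempfile


def iter_chunks(records, chunk_size):
    """Yield successive slices of `records`, each at most `chunk_size` long."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def write_records_in_chunks(records, path, chunk_size=100_000):
    """Write a list of dicts to a CSV file chunk by chunk; return rows written."""
    if not records:
        return 0
    # Assumption: every dict has the same keys as the first one.
    # A dict with an unexpected extra key would make DictWriter raise ValueError.
    fieldnames = list(records[0].keys())
    rows_written = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for chunk in iter_chunks(records, chunk_size):
            # None values are written as empty fields
            writer.writerows(chunk)
            rows_written += len(chunk)
    return rows_written


# Tiny made-up sample standing in for the real list of events
sample = [
    {"userid": "a", "cpf": "07670115971", "city": None},
    {"userid": "b", "cpf": "00000000000", "city": None},
    {"userid": "c", "cpf": "11111111111", "city": None},
]
out_path = os.path.join(tempfile.mkdtemp(), "extras.csv")
total = write_records_in_chunks(sample, out_path, chunk_size=2)
```

The resulting file can then be read back with pd.read_csv, whose chunksize parameter returns an iterator of DataFrames instead of one large frame. Passing dtype=str there is probably worth it for columns like cpf and postalcode, since otherwise leading zeros may be lost when the values are parsed as numbers.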