Split data of a pandas DataFrame in Python

Question

I have this DataFrame:

import pandas as pd
df = [{"username": "last",
 "time_data": "{\"hours\":[{\"hour\":\"00:00\",\"postCount\":\"5\",\"topicCount\":\"3\",\"totalCount\":80},{\"hour\":\"01:00\",\"postCount\":\"11\",\"topicCount\":\"6\",\"totalCount\":31}]}"
},
{"username": "truk",
 "time_data": "{\"hours\":[{\"hour\":\"00:00\",\"postCount\":\"11\",\"topicCount\":\"6\",\"totalCount\":362},{\"hour\":\"01:00\",\"postCount\":\"22\",\"topicCount\":\"8\",\"totalCount\":355}]}"
}]
df = pd.DataFrame(df)
df

I have used this code to get the "postCount" of both '00:00' and '01:00':

df_h0 = df.copy()
df_h0['hour']='00:00'
df_h0['totalCount']=df.time_data.str.split('"00:00","postCount":"').str[1].str.split('","topic').str[0]
df_h0 = df_h0.fillna(0)

df_h1 = df.copy()
df_h1['hour']='01:00'
df_h1['totalCount']=df.time_data.str.split('"01:00","postCount":"').str[1].str.split('","topic').str[0]
df_h1 = df_h1.fillna(0)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames the same way
df_tot = pd.concat([df_h0, df_h1])
df_tot.head()

But now I want to get the "totalCount", which does not come directly after the hour in the string. Does anyone know how to do that?

Expected output:

                           time_data                 username   hour    totalCount
0   {"hours":[{"hour":"00:00","postCount":"5","top...   last    00:00   80
1   {"hours":[{"hour":"00:00","postCount":"11","to...   truk    00:00   362
0   {"hours":[{"hour":"00:00","postCount":"5","top...   last    01:00   31
1   {"hours":[{"hour":"00:00","postCount":"11","to...   truk    01:00   355

Answer 1

It looks like you're trying to parse JSON by hand. Have you looked into using Python's json library instead?

Pandas also has functionality to create DataFrames from JSON files.

Both of these may make it easier for you to parse your data.
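As a rough illustration of the json approach, here is a minimal sketch. It assumes every time_data string is complete, valid JSON (closed with "]}") and that the count is stored under the key "totalCount". The names records and flatten_hours are made up for this example and do not come from the question.

```python
import json

# Sample records shaped like the question's data. Assumption: the JSON
# strings are complete and every hour entry uses the key "totalCount".
records = [
    {"username": "last",
     "time_data": '{"hours":[{"hour":"00:00","postCount":"5","topicCount":"3","totalCount":80},'
                  '{"hour":"01:00","postCount":"11","topicCount":"6","totalCount":31}]}'},
    {"username": "truk",
     "time_data": '{"hours":[{"hour":"00:00","postCount":"11","topicCount":"6","totalCount":362},'
                  '{"hour":"01:00","postCount":"22","topicCount":"8","totalCount":355}]}'},
]


def flatten_hours(records):
    """Return one flat dict per (username, hour) pair (hypothetical helper)."""
    rows = []
    for record in records:
        # Parse the JSON string instead of splitting it by hand.
        parsed = json.loads(record["time_data"])
        for entry in parsed.get("hours", []):
            rows.append({
                "username": record["username"],
                "hour": entry["hour"],
                # .get() yields None when a key is absent instead of raising.
                "totalCount": entry.get("totalCount"),
            })
    return rows


rows = flatten_hours(records)
for row in rows:
    print(row)
```

Passing rows to pd.DataFrame(rows) then gives one row per user and hour. The rows come out grouped by user rather than by hour, so sort on the hour column if you need the order from the expected output.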
