Pivoting a pandas dataframe with json data

Question

I have a pandas dataframe with some rows containing json data

Date	message
2020-01-01	{ 'var1': 'foo'}
2020-01-01	different content
2020-01-01	{ 'var2': 'foo'}
2020-01-02	non json content
2020-01-02	{ 'var2': 'bar'}

I would like to pivot the table so that I have a time series of the different variables, like

Date	var1	var2
2020-01-01	foo	foo
2020-01-02	nan/empty	bar

Answer 1

Convert values to Series with convert jsons to dicts, last get first non missing value per datetimes by GroupBy.first :

import ast


def f(x):
    try:
        return pd.Series(ast.literal_eval(x))
    except SyntaxError:
        return pd.Series()

df = df.set_index('Date')['message'].apply(f).groupby(level=0).first()
print (df)
            var1 var2
Date                 
2020-01-01   foo  foo
2020-01-02  None  bar

Answer 2

Not sure what your data exactly looks like, but you could do the following

Use pd.json_normalize() to parse some of the nested json content
Then use groupby() to get for example the first result

Here is an example:

import pandas as pd

#Input data
data = {
        "Date": ["2020-01-01","2020-01-01","2020-01-01","2020-01-02","2020-01-02"],
        "message": [{'var1': 'foo'}, {'stupidkey':'stupidval'}, {'var2':'foo'}, "bla", {'var2':'bar'}]
        }
df = pd.DataFrame(data)

# Generate result
varsToKeep = ["var1", "var2"]
df_vars = pd.json_normalize(df["message"])[varsToKeep]
df = df.join(df_vars)
df_result = df.groupby("Date").first()[varsToKeep]

Pivoting a pandas dataframe with json data

Question

2 answers

solution1
0 2023-01-02 12:36:55

solution2
0 2023-01-02 13:30:31

Pivoting a pandas dataframe with json data

Question

2 answers

solution1 0 2023-01-02 12:36:55

solution2 0 2023-01-02 13:30:31

solution1
0 2023-01-02 12:36:55

solution2
0 2023-01-02 13:30:31