Python 3, pandas: how to flatten a CSV row with key/value pairs nested in some of its fields into one wide row

Question

I'm trying to manipulate some CSV data. For complex changes I would normally use pandas, but I have no idea how to handle key/value pairs nested inside one or more CSV fields.

In essence, I have data like this:

+------+------+-------------------+------+------+
| col1 | col2 | col3              | col4 | col5 |
+------+------+-------------------+------+------+
| v    | v    | ncol1=nv,ncol2=nv | v    | v    |
+------+------+-------------------+------+------+
| v    | v    | ncol3=nv          | v    | v    |
+------+------+-------------------+------+------+
| v    | v    |                   | v    | v    |
+------+------+-------------------+------+------+

And I'm trying to get something like this:

+------+------+-------+-------+-------+------+------+
| col1 | col2 | ncol1 | ncol2 | ncol3 | col4 | col5 |
+------+------+-------+-------+-------+------+------+
| v    | v    | nv    | nv    |       | v    | v    |
+------+------+-------+-------+-------+------+------+
| v    | v    |       |       | nv    | v    | v    |
+------+------+-------+-------+-------+------+------+
| v    | v    |       |       |       | v    | v    |
+------+------+-------+-------+-------+------+------+
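For reference, in the raw CSV file a col3 value that itself contains commas has to be quoted; otherwise it would be split into extra columns. A minimal sketch, assuming the file looks like the hypothetical sample below (the in-memory `io.StringIO` text stands in for the real file): `pd.read_csv` keeps the quoted field as one string and reads the empty field as NaN.

```python
import io

import pandas as pd

# Hypothetical raw CSV matching the input table; the col3 value that contains
# commas is quoted so it stays in a single field.
raw = io.StringIO(
    "col1,col2,col3,col4,col5\n"
    'v,v,"ncol1=nv,ncol2=nv",v,v\n'
    "v,v,ncol3=nv,v,v\n"
    "v,v,,v,v\n"
)

df = pd.read_csv(raw)
print(df)
# The empty col3 field in the last row is read as NaN
print(df["col3"].isna().tolist())
```

Whatever flattening approach is used afterwards therefore has to cope with two kinds of col3 cells: `key=value` strings and NaN.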

Answer 1

Assuming that the values in column C (col3 in the question) are comma-separated strings of key=value pairs, the code does the following:

1. Creates a dictionary from each comma-separated string.
2. Separates the rows where column C is null/empty from the rest, so that the dictionaries in the remaining rows can be expanded.
3. Dynamically creates new columns named after the dictionary keys.
4. Expands the dictionaries into those columns.
5. Concatenates the null-valued DataFrame with the newly expanded DataFrame.
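The first step (turning each string into a dictionary) can be tried on its own. A minimal sketch, where `parse_pairs` is a hypothetical helper name: it splits on commas, then on the first `=` of each pair, and returns NaN for missing or blank cells so that those rows can be filtered out in the next step.

```python
import numpy as np
import pandas as pd


def parse_pairs(cell):
    """Turn a 'k1=v1,k2=v2' string into {'k1': 'v1', 'k2': 'v2'}.

    Non-strings (e.g. NaN from an empty CSV field) and blank strings
    become NaN, so dropna() can separate those rows later.
    """
    if not isinstance(cell, str) or not cell.strip():
        return np.nan
    # split("=", 1) keeps any further "=" characters inside the value
    return dict(pair.split("=", 1) for pair in cell.split(","))


col_c = pd.Series(["D=nv,E=nv", np.nan, "D=nv"])
print(col_c.apply(parse_pairs).tolist())
```

Splitting on the first `=` only is a design choice: a value such as `a=b=c` becomes `{'a': 'b=c'}` instead of raising an error.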

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": ['a', 'b', 'c'], "B": ['e', 'f', 'd'], "C": ['D=nv,E=nv', np.nan, "D=nv"]})
# Convert each "key=value,key=value" string into a dictionary of key-value pairs;
# non-strings (NaN) and blank strings become NaN
df["C"] = df["C"].apply(lambda x: dict(z.split('=', 1) for z in x.split(",")) if isinstance(x, str) and x.strip() else np.nan)
# Drop all null values present in column C so that the DataFrame can be expanded:
# separate the null rows and the rows containing values into 2 DataFrames
# (.copy() avoids SettingWithCopyWarning when columns are added below)
df_act = df.dropna(subset=["C"]).copy()
df_null = df[~df.index.isin(df_act.index)].copy()
# Expand the dictionaries (one column per key) and store them in a temporary DataFrame
df_temp = df_act['C'].apply(pd.Series)
for cols in df_temp.columns:
    df_act[cols] = np.nan
    df_null[cols] = np.nan

# Save the contents in the actual DataFrame
df_act[df_temp.columns] = df_temp
# Drop column C to match the sample output
df_act = df_act.drop(columns="C")
df_null = df_null.drop(columns="C")
# Concatenate the DataFrames and restore the original row order
final_df = pd.concat([df_act, df_null]).sort_index()
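The same result can also be reached without splitting the DataFrame into null and non-null parts. A minimal sketch of that alternative (a different technique from the one above, shown for comparison): each cell is parsed into a dict, empty cells become empty dicts, `pd.DataFrame` builds one column per key from the list of dicts (missing keys become NaN), and `join` attaches the result by index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c"], "B": ["e", "f", "d"], "C": ["D=nv,E=nv", np.nan, "D=nv"]})

# Parse each cell into a dict; NaN/blank cells become {} so every row yields a dict
parsed = df["C"].apply(
    lambda x: dict(p.split("=", 1) for p in x.split(",")) if isinstance(x, str) and x.strip() else {}
)

# One column per key; rows lacking a key get NaN. Reusing df's index lets join align rows.
expanded = pd.DataFrame(parsed.tolist(), index=df.index)

final_df = df.drop(columns="C").join(expanded)
print(final_df)
```

Because the rows are never separated, the original row order is preserved without a final sort.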

Note that column C is dropped only so that the output matches the sample output in the question.
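If the new columns should instead appear where column C (col3) was, as in the question's desired table, the expanded columns can be spliced in by position. A minimal sketch using the question's column names and the same parse-then-expand idea; the positional splice via `columns.get_loc` and list slicing is an assumption about the desired layout, and the final `fillna("")` only mimics the blank cells of the target table.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["v", "v", "v"],
    "col2": ["v", "v", "v"],
    "col3": ["ncol1=nv,ncol2=nv", "ncol3=nv", np.nan],
    "col4": ["v", "v", "v"],
    "col5": ["v", "v", "v"],
})

# Parse "k=v,k=v" strings into dicts; missing/blank cells become {}
parsed = df["col3"].apply(
    lambda x: dict(p.split("=", 1) for p in x.split(",")) if isinstance(x, str) and x.strip() else {}
)
# Columns appear in order of first occurrence: ncol1, ncol2, ncol3
expanded = pd.DataFrame(parsed.tolist(), index=df.index)

# Splice the new columns into col3's position: columns before col3, new columns, columns after
pos = df.columns.get_loc("col3")
before = list(df.columns[:pos])
after = list(df.columns[pos + 1:])
wide = pd.concat([df[before], expanded, df[after]], axis=1)

# Optional: show missing values as blanks, like the target table
print(wide.fillna(""))
```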

Answer 1 timestamp: 2020-08-06 19:30:44