Python 3, pandas: CSV rows have key/value pairs nested in some of the fields. How to flatten them into one wide row?

I'm trying to manipulate some CSV data, and normally I would use pandas when I have complex changes to make. However, I have no idea how to deal with key/value pairs nested inside one or more CSV fields.

So in essence I have data like this,

+------+------+-------------------+------+------+
| col1 | col2 | col3              | col4 | col5 |
+------+------+-------------------+------+------+
| v    | v    | ncol1=nv,ncol2=nv | v    | v    |
+------+------+-------------------+------+------+
| v    | v    | ncol3=nv          | v    | v    |
+------+------+-------------------+------+------+
| v    | v    |                   | v    | v    |
+------+------+-------------------+------+------+
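
For reference, the sample above can be loaded into a DataFrame roughly as follows. This is a minimal sketch that assumes the nested field is quoted in the CSV file (otherwise its inner commas would split it into extra columns). The inline CSV text and the variable names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV text matching the table above.
# The first col3 value is quoted because it contains a comma.
csv_text = '''col1,col2,col3,col4,col5
v,v,"ncol1=nv,ncol2=nv",v,v
v,v,ncol3=nv,v,v
v,v,,v,v
'''

df = pd.read_csv(io.StringIO(csv_text))

# The empty col3 field in the last row is read as NaN, not as an empty string
print(df["col3"].tolist())
```

Any approach that parses col3 therefore has to handle NaN as well as strings.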

And I'm trying to get something like,

+------+------+-------+-------+-------+------+------+
| col1 | col2 | ncol1 | ncol2 | ncol3 | col4 | col5 |
+------+------+-------+-------+-------+------+------+
| v    | v    | nv    | nv    |       | v    | v    |
+------+------+-------+-------+-------+------+------+
| v    | v    |       |       | nv    | v    | v    |
+------+------+-------+-------+-------+------+------+
| v    | v    |       |       |       | v    | v    |
+------+------+-------+-------+-------+------+------+

Assuming that the values in column C are comma-separated strings of key=value pairs, the code does the following:

  1. Creates a dictionary from each comma-separated string
  2. Sets aside the rows where column C is null or empty, so that the dictionaries in the remaining rows can be expanded
  3. Dynamically creates new columns based on the dictionary keys
  4. Expands the dictionaries into those columns
  5. Concatenates the DataFrame of null-valued rows with the newly expanded DataFrame

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "b", "c"],
    "B": ["e", "f", "d"],
    "C": ["D=nv,E=nv", np.nan, "D=nv"],
})

# Convert each "k=v,k=v" string into a dict of key/value pairs; keep missing values as NaN
df["C"] = df["C"].apply(
    lambda x: dict(pair.split("=", 1) for pair in x.split(",")) if isinstance(x, str) else np.nan
)

# Separate the rows that hold a dict in C from the rows where C is null.
# .copy() avoids SettingWithCopyWarning when the two pieces are modified below.
df_act = df.dropna(subset=["C"]).copy()
df_null = df[~df.index.isin(df_act.index)].copy()

# Expand the dicts into a temporary DataFrame, one column per key
df_temp = df_act["C"].apply(pd.Series)

# Create the new columns in both pieces (they stay NaN for the null rows)
for col in df_temp.columns:
    df_act[col] = np.nan
    df_null[col] = np.nan

# Save the expanded values in the actual DataFrame
df_act[df_temp.columns] = df_temp

# Drop column C to match the sample output
df_act = df_act.drop(columns="C")
df_null = df_null.drop(columns="C")

# Concatenate the DataFrames and restore the original row order
final_df = pd.concat([df_act, df_null]).sort_index()

Please note that column C is dropped only so that the result matches the sample output provided.
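
A more compact variant is also possible. The following is only a sketch, not the method used above: it assumes every non-empty field is a well-formed list of key=value pairs (a pair without "=" would raise a ValueError), and the helper names parse_pairs and flatten_column are hypothetical. Unlike the code above, it puts the new columns where the nested column used to be, as in the desired output from the question.

```python
import pandas as pd


def parse_pairs(field):
    """Turn 'k1=v1,k2=v2' into a dict; missing or empty fields give {}."""
    if not isinstance(field, str) or not field.strip():
        return {}
    return dict(pair.split("=", 1) for pair in field.split(","))


def flatten_column(df, column):
    """Replace `column` with one column per nested key, in the same position."""
    # One dict per row; rows with no pairs become all-NaN rows
    expanded = pd.DataFrame([parse_pairs(v) for v in df[column]], index=df.index)
    pos = df.columns.get_loc(column)
    return pd.concat([df.iloc[:, :pos], expanded, df.iloc[:, pos + 1:]], axis=1)


# Sample data shaped like the table in the question
df = pd.DataFrame({
    "col1": ["v", "v", "v"],
    "col2": ["v", "v", "v"],
    "col3": ["ncol1=nv,ncol2=nv", "ncol3=nv", None],
    "col4": ["v", "v", "v"],
    "col5": ["v", "v", "v"],
})

wide = flatten_column(df, "col3")
print(wide.columns.tolist())
# ['col1', 'col2', 'ncol1', 'ncol2', 'ncol3', 'col4', 'col5']
```

Because the expanded frame reuses the original index, the row order is preserved and no separate handling of the null rows is needed.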
