I'm trying to manipulate some CSV data. Normally I would use pandas when I have complex changes, but I have no idea how to deal with nested key-value pairs inside one or more CSV fields.
In essence, I have data like this:
+------+------+-------------------+------+------+
| col1 | col2 | col3              | col4 | col5 |
+------+------+-------------------+------+------+
| v    | v    | ncol1=nv,ncol2=nv | v    | v    |
+------+------+-------------------+------+------+
| v    | v    | ncol3=nv          | v    | v    |
+------+------+-------------------+------+------+
| v    | v    |                   | v    | v    |
+------+------+-------------------+------+------+
And I'm trying to get something like this:
+------+------+-------+-------+-------+------+------+
| col1 | col2 | ncol1 | ncol2 | ncol3 | col4 | col5 |
+------+------+-------+-------+-------+------+------+
| v    | v    | nv    | nv    |       | v    | v    |
+------+------+-------+-------+-------+------+------+
| v    | v    |       |       | nv    | v    | v    |
+------+------+-------+-------+-------+------+------+
| v    | v    |       |       |       | v    | v    |
+------+------+-------+-------+-------+------+------+
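Because the nested field itself contains commas, it can only survive as a single CSV field if it is quoted in the file. The following is a minimal sketch, assuming the file looks like the hypothetical SAMPLE text below (col3 quoted, and empty when a row has no pairs). It reads the file with Python's standard csv module and turns each nested field into a dictionary. The helper name parse_pairs is illustrative, not an existing API.

```python
import csv
import io

# Hypothetical file contents matching the question's layout;
# col3 is quoted because it contains commas
SAMPLE = 'col1,col2,col3,col4,col5\nv,v,"ncol1=nv,ncol2=nv",v,v\nv,v,ncol3=nv,v,v\nv,v,,v,v\n'


def parse_pairs(field):
    """Turn 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'}; an empty field gives {}."""
    # split("=", 1) keeps any further '=' characters inside the value
    return dict(pair.split("=", 1) for pair in field.split(",") if pair)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    row["col3"] = parse_pairs(row["col3"])
```

Once every row holds a dictionary, the keys ncol1, ncol2 and ncol3 can become ordinary columns, which is what the pandas answer does.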
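Assuming that the values in column C of the DataFrame are comma-separated key=value strings, the code below converts each string into a dictionary. It then separates out the rows where C is null, so that the dictionaries created in the previous step can be expanded into one column per key. Finally, it concatenates the two sets of rows back together.
import pandas as pd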
import numpy as np
df = pd.DataFrame({"A": ['a', 'b', 'c'], "B": ['e', 'f', 'd'], "C": ['D=nv,E=nv', np.nan, "D=nv"]})
# Convert each "key=value,key=value" string into a dictionary of key-value pairs (NaN stays NaN)
df["C"] = df["C"].apply(lambda x: dict(map(lambda z: z.split('=', 1), x.split(","))) if isinstance(x, str) else np.nan)
# Separate the rows that have a value in column C from the null rows, so that the dictionaries can be expanded
# (.copy() avoids SettingWithCopyWarning when new columns are added below)
df_act = df.dropna(subset=["C"]).copy()
df_null = df[~df.index.isin(df_act.index)].copy()
# Expand the dictionaries into one column per key and store them in a temporary DataFrame
df_temp = df_act["C"].apply(pd.Series)
# Create the new columns (filled with NaN) in both DataFrames
for cols in df_temp.columns:
    df_act.loc[:, cols] = np.nan
    df_null.loc[:, cols] = np.nan
# Save the expanded contents in the DataFrame of non-null rows
df_act[df_temp.columns] = df_temp
# Drop column C to match the sample output
df_act = df_act.drop("C", axis=1)
df_null = df_null.drop("C", axis=1)
# Concatenate the DataFrames and restore the original row order
final_df = pd.concat([df_act, df_null]).sort_index()
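The split into non-null and null rows is only needed because apply(pd.Series) would turn a NaN cell into a stray column. A more compact variant, swapped in here as an alternative rather than the answer's own method, maps null cells to empty dictionaries. It then builds the new columns in one step with the pd.DataFrame constructor and joins them back on the index. This keeps the original row order without any concatenation or sorting:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c"], "B": ["e", "f", "d"],
                   "C": ["D=nv,E=nv", np.nan, "D=nv"]})

# Parse each non-null string into a dict; null cells become empty dicts
pairs = df["C"].apply(
    lambda x: dict(p.split("=", 1) for p in x.split(",") if p) if isinstance(x, str) else {}
)

# One column per key; keys missing from a row become NaN.
# index=df.index keeps the new rows aligned with the original DataFrame.
expanded = pd.DataFrame(pairs.tolist(), index=df.index)

final_df = df.drop(columns="C").join(expanded)
```

The result has columns A, B, D and E. Row b is NaN in both new columns, and row c is NaN only in E.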
Please note that column C is removed only so that the output matches the sample output provided.
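In the answer, the new columns are appended at the end of the DataFrame. The question's desired output instead places ncol1 to ncol3 where col3 used to be. The following is a sketch of one way to do that, assuming the hypothetical SAMPLE CSV below (nested field quoted, as above). The function expand_in_place is an illustrative name, not part of pandas. It can be applied to each nested column in turn, which covers the "one or more CSV fields" case.

```python
import io

import pandas as pd

# Hypothetical CSV matching the question's layout; the nested field is quoted
SAMPLE = 'col1,col2,col3,col4,col5\nv,v,"ncol1=nv,ncol2=nv",v,v\nv,v,ncol3=nv,v,v\nv,v,,v,v\n'


def expand_in_place(df, column):
    """Replace `column` (holding 'k=v,k=v' strings) with one column per key,
    inserted at the position the original column occupied."""
    # Empty fields are read as NaN by read_csv and become empty dicts here
    pairs = df[column].apply(
        lambda x: dict(p.split("=", 1) for p in x.split(",") if p) if isinstance(x, str) else {}
    )
    expanded = pd.DataFrame(pairs.tolist(), index=df.index)
    pos = df.columns.get_loc(column)
    # Columns before the nested one, then the new key columns, then the rest
    return pd.concat([df.iloc[:, :pos], expanded, df.iloc[:, pos + 1:]], axis=1)


df = pd.read_csv(io.StringIO(SAMPLE))
for nested in ["col3"]:  # list every nested column here
    df = expand_in_place(df, nested)
```

The result can be written back out with df.to_csv(path, index=False). If a key happens to match an existing column name, this sketch produces duplicate column names, so such keys would need renaming first.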