Consider the following DataFrame, which I loaded from a MySQL table with 11k rows:
col1 | col2 | col3 | col4
-----------------------------
cat | black | small | lovely
-----------------------------
dog | white | medium| brave
-----------------------------
mice | grey | tinny | fast
...
I want to convert it dynamically to the following:
col1 | newcol
------------------------------------------------------------
cat | {"col2": "black", "col3": "small", "col4": "lovely"}
------------------------------------------------------------
dog | {"col2": "white", "col3": "medium", "col4": "brave"}
------------------------------------------------------------
mice | {"col2": "grey", "col3": "tinny", "col4": "fast"}
...
You can use agg with dict on axis=1.
For dictionary:
out = df[['col1']].assign(new_col=df.iloc[:,1:].agg(dict,1))
For json:
out = df[['col1']].assign(new_col=df.iloc[:,1:].agg(pd.Series.to_json,1))
print(out)
col1 new_col
0 cat {'col2': 'black', 'col3': 'small', 'col4': 'lo...
1 dog {'col2': 'white', 'col3': 'medium', 'col4': 'b...
2 mice {'col2': 'grey', 'col3': 'tinny', 'col4': 'fast'}
As you might expect, there are many ways to do this, but this is what came to mind:
>>> import pandas as pd
>>> d = {"col1": ["cat", 'dog', 'mice'], "col2": ["black", "white", "grey"], "col3": ["small", 'medium', 'tinny'], 'col4': ['lovely', 'brave','fast']}
>>> df = pd.DataFrame(d)
>>> pd.concat([df[['col1']], pd.DataFrame({"newcol": df[['col2','col3','col4']].to_dict(orient='records')})], axis=1)
If you don't know the column names in advance, you can select the columns by position instead. Here, that means everything from column 1 to the end.
>>> pd.concat([df[['col1']], pd.DataFrame({"newcol": df.iloc[:, 1:].to_dict(orient='records')})], axis=1)
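Since the question asks for a dynamic conversion, the to_dict(orient='records') idea above can be wrapped in a small helper. This is only a sketch: the function name pack_columns and its parameters are hypothetical, and it assumes the key column is identified by name rather than by position.

```python
import pandas as pd

def pack_columns(df, key_col, new_col="newcol"):
    """Return key_col plus one dict per row built from all other columns.

    pack_columns, key_col and new_col are illustrative names, not pandas API.
    """
    # every column except the key goes into the per-row dicts
    records = df.drop(columns=[key_col]).to_dict(orient="records")
    out = df[[key_col]].copy()
    out[new_col] = records
    return out

df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
    "col4": ["lovely", "brave", "fast"],
})
result = pack_columns(df, "col1")
print(result)
```

Because the helper works on a copy, the original df keeps all of its columns.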
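Use df.to_json(orient='records') to dump the rows as a JSON list of records, load that JSON back into a list of dicts, and assign the list to a new column. Calling to_dict(orient='records') directly, as the code below does, gives the same list without the JSON round trip.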
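import json  # only needed for the commented-out JSON variants
import pandas as pd
df = pd.DataFrame({'col1': ['cat', 'dog', 'mice'],
                   'col2': ['black', 'white', 'grey'],
                   'col3': ['small', 'medium', 'tinny']})
# build one dict per row from every column after the first
# JSON round trip (equivalent result):
# data_json = df.iloc[:, 1:].to_json(orient='records')
# data = json.loads(data_json)
data = df.iloc[:, 1:].to_dict(orient='records')
# keep first column
dfn = df.iloc[:, [0]].copy()
dfn['newcol'] = data
# to store JSON strings instead of dicts (the double-quoted output shown below):
# dfn['newcol'] = pd.Series(data).map(json.dumps)
dfn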
col1 newcol
0 cat {"col2": "black", "col3": "small"}
1 dog {"col2": "white", "col3": "medium"}
2 mice {"col2": "grey", "col3": "tinny"}
data_json (type str):
[{"col2":"black","col3":"small"},{"col2":"white","col3":"medium"},{"col2":"grey","col3":"tinny"}]
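The snippet above leaves the JSON round trip commented out. A minimal runnable sketch of that path, assuming the same three-column example, might look like this; the extra column name newcol_json is made up here to show dicts and JSON strings side by side.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
})

# dump every column after the first as a JSON string holding a list of records
data_json = df.iloc[:, 1:].to_json(orient="records")
# parse the string back into a list of Python dicts
data = json.loads(data_json)

dfn = df.iloc[:, [0]].copy()
dfn["newcol"] = data                                 # column of dict objects
dfn["newcol_json"] = [json.dumps(d) for d in data]   # column of JSON strings (hypothetical name)
print(dfn)
```

A column of JSON strings is usually the safer choice if the result is written back to a database, since a dict object has no direct SQL representation.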
In addition to Anky's answer I found this article that describes more manipulations: https://codeflex.co/mysql-table-migration-with-pandas-dataframe/
I am using three columns in the example below.
data = {'col1': ['cat', 'dog', 'mice'], 'col2' : ['black', 'white', 'grey'], 'col3' : ['small', 'medium', 'tinny']}
import pandas as pd
df = pd.DataFrame(data)
col = list(df.columns)
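We can use a lambda function as follows (x.iloc[...] is used for positional access, because plain integer keys such as x[1] are deprecated on a label-indexed row):
df.apply(lambda x: {col[1]: x.iloc[1], col[2]: x.iloc[2]}, axis=1)
You can add it to the DataFrame as follows:
df['new_col'] = df.apply(lambda x: {col[1]: x.iloc[1], col[2]: x.iloc[2]}, axis=1)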
This produces the following output.
df
col1 col2 col3 new_col
0 cat black small {'col2': 'black', 'col3': 'small'}
1 dog white medium {'col2': 'white', 'col3': 'medium'}
2 mice grey tinny {'col2': 'grey', 'col3': 'tinny'}
Then drop the unwanted columns using df.drop, which should produce the required output:
df.drop(['col2', 'col3'], axis = 1)
col1 new_col
0 cat {'col2': 'black', 'col3': 'small'}
1 dog {'col2': 'white', 'col3': 'medium'}
2 mice {'col2': 'grey', 'col3': 'tinny'}
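The lambda above names each column position explicitly. A variant that avoids hard-coding them, sketched here under the assumption that col1 is the only column to keep, lets each row convert itself with to_dict(); the names key and value_cols are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
})

key = "col1"  # assumed key column
# every other column is packed into the dict, however many there are
value_cols = [c for c in df.columns if c != key]

out = df[[key]].copy()
out["new_col"] = df[value_cols].apply(lambda row: row.to_dict(), axis=1)
print(out)
```

This builds the result in a separate frame, so no df.drop step is needed afterwards.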
For the given requirement, I would suggest using itertuples to generate a list of dicts and assigning it to the DataFrame as follows:
import pandas as pd
data = {'col1': ['cat', 'dog', 'mice'], 'col2' : ['black', 'white', 'grey'], 'col3' : ['small', 'medium', 'tinny'], 'col4': ['lovely','brave','fast']}
df = pd.DataFrame(data)
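def getDictColumn_df1(df, new_col_name="newcol", cols_from_start=1):
    # index=False keeps the row index out of each dict
    rows = df.iloc[:, cols_from_start:].itertuples(index=False)
    # assign() returns a new DataFrame, so the caller's df is left unchanged
    return df[['col1']].assign(**{new_col_name: [r._asdict() for r in rows]})
getDictColumn_df1(df)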
To learn more about itertuples(), see the pandas documentation for DataFrame.itertuples.
For a slightly different case, where you want to keep the row index and convert all of each row's values into a single dict, you can do:
def getDictColumn_df2(df, new_col_name="newcol"):
    # one dict per row covering every column; the original row index is kept
    dicts = [row._asdict() for row in df.itertuples(index=False)]
    return pd.DataFrame({new_col_name: dicts}, index=df.index)
getDictColumn_df2(df)
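One caveat with the _asdict() approach: itertuples() builds namedtuples, so a column name that is not a valid Python identifier cannot be used as a field name and will not appear unchanged as a dict key. The sketch below uses a made-up column called "fur color" to show an alternative that zips plain tuples with the original labels.

```python
import pandas as pd

# "fur color" is a hypothetical column name containing a space
df = pd.DataFrame({"col1": ["cat", "dog"], "fur color": ["black", "white"]})

# keys come from namedtuple fields, so "fur color" cannot survive as a key
via_asdict = [row._asdict() for row in df.itertuples(index=False)]

# name=None yields plain tuples; zipping with df.columns keeps every label intact
via_zip = [dict(zip(df.columns, row))
           for row in df.itertuples(index=False, name=None)]
print(via_zip[0])
```

If all column names are simple identifiers, as in the examples above, both forms give the same dicts.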