I have a DataFrame that looks like this:
df = pd.DataFrame({"id": ["200"], "0": ["miner"], "1": ["miner, manager"], "2": ["mining, dude number 7"], "3": ["marshall"]})
I'd like to turn this into a list of dictionaries, one per column, where each dictionary repeats the "id" and has a "value" key holding that column's contents split on ",".
The expected output would look like this:
list_dict_from_df = [{"id": "200", "value": [{"lower": "miner"}]}, {"id": "200", "value": [{"lower": "miner"}, {"lower": "manager"}]}, {"id": "200", "value": [{"lower": "mining"}, {"lower": "dude number 7"}]}, {"id": "200", "value": [{"lower": "marshall"}]}]
I'm currently using a brute force method inside a loop to do this:
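code = df["id"].iloc[0]  # assumption: `code` was not defined in the post; taken here as the id of the single row
d_range = range(1, len(df.columns))
d_out = []
for i in d_range:
    d_out.append({"id": code, "value": [{"lower": col} for col in df.iloc[:, i].str.split(',')]})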
This gets me close:
d_out:
[{"id": 200, "value": [{"lower": ["miner"]}]}, {"id": 2000, "value": [{"lower": ["miner", "manager"]}]}]
However, I don't want strings such as "miner" wrapped in a list. Each cell should be split on "," and every resulting element placed in its own {"lower": ...} dictionary, as in the expected output above.
I'd prefer a solution that avoids DataFrame anti-patterns (nothing like iterrows()) if possible.
Try:
from pprint import pprint

lst = []
for id_, g in df.groupby("id"):
    for _, row in g.iterrows():
        # every column from "0" onward holds a comma-separated string
        for cell in row["0":]:
            lst.append(
                {
                    "id": id_,
                    "value": [
                        {"lower": v} for v in map(str.strip, cell.split(","))
                    ],
                }
            )

pprint(lst)
Prints:
[{'id': '200', 'value': [{'lower': 'miner'}]},
{'id': '200', 'value': [{'lower': 'miner'}, {'lower': 'manager'}]},
{'id': '200', 'value': [{'lower': 'mining'}, {'lower': 'dude number 7'}]},
{'id': '200', 'value': [{'lower': 'marshall'}]}]
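As a side note on why the loop in the question nested its values: Series.str.split returns a Series whose elements are lists, so iterating over it yields one whole list per row rather than one string per element. The sketch below illustrates this on a trimmed copy of the question's DataFrame; the names split_col, nested and flat are only illustrative.

```python
import pandas as pd

# Trimmed copy of the question's data (an assumption made to keep the example short).
df = pd.DataFrame({"id": ["200"], "0": ["miner"], "1": ["miner, manager"]})

# .str.split gives one list per row of the column.
split_col = df["1"].str.split(",")

# Iterating over the Series yields each list as a whole,
# which is what produced {"lower": [...]} in the question.
nested = [{"lower": col} for col in split_col]

# Iterating a second time, over the list inside each cell, yields the
# individual strings; strip() removes the space left after each comma.
flat = [{"lower": v.strip()} for cell in split_col for v in cell]

print(nested)
print(flat)
```

The inner `for v in cell` loop is the step the original attempt was missing, and it is the same thing the answer above does with cell.split(",").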
Here is one way: reshape the data by making id the index with set_index, turn all the columns into rows with stack, then apply str.split on the comma and explode to get one row per element. Finally, loop over a groupby on both index levels to build the expected output.
d = [{'id':i, 'value':vals.to_dict(orient='records')}
     for (i, _), vals in df.set_index('id').stack()
                           .str.split(',').explode()
                           .to_frame(name='lower')
                           .groupby(level=[0,1])
]
d
[{'id': '200', 'value': [{'lower': 'miner'}]},
{'id': '200', 'value': [{'lower': 'miner'}, {'lower': ' manager'}]},
{'id': '200', 'value': [{'lower': 'mining'}, {'lower': ' dude number 7'}]},
{'id': '200', 'value': [{'lower': 'marshall'}]}]
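Note that the output above keeps the space that follows each comma (' manager', ' dude number 7'). If the stripped values from the question's expected output are wanted, one possible adjustment is to call .str.strip() after explode. This is a minimal sketch of that variation, not part of the original answer; stacked and d_stripped are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"id": ["200"], "0": ["miner"], "1": ["miner, manager"],
                   "2": ["mining, dude number 7"], "3": ["marshall"]})

stacked = (
    df.set_index("id")
    .stack()                   # columns become a second index level
    .str.split(",")            # each cell becomes a list of strings
    .explode()                 # one row per list element
    .str.strip()               # drop the whitespace left around each element
    .to_frame(name="lower")
)

# Group on (id, original column name) so each column yields one dictionary.
d_stripped = [
    {"id": i, "value": vals.to_dict(orient="records")}
    for (i, _), vals in stacked.groupby(level=[0, 1])
]

print(d_stripped)
```

The strip is applied after explode because at that point every row holds a single string again, so the .str accessor can operate on it directly.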