How to parse data from column in python pandas

Question

I have a csv file contain the following:

原来的

and need to parse it and the expected results:

Data in text:

id,product_id
    
1,[{'p_id': 59, 'p_name': 'IPF'}, {'p_id': 63, 'p_name': 'RBC'}, {'p_id': 47, 'p_name': 'CSP'}]

2,[{'p_id': 25, 'p_name': 'LPP'}, {'p_id': 86, 'p_name': 'CRS'}, {'p_id': 47, 'p_name': 'CSP'}]

3,[{'p_id': 73, 'p_name': 'OCC'}, {'p_id': 63, 'p_name': 'RBC'}]

4,[{'p_id': 63, 'p_name': 'RBC'}, {'p_id': 31, 'p_name': 'SUT'}, {'p_id': 73, 'p_name': 'OCC'}]

5,[{'p_id': 63, 'p_name': 'RBC'}]

Answer 1

As I have already mentioned in the comment, the data you have doesn't have string values enclosed inside quote, for eg : in [{'p_id': 59, 'p_name': IPF} , the value IPF is not enclosed by quote, so you can not use any direct method.

Out of several way, the easiest way is to use yaml ( pip install pyyaml ) package to parse those string values as Python object, then explode and apply pd.Series :

import pandas as pd
import yaml

filePath = 'file.csv'
df = pd.read_csv(filePath, index_col=0)
out = (df['product_id'].apply(lambda x: yaml.load(x, yaml.Loader))
        .explode()
        .apply(pd.Series)
       )

OUTPUT

>>> out
    p_id p_name
id             
1     59    IPF
1     63    RBC
1     47    CSP
2     25    LPP
2     86    CRS
2     47    CSP
3     73    OCC
3     63    RBC
4     63    RBC
4     31    SUT
4     73    OCC
5     63    RBC

How to parse data from column in python pandas

Question

1 answers

solution1
1 2021-10-28 04:30:48

How to parse data from column in python pandas

Question

1 answers

solution1 1 2021-10-28 04:30:48

solution1
1 2021-10-28 04:30:48