简体   繁体   中英

How to parse data from column in python pandas

I have a csv file contain the following:

原来的

and need to parse it and the expected results:

结果

Data in text:

id,product_id
    
1,[{'p_id': 59, 'p_name': 'IPF'}, {'p_id': 63, 'p_name': 'RBC'}, {'p_id': 47, 'p_name': 'CSP'}]

2,[{'p_id': 25, 'p_name': 'LPP'}, {'p_id': 86, 'p_name': 'CRS'}, {'p_id': 47, 'p_name': 'CSP'}]

3,[{'p_id': 73, 'p_name': 'OCC'}, {'p_id': 63, 'p_name': 'RBC'}]

4,[{'p_id': 63, 'p_name': 'RBC'}, {'p_id': 31, 'p_name': 'SUT'}, {'p_id': 73, 'p_name': 'OCC'}]

5,[{'p_id': 63, 'p_name': 'RBC'}]

As I have already mentioned in the comment, the data you have doesn't have string values enclosed inside quote, for eg : in [{'p_id': 59, 'p_name': IPF} , the value IPF is not enclosed by quote, so you can not use any direct method.

Out of several way, the easiest way is to use yaml ( pip install pyyaml ) package to parse those string values as Python object, then explode and apply pd.Series :

import pandas as pd
import yaml

filePath = 'file.csv'
df = pd.read_csv(filePath, index_col=0)
out = (df['product_id'].apply(lambda x: yaml.load(x, yaml.Loader))
        .explode()
        .apply(pd.Series)
       )

OUTPUT

>>> out
    p_id p_name
id             
1     59    IPF
1     63    RBC
1     47    CSP
2     25    LPP
2     86    CRS
2     47    CSP
3     73    OCC
3     63    RBC
4     63    RBC
4     31    SUT
4     73    OCC
5     63    RBC

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM