
Pandas - Extract data from Dataframe in a specified format

I have a DataFrame whose data was pulled from another system, in the format below:

id,value
1001,--- !ruby/hash:Action::Params
         values:
         - 'ABC'
1002,--- !ruby/hash: Action:: Params
         values:
         - 'DEF'
         - '123'
         - 'Hello'

I want to extract the quoted values from this DataFrame into the format below, with one row per value:

id, value
1001,ABC
1002,DEF
1002,123
1002,Hello

Output of df.head().to_dict():

{0: {0: 1001, 1: 1002, 2: 1003, 3: 1004, 4: 1005},
 1: {0: '--- !ruby/hash:Action::Params
     values:
     - 'ABC', 
     1: '!ruby/hash: Action:: Params
     values:
     - 'DEF'
     - '123'
     - 'Hello',
     2: '!ruby/hash: Action:: Params
     values:
     - '456'
     - '6666'
     - 'Bye'
     3: '!ruby/hash: Action:: Params
     values:
     - 'ffff'
     - 'tte',
     4: '!ruby/hash: Action:: Params
     values:
     - 'njytg'
}}

Here's a solution using Series.str.extractall with a regular expression.

In this case we use a positive lookbehind and a positive lookahead:

  • (?<=\') : the match must be preceded by a single quote '
  • (?=\') : the match must be followed by a single quote '
import numpy as np
values = df['value'].str.extractall("(?<=\')(.*?)(?=\')").replace('-', np.nan, regex=True).dropna()
df = values.droplevel(1).join(df['id']).reset_index(drop=True).rename(columns={0:'values'})

  values    id
0  ABC    1001
1  DEF    1002
2  123    1002
3  Hello  1002
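
To see why the replace/dropna step is there, here is a minimal sketch using only the standard re module on one row of the single-line sample data (the names sample, pattern, matches and cleaned are illustrative, not part of the answer). Because the lookarounds do not consume the quotes, the pattern also matches the " - " separators that sit between a closing quote and the next opening quote, and those are what the '-' filter removes.

```python
import re

# One row's text, in the single-line form of the sample input data.
sample = "!ruby/hash: Action:: Params values: - 'DEF' - '123' - 'Hello'"

# Same lookbehind/lookahead pattern as in the answer.
pattern = r"(?<=')(.*?)(?=')"

matches = re.findall(pattern, sample)
print(matches)   # ['DEF', ' - ', '123', ' - ', 'Hello']

# Dropping every match that contains a hyphen leaves only the quoted values.
cleaned = [m for m in matches if "-" not in m]
print(cleaned)   # ['DEF', '123', 'Hello']
```

One consequence of filtering on '-' is that a genuine value containing a hyphen would presumably be dropped as well, so this approach fits best when the values are known to be hyphen-free.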

Input example data used:

     id                                                          value
0  1001  !ruby/hash:Action::Params values: - 'ABC'                    
1  1002  !ruby/hash: Action:: Params values: - 'DEF' - '123' - 'Hello'
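
As an alternative to the lookaround pattern, the sketch below captures the text between each pair of quotes with Series.str.findall and then uses DataFrame.explode to get one row per value. This is a different technique from the one in the answer, shown only for comparison; the sample frame is rebuilt by hand from the question, so the exact whitespace in the value column is an assumption.

```python
import pandas as pd

# Sample data rebuilt from the question (multi-line form; whitespace is assumed).
df = pd.DataFrame({
    "id": [1001, 1002],
    "value": [
        "--- !ruby/hash:Action::Params\n values:\n - 'ABC'",
        "--- !ruby/hash: Action:: Params\n values:\n - 'DEF'\n - '123'\n - 'Hello'",
    ],
})

# The quotes are consumed as part of each match, so the separators
# between values are never captured and no filtering step is needed.
out = (
    df.assign(value=df["value"].str.findall(r"'([^']*)'"))  # list of values per row
      .explode("value")                                      # one row per list element
      .reset_index(drop=True)
)
print(out)
```

Since the pattern consumes both quotes of each pair, values that contain a hyphen are kept, and the id column stays aligned with its values without a join.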

