I have a DataFrame with data pulled from another system, in the format below:
id,value
1001,--- !ruby/hash:Action::Params
values:
- 'ABC'
1002,--- !ruby/hash: Action:: Params
values:
- 'DEF'
- '123'
- 'Hello'
I am trying to extract the data from the above DataFrame into the format below:
id,value
1001,ABC
1002,DEF
1002,123
1002,Hello
Output of df.head().to_dict():
{0: {0: 1001, 1: 1002, 2: 1003, 3: 1004, 4: 1005},
1: {0: '--- !ruby/hash:Action::Params
values:
- 'ABC',
1: '!ruby/hash: Action:: Params
values:
- 'DEF'
- '123'
- 'Hello',
2: '!ruby/hash: Action:: Params
values:
- '456'
- '6666'
- 'Bye',
3: '!ruby/hash: Action:: Params
values:
- 'ffff'
- 'tte',
4: '!ruby/hash: Action:: Params
values:
- 'njytg'
}}
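Here's a solution using Series.str.extractall with a regular expression. In this case we use a positive lookbehind and a positive lookahead:
(?<=\') : characters preceded by a quotation mark '
(?=\') : characters followed by a quotation mark '
Note that when the values sit on one line, the separator between two quoted values (for example " - ") is also preceded and followed by a quotation mark, so it matches too. That is why the matches containing a hyphen are replaced with NaN and dropped.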
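To see what this pattern captures on its own, here is a minimal sketch using Python's built-in re module on one flattened sample row. The variable names are illustrative and not part of the original answer, and the hyphen filter only mimics the idea of the cleanup step rather than reproducing the pandas call.

```python
import re

# One sample row, flattened onto a single line as in the example input.
sample = "!ruby/hash: Action:: Params values: - 'DEF' - '123' - 'Hello'"

# Lookbehind/lookahead pattern: lazily capture text that has a quote on both sides.
pattern = r"(?<=')(.*?)(?=')"

matches = re.findall(pattern, sample)
print(matches)  # ['DEF', ' - ', '123', ' - ', 'Hello']

# The separators between quoted values also sit between two quotes,
# so they are filtered out here by keeping only matches without a hyphen.
cleaned = [m for m in matches if "-" not in m]
print(cleaned)  # ['DEF', '123', 'Hello']
```

One consequence of this design is that a genuine value containing a hyphen would be discarded along with the separators.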
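import numpy as np
# Extract the text between quotes; separators such as ' - ' also match, so turn them into NaN and drop them
values = df['value'].str.extractall("(?<=\')(.*?)(?=\')").replace('-', np.nan, regex=True).dropna()
df = values.droplevel(1).join(df['id']).reset_index(drop=True).rename(columns={0:'values'})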
Output:
  values    id
0    ABC  1001
1    DEF  1002
2    123  1002
3  Hello  1002
Input example data used:
id value
0 1001 !ruby/hash:Action::Params values: - 'ABC'
1 1002 !ruby/hash: Action:: Params values: - 'DEF' - '123' - 'Hello'
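For anyone who wants to reproduce this end to end, the sketch below rebuilds the two example rows and extracts the quoted values. It is an alternative under stated assumptions, not the answer's exact code: it swaps the lookaround pattern for a plain capture group, '([^']*)', which consumes the quotes and therefore never matches the separators, so no NaN cleanup is needed. The helper name explode_quoted is hypothetical.

```python
import pandas as pd

# Example input, reconstructed from the two rows shown above.
df = pd.DataFrame({
    "id": [1001, 1002],
    "value": [
        "!ruby/hash:Action::Params values: - 'ABC'",
        "!ruby/hash: Action:: Params values: - 'DEF' - '123' - 'Hello'",
    ],
})


def explode_quoted(frame):
    """Return one row per quoted value, paired with the id of its source row."""
    # extractall yields one row per match, indexed by (original row, match number).
    extracted = frame["value"].str.extractall(r"'([^']*)'")
    return (
        extracted.droplevel(1)          # keep only the original row index
        .join(frame["id"])              # bring the id back via that index
        .reset_index(drop=True)
        .rename(columns={0: "value"})[["id", "value"]]
    )


result = explode_quoted(df)
print(result)
```

Because the capture group excludes the quote characters themselves, values that contain a hyphen are kept, which the NaN-based cleanup would drop.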