
Pandas - Extract data from Dataframe in a specified format

I have a DataFrame that contains data pulled from another system in the format below:

id,value
1001,--- !ruby/hash:Action::Params
         values:
         - 'ABC'
1002,--- !ruby/hash: Action:: Params
         values:
         - 'DEF'
         - '123'
         - 'Hello'

I am trying to extract the data from the above DataFrame into the format below:

id, value
1001,ABC
1002,DEF
1002,123
1002,Hello

Output of df.head().to_dict():

{0: {0: 1001, 1: 1002, 2: 1003, 3: 1004, 4: 1005},
 1: {0: '--- !ruby/hash:Action::Params
     values:
     - 'ABC', 
     1: '!ruby/hash: Action:: Params
     values:
     - 'DEF'
     - '123'
     - 'Hello',
     2: '!ruby/hash: Action:: Params
     values:
     - '456'
     - '6666'
     - 'Bye'
     3: '!ruby/hash: Action:: Params
     values:
     - 'ffff'
     - 'tte',
     4: '!ruby/hash: Action:: Params
     values:
     - 'njytg'
}}

Here's a solution using Series.str.extractall with a regular expression.

In this case we use a positive lookbehind and a positive lookahead:

  • (?<=\') : positive lookbehind, matches characters preceded by a single quotation mark '
  • (?=\') : positive lookahead, matches characters followed by a single quotation mark '
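import numpy as np  # np.nan is used here because the np.NaN alias was removed in NumPy 2.0
# extract every run of text between single quotes; separator matches such as ' - ' become NaN and are dropped
values = df['value'].str.extractall("(?<=\')(.*?)(?=\')").replace('-', np.nan, regex=True).dropna()
df = values.droplevel(1).join(df['id']).reset_index(drop=True).rename(columns={0: 'values'})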

  values    id
0  ABC    1001
1  DEF    1002
2  123    1002
3  Hello  1002
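
To see why the .replace('-', ...) and .dropna() steps are needed, the same pattern can be tried with Python's re module on a one-line value shaped like the example input. This is only an illustrative sketch; the variable names text, pattern, matches and wanted are not part of the answer.

```python
import re

# One-line value in the same shape as the example input (illustrative sample).
text = "!ruby/hash: Action:: Params values: - 'DEF' - '123' - 'Hello'"

# Same lookbehind/lookahead pattern as in the answer.
pattern = r"(?<=')(.*?)(?=')"

# The lookarounds do not consume the quotes, so the separator between two
# quoted items is also "preceded and followed by a quote" and is matched too.
matches = re.findall(pattern, text)
print(matches)  # ['DEF', ' - ', '123', ' - ', 'Hello']

# Dropping every match that contains a hyphen mirrors the replace/dropna step.
wanted = [m for m in matches if "-" not in m]
print(wanted)  # ['DEF', '123', 'Hello']
```

Note that this filter, like the replace step in the answer, would also discard a genuine value that happens to contain a hyphen.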

Input example data used:使用的输入示例数据:

     id                                                          value
0  1001  !ruby/hash:Action::Params values: - 'ABC'                    
1  1002  !ruby/hash: Action:: Params values: - 'DEF' - '123' - 'Hello'
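
For reference, the example input can be rebuilt and the extraction run end to end. The sketch below is one possible variant rather than the answer's exact code. It assumes the columns are named id and value (the to_dict() output in the question shows integer labels 0 and 1, so the real frame may need renaming first), and it swaps the lookaround pattern for a plain capture group '([^']*)', which consumes the quotes and therefore should not match the separators between items.

```python
import pandas as pd

# Rebuild the two-row example input (column names are an assumption).
df = pd.DataFrame({
    "id": [1001, 1002],
    "value": [
        "!ruby/hash:Action::Params values: - 'ABC'",
        "!ruby/hash: Action:: Params values: - 'DEF' - '123' - 'Hello'",
    ],
})

# extractall returns one row per match, indexed by (original row, match number).
values = df["value"].str.extractall(r"'([^']*)'")

result = (
    values.droplevel(1)             # keep only the original row label
          .join(df["id"])           # bring the id back by aligning on the index
          .reset_index(drop=True)
          .rename(columns={0: "value"})[["id", "value"]]
)
print(result)
```

With this pattern no hyphen filtering is needed, so values that contain a hyphen are kept.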
