Extract data from a pandas DataFrame string column and generate new columns from its contents

Question

I have a pandas column which contains data like this:

**Title **: New_ind

**Body **: Detection_error

*respo_URL **: www.github.com

**respo_status **: {color}

import pandas as pd

data = {'sl no': [661, 662],
        'key': ['3484', '3483'],
        'id': [13592349, 13592490],
        'Sum': ['[E-1]', '[E-1]'],
        'Desc': [
              "**Title **: New_ind\n\n**Body **: Detection_error\n\n*respo_URL **: www.github.com\n\n**respo_status **: {yellow}","**Title **: New_ind2\n\n**Body **: import_error\n\n*respo_URL **: \n\n**respo_status **: {green}"]}

df = pd.DataFrame(data)

I need to generate new columns where Title, Body, respo_URL, etc. are the column names and everything after the : is the value in those cells. Note that the items in the column are not dictionaries.

Answer 1

There are various ways to do this with regex, but I found this version using the .str methods to be the clearest:

# one column per "key: value" part
desc_df = df["Desc"].str.split("\n\n", expand=True)
for col in desc_df.columns:
    # keep the text after the first ":" (n=1, so values containing ":" stay whole) and trim spaces
    desc_df[col] = desc_df[col].str.split(":", n=1).str[1].str.strip()
colnames = "Title", "Body", "respo_URL", "respo_status"
desc_df = desc_df.rename(columns=dict(enumerate(colnames)))
df = pd.concat([df.drop(columns="Desc"), desc_df], axis=1)

First, split the Desc column at \n\n and expand the result into a DataFrame, desc_df.
Then split each new column at :, take the right-hand side, and strip the whitespace.
Finally, rename the columns and concatenate the original DataFrame (without the Desc column) with desc_df.
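The second step keeps only what follows the colon. Splitting at every : is enough for the sample, but a value that itself contains a colon would be cut short. The https:// URL below is an invented value, not from the question; limiting the split with n=1 keeps it whole:

```python
import pandas as pd

# Hypothetical cell values; the "https://" URL is an invented example
parts = pd.Series(["**Title **: New_ind", "*respo_URL **: https://www.github.com"])

# split at every ":" -> the URL loses everything after "https"
every_colon = parts.str.split(":").str[1].str.strip()

# split at the first ":" only -> the full value survives
first_colon = parts.str.split(":", n=1).str[1].str.strip()

print(every_colon.tolist())  # ['New_ind', 'https']
print(first_colon.tolist())  # ['New_ind', 'https://www.github.com']
```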

Result for the sample:

   sl no   key        id    Sum     Title             Body       respo_URL  respo_status
0    661  3484  13592349  [E-1]   New_ind  Detection_error  www.github.com      {yellow}
1    662  3483  13592490  [E-1]  New_ind2     import_error                       {green}
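The rename step hard-codes the four field names. If the fields can vary between rows, one option is to read the names from the strings themselves. A minimal sketch, assuming each part has the form key: value with * markers around the key, using a trimmed copy of the sample data; the helper parse_desc is hypothetical:

```python
import pandas as pd

def parse_desc(text):
    """Hypothetical helper: turn one Desc string into a {key: value} dict."""
    fields = {}
    for part in text.split("\n\n"):
        # partition splits at the first ":" only, so values such as URLs keep their colons
        key, _, value = part.partition(":")
        # drop the "*" markers and spaces around the key, and spaces around the value
        fields[key.strip(" *")] = value.strip()
    return fields

df = pd.DataFrame({
    "sl no": [661, 662],
    "Desc": [
        "**Title **: New_ind\n\n**Body **: Detection_error\n\n*respo_URL **: www.github.com\n\n**respo_status **: {yellow}",
        "**Title **: New_ind2\n\n**Body **: import_error\n\n*respo_URL **: \n\n**respo_status **: {green}",
    ],
})

# one dict per row -> one column per key, aligned on the original index
parsed = pd.DataFrame([parse_desc(s) for s in df["Desc"]], index=df.index)
df = df.drop(columns="Desc").join(parsed)
print(df)
```

Rows that lack a key simply get NaN in that column, since pandas builds the column set from the union of all dict keys.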

The following regex version works for the sample, but I think it is not as robust as the other one:

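# one named group per field, e.g. \*+Title \*+: (?P<Title>[^\n]*)
pattern = "\n\n".join(
    rf"\*+{col} \*+: (?P<{col}>[^\n]*)"  # raw f-string, so "\*" is not an invalid escape
    for col in ("Title", "Body", "respo_URL", "respo_status")
)
desc_df = df["Desc"].str.extract(pattern)  # named groups become the column names
df = pd.concat([df.drop(columns="Desc"), desc_df], axis=1)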
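One reason the single pattern is less robust: str.extract needs the whole pattern to match, so a row whose fields appear in a different order, or with a field missing, comes back as all NaN. A sketch of an order-independent regex alternative, assuming each field still looks like *key **: value on its own line, uses str.extractall and pivots the key/value pairs into columns. The second row is invented to show the reordered/missing case:

```python
import pandas as pd

desc = pd.Series([
    "**Title **: New_ind\n\n**Body **: Detection_error\n\n*respo_URL **: www.github.com\n\n**respo_status **: {yellow}",
    # invented row: fields in a different order and no respo_URL
    "**Body **: import_error\n\n**Title **: New_ind2\n\n**respo_status **: {green}",
])

# one match per "key: value" line, captured in the named groups "key" and "value"
pairs = desc.str.extractall(r"\*+(?P<key>[^*\n]+?) *\*+: *(?P<value>[^\n]*)")

# drop the per-row match counter, then turn keys into columns (missing keys become NaN)
wide = pairs.droplevel("match").set_index("key", append=True)["value"].unstack("key")
print(wide)
```

The resulting columns come out sorted by key rather than in the order they appear in the strings; reorder them explicitly if that matters.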

Answer 1 | 1 | Accepted | 2023-01-31 09:52:06