I have a pandas column which has data like this:
**Title **: New_ind
**Body **: Detection_error
*respo_URL **: www.github.com
**respo_status **: {color}
import pandas as pd

data = {'sl no': [661, 662],
        'key': ['3484', '3483'],
        'id': [13592349, 13592490],
        'Sum': ['[E-1]', '[E-1]'],
        'Desc': [
            "**Title **: New_ind\n\n**Body **: Detection_error\n\n*respo_URL **: www.github.com\n\n**respo_status **: {yellow}",
            "**Title **: New_ind2\n\n**Body **: import_error\n\n*respo_URL **: \n\n**respo_status **: {green}"]}
df = pd.DataFrame(data)
I need to generate new columns where Title, Body, respo_URL, etc. are the column names and everything after the colon is the value in the corresponding cell. Note that the items in the column are plain strings, not dictionaries.
There are various ways to do this with regex, but I found this version, which uses only str methods, to be the clearest:
desc_df = df["Desc"].str.split("\n\n", expand=True)
for col in desc_df.columns:
desc_df[col] = desc_df[col].str.split(":").str[1].str.strip()
colnames = "Title", "Body", "respo_URL", "respo_status"
desc_df = desc_df.rename(columns=dict(enumerate(colnames)))
df = pd.concat([df.drop(columns="Desc"), desc_df], axis=1)
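Split the Desc column at \n\n and expand the result into a dataframe desc_df.
Split each column of desc_df at :, take the right side, and strip whitespace.
Rename the columns of desc_df to the field names.
Concatenate df without the Desc column and desc_df.
Result for the sample: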
   sl no   key        id    Sum     Title             Body       respo_URL  \
0    661  3484  13592349  [E-1]   New_ind  Detection_error  www.github.com
1    662  3483  13592490  [E-1]  New_ind2     import_error

  respo_status
0     {yellow}
1      {green}
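One caveat with splitting on ":" is that a value which itself contains a colon (a URL with a scheme, for example) would be cut off at its first colon. A minimal sketch of a variant that splits only at the first colon is shown here; the sample row and its https URL are hypothetical and not taken from the question's data:

```python
import pandas as pd

# Hypothetical sample: the URL value itself contains a colon.
df = pd.DataFrame({
    "id": [1],
    "Desc": ["**Title **: New_ind\n\n*respo_URL **: https://www.github.com"],
})

desc_df = df["Desc"].str.split("\n\n", expand=True)
for col in desc_df.columns:
    # n=1 splits only at the first colon, so colons inside the value survive.
    desc_df[col] = desc_df[col].str.split(":", n=1).str[1].str.strip()
desc_df.columns = ["Title", "respo_URL"]

print(desc_df)
```

Everything else in the str-based approach stays the same; only the n=1 argument is added.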
The following regex version worked for the sample, but I think it's not as robust as the other one, because it expects all four fields in exactly this order:
pattern = "\n\n".join(
    # raw f-string, so that \* is not treated as an invalid string escape
    rf"\*+{col} \*+: (?P<{col}>[^\n]*)"
    for col in ("Title", "Body", "respo_URL", "respo_status")
)
desc_df = df["Desc"].str.extract(pattern)
df = pd.concat([df.drop(columns="Desc"), desc_df], axis=1)
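If the fields can be missing or appear in a different order, one possible alternative (not part of the answer above, just a sketch) is to capture the keys with str.extractall instead of hard-coding them, then pivot the key/value pairs into columns. The second sample row is hypothetical: it has no Body field and lists its fields in a different order.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [13592349, 13592490],
    "Desc": [
        "**Title **: New_ind\n\n**Body **: Detection_error\n\n"
        "*respo_URL **: www.github.com\n\n**respo_status **: {yellow}",
        # Hypothetical row: Body is missing and the field order differs.
        "**Title **: New_ind2\n\n**respo_status **: {green}\n\n"
        "*respo_URL **: www.github.com",
    ],
})

# One match per "key: value" line; keys are captured rather than hard-coded.
pairs = df["Desc"].str.extractall(
    r"\*+(?P<key>\w+) \*+:[ \t]*(?P<value>[^\n]*)"
)

# Pivot the keys into columns; the remaining index is the original row label.
desc_df = (
    pairs.reset_index(level="match", drop=True)
    .set_index("key", append=True)["value"]
    .unstack("key")
)

# Fields that a row does not contain end up as NaN.
df = df.drop(columns="Desc").join(desc_df)
print(df)
```

The trade-off is that the resulting columns depend on whatever keys occur in the data, so a typo in a key would silently create an extra column.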