I have a pandas df containing a column composed of text like:
String1::some_text::some_text;String2::some_text::;String3::some_text::some_text;String4::some_text::some_text
I can see that:
I want to create a new column containing:
String1, String2, String3, String4
All separated by commas, but still in the same column.
How should I approach this problem?
Thanks for your help
Try this:
In [136]: df.txt.str.findall(r'String\d+').str.join(', ')
Out[136]:
0 String1, String2, String3, String4
Name: txt, dtype: object
Data:
In [137]: df
Out[137]:
txt
0 String1::some_text::some_text;String2::some_text::;String3::some_text::some_text;String4::some_t...
Setup:
df = pd.DataFrame({'txt': ['String1::some_text::some_text;String2::some_text::;String3::some_text::some_text;String4::some_text::some_text']})
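The pattern above relies on every label literally looking like String followed by digits. If the real labels are arbitrary, one possible variation is to capture whatever sits between the start of the string (or a ;) and the first :: instead. This is only a sketch: the pattern and the new_col name are my own assumptions, and it assumes labels never contain : or ; themselves.

```python
import pandas as pd

df = pd.DataFrame({'txt': ['String1::some_text::some_text;String2::some_text::;String3::some_text::some_text;String4::some_text::some_text']})

# Capture the run of non-':' / non-';' characters that starts a record
# (at the beginning of the string or right after ';') and is followed by '::'.
pattern = r'(?:^|;)([^:;]+)::'

# With a single capture group, findall returns just the captured labels.
df['new_col'] = df['txt'].str.findall(pattern).str.join(', ')
```

The result is a plain string per row, so it can be assigned straight to a new column.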
Consider the dataframe df with column txt:
df = pd.DataFrame(['String1::some_text::some_text;String2::some_text::;String3::some_text::some_text;String4::some_text::some_text'] * 10,
columns=['txt'])
df
Use a combination of str.split and groupby:
df.txt.str.split(';', expand=True).stack() \
.str.split('::').str[0].groupby(level=0).apply(list)
0 [String1, String2, String3, String4]
1 [String1, String2, String3, String4]
2 [String1, String2, String3, String4]
3 [String1, String2, String3, String4]
4 [String1, String2, String3, String4]
5 [String1, String2, String3, String4]
6 [String1, String2, String3, String4]
7 [String1, String2, String3, String4]
8 [String1, String2, String3, String4]
9 [String1, String2, String3, String4]
dtype: object
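The result above is a list per row, while the question asks for a single comma-separated string stored in a new column. A minimal sketch of one way to get there follows. Note that it swaps the expand=True / stack step for Series.explode (an assumption on my part, not part of the answer above), so rows with different numbers of records do not get padded with NaN, and the new_col name is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    ['String1::some_text::some_text;String2::some_text::;String3::some_text::some_text;String4::some_text::some_text'] * 3,
    columns=['txt'])

labels = (
    df['txt'].str.split(';')       # one list of records per row
      .explode()                   # one record per row, original index repeated
      .str.split('::').str[0]      # keep the part before the first '::'
      .groupby(level=0)            # regroup by the original row index
      .agg(', '.join)              # join each row's labels into one string
)

# labels is indexed like df, so the assignment aligns row by row
df['new_col'] = labels
```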
I would just apply a lambda function to do the operation you want to do (split first on ";", then split on "::" and keep the first element, and join them back):
df['new_col'] = df['old_col'].apply(lambda s: ", ".join(t.split("::")[0] for t in s.split(";")))
You could also avoid splitting on :: since simply stopping before the first : is enough:
df['new_col'] = df['old_col'].apply(lambda s: ", ".join(t[:t.index(":")] for t in s.split(";")))
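One caveat with t.index(":") is that it raises ValueError when a record contains no colon at all. If that can happen in your data, a possible variation is str.partition, which returns the whole record when the separator is missing. This is a sketch on made-up sample data (the last record, String3, deliberately has no colon); the old_col and new_col names are the same placeholders used above.

```python
import pandas as pd

# Hypothetical sample data: the third record has no ':' at all
df = pd.DataFrame({'old_col': ['String1::some_text::some_text;String2::some_text::;String3']})

def first_fields(s):
    # partition(':') returns (head, sep, tail); head is the whole record
    # when ':' is absent, so no exception is raised
    return ", ".join(t.partition(":")[0] for t in s.split(";"))

df['new_col'] = df['old_col'].apply(first_fields)
```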