How to extract every substring matching a specific pattern from each row and join the matches with commas using pandas
I have a pandas DataFrame, df, like the one below; each row holds a string of varying length. I need only the parts of each string that match the pattern
[A-Z]{1}[0-9]{4}
such as T7114.
NODE
====================================================
T7114 (Shahrekord)
T7374 (Esfahan - Shahrekord Rd.), T7114 , T7113
T8319 (HOUMEH Shahrekord), E1826 (Shahrekord)
E1577 (Shahrekord), T7114 (Shahrekord), T7941 (KIAN)
T8319 (HOUMEH Shahrekord), T7941 (KIAN)
T7941 (KIAN), T7114 (Shahrekord)
How can I extract just these parts from the string in each row and put them in a new column, separated by commas, as in the df below?
NODE                                                  NE
====================================================  =================
T7114 (Shahrekord)                                    T7114
T7374 (Esfahan - Shahrekord Rd.),T7114,T7113          T7374,T7114,T7113
T8319 (HOUMEH Shahrekord), E1826 (Shahrekord)         T8319,E1826
E1577 (Shahrekord), T7114 (Shahrekord), T7941 (KIAN)  E1577,T7114,T7941
T8319 (HOUMEH Shahrekord), T7941 (KIAN)               T8319,T7941
T7941 (KIAN), T7114 (Shahrekord)                      T7941,T7114
I tried to extract it using a regex with str.extract and str.strip, as shown below, but that only extracts the first match, whereas I want to extract every match in each row and separate them with commas. What is the most efficient way to do this?
df['NODE'] = df['NODE'].str.extract('([A-Z{1}0-9{4} ]+)', expand=False).str.strip()
Try this:
df["NODE"].str.findall(r"\w\d+").str.join(",")
0 T7114
1 T7374,T7114,T7113
2 T8319,E1826
3 E1577,T7114,T7941
4 T8319,T7941
5 T7941,T7114
Name: NODE, dtype: object
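The pattern \w\d+ works for this sample, but it would also match tokens such as a bare number or a lowercase letter followed by digits. A minimal sketch of a stricter variant follows, assuming the codes are always one uppercase letter plus exactly four digits, as the question states. The DataFrame is rebuilt from the question's sample rows, and the variable name pattern and the NE column assignment are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "NODE": [
        "T7114 (Shahrekord)",
        "T7374 (Esfahan - Shahrekord Rd.), T7114 , T7113",
        "T8319 (HOUMEH Shahrekord), E1826 (Shahrekord)",
        "E1577 (Shahrekord), T7114 (Shahrekord), T7941 (KIAN)",
        "T8319 (HOUMEH Shahrekord), T7941 (KIAN)",
        "T7941 (KIAN), T7114 (Shahrekord)",
    ]
})

# Assumption: a code is one uppercase letter followed by exactly four digits.
# The \b word boundaries keep the pattern from matching inside longer tokens.
pattern = r"\b[A-Z][0-9]{4}\b"

# str.findall returns a list of all matches per row; str.join collapses
# each list into a single comma-separated string.
df["NE"] = df["NODE"].str.findall(pattern).str.join(",")

print(df["NE"].tolist())
```

Unlike str.extract, which returns only the first match per row, str.findall collects every match, which is why it fits this task.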