How to extract all repeating patterns from a string into a dataframe
I have a dataframe with some truck equipment codes. The cells look like this list:
x = [[A0B,A1C,A1Z,A2E,A5C,B1B,B1F,B1H,B2A],
[A0A,A0B,A1C,A1Z,A2I,A5L,B1B,B1F,B1H,B2A,B2X,B3H,B4L,B5E,B5J,C0G,C1W,C5B,C5D],
[A0B,A1C,A1Z,A2E,A5C,B1B,B1F,B1H,B2A,B2X,B4L,B5C,B5I,C0A,C1J,C5B,C5D,C6C,C6J,C6Q]]
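I want to extract all the values that match "B", for example ("B1B,B1F,B1H"); ("B1B,B1F,B1H,B2A,B2X,B3H"); ("B1B,B1F,B1H,B2A,B2X,B4L,B5C,B5I"). I tried the code below, but each row has a different length:
import pandas as pd

sublista = ['B1B', 'B1F', 'B1H', 'B2A', 'B2X', 'B4L', 'B5C', 'B5I']
df3 = pd.DataFrame(columns=['FIN', 'Equipmentcodes', 'AQUATARDER', 'CAJA'])
for elemento in sublista:
    # df2 is the existing dataframe; keep the rows whose code string contains this code
    df_aux = df2[df2['Equipmentcodes'].str.contains(elemento, case=False)].copy()
    df_aux['CAJA'] = elemento
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df3 = pd.concat([df3, df_aux], ignore_index=True)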
Assuming your column contains strings, you can use a regular expression:
df['selected'] = (df['code']
                  .str.extractall(r'\b(B[^,]*)\b')[0]
                  .groupby(level=0).apply(','.join)
                  )
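As a quick check of what the pattern matches, here is a minimal sketch that applies the same regular expression to a single cell with Python's standard `re` module. The variable name `cell` is only for illustration. Because `\b` requires a word boundary in front of the `B`, codes that merely contain a B, such as A0B, are skipped.

```python
import re

cell = 'A0B,A1C,A1Z,A2E,A5C,B1B,B1F,B1H,B2A'  # one illustrative cell value

# \b     : word boundary, so the match has to begin at the start of a code
# B      : the code has to start with the letter B
# [^,]*  : then take every character up to the next comma
matches = re.findall(r'\b(B[^,]*)\b', cell)

selected = ','.join(matches)
print(selected)  # B1B,B1F,B1H,B2A
```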
Example input:
x = ['A0B,A1C,A1Z,A2E,A5C,B1B,B1F,B1H,B2A',
     'A0A,A0B,A1C,A1Z,A2I,A5L,B1B,B1F,B1H,B2A,B2X,B3H,B4L,B5E,B5J,C0G,C1W,C5B,C5D',
     'A0B,A1C,A1Z,A2E,A5C,B1B,B1F,B1H,B2A,B2X,B4L,B5C,B5I,C0A,C1J,C5B,C5D,C6C,C6J,C6Q']
df = pd.DataFrame({'code': x})
Output:
   selected                              code
0  B1B,B1F,B1H,B2A                       A0B,A1C,A1Z,A2E,A5C,B1B,B1F,B1H,B2A
1  B1B,B1F,B1H,B2A,B2X,B3H,B4L,B5E,B5J   A0A,A0B,A1C,A1Z,A2I,A5L,B1B,B1F,B1H,B2A,B2X,B3H,B4L,B5E,B5J,C0G,C1W,C5B,C5D
2  B1B,B1F,B1H,B2A,B2X,B4L,B5C,B5I       A0B,A1C,A1Z,A2E,A5C,B1B,B1F,B1H,B2A,B2X,B4L,B5C,B5I,C0A,C1J,C5B,C5D,C6C,C6J,C6Q
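If you would rather avoid a regular expression, a plain split-and-filter should give the same strings for this input. The sketch below is an alternative, not part of the approach above, and the helper name `select_prefix` is made up for illustration. One difference to be aware of: a row with no B codes ends up as an empty string here, whereas the `extractall` version leaves NaN in that row.

```python
import pandas as pd

x = ['A0B,A1C,A1Z,A2E,A5C,B1B,B1F,B1H,B2A',
     'A0A,A0B,A1C,A1Z,A2I,A5L,B1B,B1F,B1H,B2A,B2X,B3H,B4L,B5E,B5J,C0G,C1W,C5B,C5D',
     'A0B,A1C,A1Z,A2E,A5C,B1B,B1F,B1H,B2A,B2X,B4L,B5C,B5I,C0A,C1J,C5B,C5D,C6C,C6J,C6Q']
df = pd.DataFrame({'code': x})


def select_prefix(cell, prefix='B'):
    """Hypothetical helper: keep the comma-separated codes that start with `prefix`."""
    codes = (code.strip() for code in cell.split(','))
    return ','.join(code for code in codes if code.startswith(prefix))


# Apply the helper to every cell of the 'code' column
df['selected'] = df['code'].apply(select_prefix)
print(df['selected'].tolist())
```

Since each row is handled independently, rows of different lengths are not a problem.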