Multiline regex: how to extract the text between dates in a pandas DataFrame?

Question

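I have a DataFrame with a Description column. Each Description cell holds several lines of text, which together form the set of notes for one record.

Example: for info 1, on 07-01-2019 we got an update saying the sky is blue, and on 05-22-2019 we got another update saying apples are red. Each note sits between two dates. First I want to extract the text between the dates, then split the individual details into new columns: date, name, description.

The original Description looks like this: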

info no|           Description
--------------------------------------------------------------------------
1      |07-01-2019 12:59:41 - XYZ (Work notes) The sky is blue in color.
       |                                        Clouds are looking lovely.
       | 05-22-2019 12:00:49 - MNX  (Work notes) Apples are red in color.
--------------------------------------------------------------------------    
       |  02-26-2019 12:53:18 - ABC (Work notes) Task is to separate balls.
2      |  02-25-2019 16:57:57 - lMN (Work notes) He came by train.
       |                                         That train was 15 min late.
       |                                         He missed the concert.
       |  02-25-2019 11:08:01 - sbc (Work notes) She is my grandmother.

The expected output is:

info No |DATE                   |  NAME  |  DESCRIPTION
--------|---------------------------------------------------------
   1    |07-01-2019 12:59:41    |  XYZ   |  The sky is blue in color.
        |                       |        |  Clouds are looking lovely.
--------|---------------------------------------------------------
   1    |05-22-2019 12:00:49    |  MNX   |  Apples are red in color.
--------|---------------------------------------------------------
   2    |02-26-2019 12:53:18    |  ABC   |  Task is to separate balls.
--------|---------------------------------------------------------
   2    |02-25-2019 16:57:57    |  lMN   |  He came by train.
        |                       |        |  That train was 15 min late.
        |                       |        |  He missed the concert.
--------|---------------------------------------------------------
   2    |02-25-2019 11:08:01    |  sbc   |  She is my grandmother.

What I tried:

 myDf = pd.DataFrame(re.split(r'(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2} -.*)', Description), columns=['date'])
 myDf['date'] = myDf['date'].replace('(Work notes)', '-', regex=True)
 newQueue = myDf.date.str.split('-', n=3)

Answer 1

Given this DataFrame:

df
                                             Description
Sl No                                                   
1      07-01-2019 12:59:41 - XYZ (Work notes) The sky...
2      05-22-2019 12:00:49 - MNX  (Work notes) Apples...
3      02-26-2019 12:53:18 - ABC (Work notes) Task is...
4      02-25-2019 16:57:57 - lMN (Work notes) He came...
5      02-25-2019 11:08:01 - sbc (Work notes) She is ...
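
To reproduce the steps that follow, a frame like the one printed above could be built as sketched here. The strings are copied from the question's sample; keeping exactly one single-line note per row is an assumption that matches the printout, not the asker's real data.

```python
import pandas as pd

# Sample notes copied from the question, one note per row (assumption:
# the multi-line continuation lines are left out, as in the printout).
df = pd.DataFrame(
    {
        "Description": [
            "07-01-2019 12:59:41 - XYZ (Work notes) The sky is blue in color.",
            "05-22-2019 12:00:49 - MNX  (Work notes) Apples are red in color.",
            "02-26-2019 12:53:18 - ABC (Work notes) Task is to separate balls.",
            "02-25-2019 16:57:57 - lMN (Work notes) He came by train.",
            "02-25-2019 11:08:01 - sbc (Work notes) She is my grandmother.",
        ]
    },
    index=pd.Index([1, 2, 3, 4, 5], name="Sl No"),
)

print(df)
```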

You can split the strings in the Description column on '(Work notes)', and then use values.tolist to spread the pieces into 2 columns, as follows:

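import pandas as pd

# Split each description on the literal marker '(Work notes)'
df['Description'] = df['Description'].apply(lambda s: s.split('(Work notes)'))

# Expand the resulting lists into columns, keeping the original index
x = pd.DataFrame(df['Description'].values.tolist(), index=df.index)

print(x)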

                                 0                            1
Sl No                                                          
1       07-01-2019 12:59:41 - XYZ     The sky is blue in color.
2      05-22-2019 12:00:49 - MNX       Apples are red in color.
3       02-26-2019 12:53:18 - ABC    Task is to separate balls.
4       02-25-2019 16:57:57 - lMN             He came by train.
5       02-25-2019 11:08:01 - sbc        She is my grandmother.
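
The split above assumes one note per row. For the multi-line case described in the question, one possible approach is a regex with a lookahead that stops each description at the next timestamp. The sketch below is not part of the original answer: the frame `raw`, the column name `info_no`, and the names `STAMP` and `NOTE` are made up for illustration, and it assumes every note follows the `date - name (Work notes) text` layout shown in the question.

```python
import re
import pandas as pd

# Hypothetical frame mirroring the question: several notes per record,
# joined by newlines inside a single Description cell.
raw = pd.DataFrame(
    {
        "info_no": [1, 2],
        "Description": [
            "07-01-2019 12:59:41 - XYZ (Work notes) The sky is blue in color.\n"
            "Clouds are looking lovely.\n"
            "05-22-2019 12:00:49 - MNX  (Work notes) Apples are red in color.",
            "02-26-2019 12:53:18 - ABC (Work notes) Task is to separate balls.\n"
            "02-25-2019 16:57:57 - lMN (Work notes) He came by train.\n"
            "That train was 15 min late.\n"
            "He missed the concert.\n"
            "02-25-2019 11:08:01 - sbc (Work notes) She is my grandmother.",
        ],
    }
)

# Timestamp layout used in the sample: MM-DD-YYYY HH:MM:SS
STAMP = r"\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}"

# One note = timestamp, name, then everything up to the next timestamp
# (or the end of the cell). The lookahead leaves the next timestamp unconsumed.
NOTE = (
    rf"(?P<DATE>{STAMP})\s*-\s*(?P<NAME>\S+)\s*\(Work notes\)\s*"
    rf"(?P<DESCRIPTION>.*?)(?=\s*{STAMP}\s*-|\Z)"
)

notes = (
    raw.set_index("info_no")["Description"]
    .str.extractall(NOTE, flags=re.DOTALL)  # DOTALL lets '.' span line breaks
    .reset_index(level="match", drop=True)  # drop the per-cell match counter
    .reset_index()
)

# Optional: parse the timestamps (month-day-year in the sample data)
notes["DATE"] = pd.to_datetime(notes["DATE"], format="%m-%d-%Y %H:%M:%S")

print(notes)
```

Each match becomes its own row, so a record with three notes yields three rows that share the same `info_no`. Continuation lines stay inside DESCRIPTION, separated by newline characters.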

Solution 1
0 2019-08-30 11:12:59