Extract meaningful information from a text column using Python

Question

I have a table with two columns. I need to extract the meaningful information from the Notes column, i.e. put the date in one column and the information that follows the date (the status) in another column, alongside the ID.

Notes, ID
Movie Date 05-28-2018 Passed, 1010
MTD loan slip dated 8-10-14 the Issued, 1111
Max over date 10-2-15 and repaired, 11232

Desired output:

Notes                               ID      Date        Status
Movie Date 05-28-2018 Passed        1010    5/28/2018   Passed
loan slip dated 8-10-14 Issued      1111    8/10/2014   Issued
Max over date 10-2-15 and repaired  11232   10/2/2015   repaired

Here is my code:

df = pd.read_sql('select * from <table>', engine)
searchfor = [' dated', ' date', ' Date', ' Dated']
df2 = df[df['Notes'].str.contains('|'.join(searchfor), na=False)]
..................

I appreciate your help on this. Thank you.

Answer 1

I would use some loops for that.

Example:

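import pandas as pd

df = pd.read_csv("data.csv")

# Status keywords to look for; the leading space requires the keyword to start a new word
searchforstatus = [' Passed', ' Issued', ' repaired']

# Check every note against every keyword and record the one that is found
for idx, row in df.iterrows():
    for c in searchforstatus:
        if c in row['Notes']:
            # strip() drops the leading space before the value is stored
            df.loc[idx, 'Status'] = c.strip()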

Result:

                                    Notes     ID     Status
0            Movie Date 05-28-2018 Passed   1010     Passed
1  MTD loan slip dated 8-10-14 the Issued   1111     Issued
2      Max over date 10-2-15 and repaired  11232   repaired

The data that I used can be found here: https://files.fm/u/npaceyd6#_
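For larger tables, the same lookup can probably be done without an explicit loop. The following is only a sketch, not part of the original answer: it rebuilds the three sample rows from the question in memory (instead of reading data.csv) and assumes the status is always one of the known keywords.

```python
import pandas as pd

# Sample rows copied from the question; real data would come from the table or CSV.
df = pd.DataFrame({
    "Notes": [
        "Movie Date 05-28-2018 Passed",
        "MTD loan slip dated 8-10-14 the Issued",
        "Max over date 10-2-15 and repaired",
    ],
    "ID": [1010, 1111, 11232],
})

# Same keywords as searchforstatus in the loop above, without the leading spaces.
statuses = ["Passed", "Issued", "repaired"]

# One pattern for all keywords: \b(Passed|Issued|repaired)\b
pattern = r"\b(" + "|".join(statuses) + r")\b"

# str.extract returns the first match per row (NaN when no keyword is present).
df["Status"] = df["Notes"].str.extract(pattern, expand=False)

print(df)
```

Rows that contain none of the keywords end up with NaN in Status, which makes them easy to find afterwards with df[df["Status"].isna()].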

Answer 2

A regex can also extract the information from each row returned by iterrows(), which is useful when there are many possible status values.

import re

s = 'Movie Date 05-28-2018 Passed'
p = re.search(r'Dated?\s(\d+-\d+-\d+)\s([a-zA-Z]+)', s)

p.group(1) will hold the date value and p.group(2) will hold the value 'Passed'. Note that this pattern is case-sensitive, so it matches 'Date' and 'Dated' but not the lowercase 'date' and 'dated' in the other two rows. Hope this helps.
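To cover all three sample rows and fill both new columns at once, the regex idea can be combined with pandas string methods. This is a sketch under a few assumptions that are not in the original answer: the status is taken to be the last word of the note, dates are month-day-year, and two-digit years belong to the 2000s. The group names Date and Status are chosen here to match the desired output.

```python
import re
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Notes": [
        "Movie Date 05-28-2018 Passed",
        "MTD loan slip dated 8-10-14 the Issued",
        "Max over date 10-2-15 and repaired",
    ],
    "ID": [1010, 1111, 11232],
})

# "date" or "dated" (any case), then the date, then any filler words,
# then the last word of the note, which is assumed to be the status.
pattern = (
    r"\bdated?\s+(?P<Date>\d{1,2}-\d{1,2}-\d{2,4})\b"
    r".*?(?P<Status>[A-Za-z]+)\s*$"
)
extracted = df["Notes"].str.extract(pattern, flags=re.IGNORECASE)

# Years appear with four digits or two digits, so try both formats
# and keep whichever one parses.
four_digit = pd.to_datetime(extracted["Date"], format="%m-%d-%Y", errors="coerce")
two_digit = pd.to_datetime(extracted["Date"], format="%m-%d-%y", errors="coerce")

df["Date"] = four_digit.fillna(two_digit)
df["Status"] = extracted["Status"]

print(df)
```

The Date column is stored as a real datetime here, so it can be sorted or filtered directly; it can be converted back to text with df["Date"].dt.strftime(...) if a specific display format is needed.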

Answer 1
0 2018-05-23 17:48:59

Answer 2
0 2018-05-23 18:55:33
