Extracting a pattern from a string using Python

Question

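I am trying to read data from a column in an Excel file and then extract the user IDs found in each row. So far I have been able to extract the IDs with the following code and write the results to an Excel file.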

import xlrd
import pandas as pd


#Input File Path
file='file1.xlsx'
workbook = xlrd.open_workbook(file)

#open first worksheet
sheet=workbook.sheet_by_index(0)

#extract details from 4th column
description = sheet.col_values(4)

my_series = pd.Series(description)
numbers = my_series.str.findall(r'\d+')
All_Ids = pd.to_numeric(numbers, errors='ignore')
All_Ids_mapped = [list(map(int, x)) for x in All_Ids]
df = pd.DataFrame(All_Ids_mapped)

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('extracted_ids.xlsx', engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')

# Close the Pandas Excel writer and output the Excel file.
writer.close()

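My problem now is that the column contains several kinds of IDs. I want to extract only the IDs that follow the string 'user with id'. For example, a string in the column looks like this: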

The user with id '123' started discussion with the user with id '456' in the discussion thread with id '5000'.

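Since I am only interested in the user IDs, I want to update the search pattern to include that text. I tried the following, but it produced no output.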

  numbers=my_series.str.findall('^user with id.+\d+')

Please help me write the correct expression for str.findall.

Thanks.

Answer 1

Using the re module, I get the following result:

>>> import re
>>> series = "The user with id '123' started discussion with the user with id '456' in the discussion thread with id '5000'."
>>> re.findall(r"user with id '\d+'", series)
["user with id '123'", "user with id '456'"]

Are these the matches you expected? Since the matches are returned in order, it is not hard to pick one by index and extract the ID.
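
If only the numbers are wanted, one possible refinement is to put the digits in a capture group, so that findall returns just the IDs and not the surrounding text. The attempt in the question finds nothing because ^ anchors the match to the start of the string, which begins with "The". Below is a minimal sketch under that assumption; the names USER_ID_PATTERN and extract_user_ids are made up for illustration and are not part of the original code.

```python
import re

# Sample text taken from the question.
text = ("The user with id '123' started discussion with the user with id '456' "
        "in the discussion thread with id '5000'.")

# The parentheses form a capture group, so findall returns only the digits.
# No ^ anchor: the phrase can appear anywhere in the string.
USER_ID_PATTERN = re.compile(r"user with id '(\d+)'")


def extract_user_ids(s):
    """Return the user IDs found in s as integers, in order of appearance."""
    return [int(m) for m in USER_ID_PATTERN.findall(s)]


print(extract_user_ids(text))  # [123, 456]
```

The thread ID '5000' is skipped because it is preceded by "thread with id", not "user with id". Since pandas' str.findall applies re.findall to each element, the same pattern string should also work as my_series.str.findall(r"user with id '(\d+)'"), giving a list of ID strings per row.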
