Python: extract numbers from Excel columns and write them as output

Question

I'm trying to extract the numbers from columns in an Excel file and write them into the next columns.

Matching criteria: any number of length five, whether or not it is preceded by "PB".

I've limited the length of the number match to five; however, a "16" is still extracted (row 2, column D).

How can I improve it? Thank you.

import xlwt, xlrd, re
from xlutils.copy import copy 

workbook = xlrd.open_workbook("C:\\Documents\\num.xlsx")
old_sheet = workbook.sheet_by_name("Sheet1")

wb = copy(workbook) 
sheet = wb.get_sheet(0)

number_of_ships = old_sheet.nrows

for row_index in range(0, old_sheet.nrows):

    Column_a = old_sheet.cell(row_index, 0).value   
    Column_b = old_sheet.cell(row_index, 1).value

    a_b = Column_a + Column_b

    found_PB = re.findall(r"[PB]+(\d{5})", a_b, re.I)
    list_of_numbers = re.findall(r'\d+', a_b)

    for f in found_PB:
        if len(f) == 5:
            sheet.write(row_index, 2, ";".join(found_PB))

    for l in list_of_numbers:
        if len(l) == 5:
            sheet.write(row_index, 3, ";".join(list_of_numbers))

wb.save("C:\\Documents\\num-1.xls")

Answer 1

Your \d+ pattern matches any run of one or more digits, so the 16 value is matched; and since the loop writes the whole joined list as soon as one match has five digits, the 16 is written too. Your [PB]+ character class matches either P or B one or more times (in any order, so it is not the literal text PB), which requires the digits to be preceded by a P or a B. As you want to match five-digit numbers whether or not they have that prefix, you do not actually need that restriction (if the prefix is optional, requiring it no longer makes sense).
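To see why the 16 shows up, here is a small sketch that runs the two original patterns on a made-up cell value. The sample string is an assumption for illustration; the real spreadsheet contents are not shown in the question.

```python
import re

# Hypothetical cell text (not data from the question).
a_b = "PB12345 lot 16"

found_pb = re.findall(r"[PB]+(\d{5})", a_b, re.I)
list_of_numbers = re.findall(r"\d+", a_b)

print(found_pb)          # ['12345']
print(list_of_numbers)   # ['12345', '16']

# The original loop writes the whole joined list as soon as one item
# has 5 digits, so the short match is written along with it.
written = ""
if any(len(n) == 5 for n in list_of_numbers):
    written = ";".join(list_of_numbers)
print(written)           # 12345;16
```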

You also seem to need to extract strings of exactly five digits, with no other digits immediately before or after them. You may do that with (?<!\d)\d{5}(?!\d). The (?<!\d) negative lookbehind makes sure there is no digit immediately to the left of the current location, \d{5} consumes 5 digits, and the (?!\d) negative lookahead makes sure there is no digit immediately to the right of the current location. That makes the if len(l) == 5: line redundant, and you may omit the whole part of the code related to list_of_numbers.

So, you may just use
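As a quick check of the lookaround pattern on its own, the sketch below applies it to a few invented strings (the samples and the name FIVE_DIGITS are assumptions, not part of the original code).

```python
import re

# Exactly five digits, with no digit directly before or after.
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")

# Hypothetical sample strings.
samples = ["PB12345 lot 16", "id 123456", "x54321y 00001"]
results = [FIVE_DIGITS.findall(s) for s in samples]

for text, found in zip(samples, results):
    print(repr(text), "->", found)
# 'PB12345 lot 16' -> ['12345']
# 'id 123456' -> []
# 'x54321y 00001' -> ['54321', '00001']
```

The six-digit run yields nothing because every five-digit window inside it has a digit on one side.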

import xlwt, xlrd, re
from xlutils.copy import copy 

workbook = xlrd.open_workbook("C:\\Documents\\num.xlsx")
old_sheet = workbook.sheet_by_name("Sheet1")

wb = copy(workbook) 
sheet = wb.get_sheet(0)

number_of_ships = old_sheet.nrows

for row_index in range(0, old_sheet.nrows):

    Column_a = old_sheet.cell(row_index, 0).value   
    Column_b = old_sheet.cell(row_index, 1).value

    a_b = Column_a + Column_b

    found_PB = re.findall(r"(?<!\d)\d{5}(?!\d)", a_b)

    # Write the matches once per row, and only when there is at least one.
    if found_PB:
        sheet.write(row_index, 2, ";".join(found_PB))

wb.save("C:\\Documents\\num-1.xls")
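One thing the loop above takes for granted is that both cells hold text. The sketch below is a hedged variation that separates the extraction into a helper, assuming numeric cells arrive as floats (as xlrd reports them) and that digits from two different cells should not be glued together. The names cell_text and extract_codes are hypothetical and not part of the original answer.

```python
import re

FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")

def cell_text(value):
    """Turn a cell value into text; whole-number floats lose the '.0'."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)

def extract_codes(*cells):
    """Return all 5-digit numbers found in the cells, joined by ';'."""
    # Join with a space so that "123" + "45" is not read as "12345".
    text = " ".join(cell_text(c) for c in cells)
    return ";".join(FIVE_DIGITS.findall(text))

print(extract_codes("PB12345 and 16", "67890x"))  # 12345;67890
print(extract_codes(12345.0, "abc"))              # 12345
print(repr(extract_codes("123", "45")))           # ''
```

Inside the row loop this would be called as extract_codes(Column_a, Column_b), and the result written only when it is non-empty.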

Answer 2

You may use this: ^(?:PB)?\d{5}$

Explained:

^           # Begin of line/string
  (?:       # Begin of group
     PB     #   Literal 'PB'
  )         # End of group
  ?         # Make the previous group optional (? means 0 or 1 times)
  \d{5}     # 5 digits
$           # End of line/string

It is important to use the $. If you just wrote ^(?:PB)?\d{5}, you would also match the start of a 6-digit number, even though you wrote \d{5}: the regex matches the first five digits and stops there, without checking whether more digits follow.
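The effect of the end anchor can be sketched with two compiled patterns and a few invented inputs (the variable names and test strings are assumptions for illustration).

```python
import re

without_end = re.compile(r"^(?:PB)?\d{5}")
with_end = re.compile(r"^(?:PB)?\d{5}$")

# Without $, the first five digits of a six-digit number still match.
loose = without_end.search("123456")
print(loose.group())                        # 12345

# With $, the same input is rejected and valid inputs still match.
print(with_end.search("123456"))            # None
print(with_end.search("PB12345").group())   # PB12345
print(with_end.search("12345").group())     # 12345
```

Note that this pattern checks a whole value; it does not find a five-digit number embedded in longer text such as "see PB12345 here".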

If your data may start or end with spaces, you may use this instead: ^\s*(?:PB)?\d{5}\s*$ It simply adds \s* at the beginning and the end of the regex. \s* means 0 or more whitespace characters.
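A minimal sketch of the whitespace-tolerant version follows. The capturing group around \d{5} and the helper name code_from_cell are additions for illustration, so that only the digits are returned; they are not part of the pattern given above.

```python
import re

# Same pattern as above, with a capturing group added around the digits.
PADDED = re.compile(r"^\s*(?:PB)?(\d{5})\s*$")

def code_from_cell(text):
    """Return the 5 digits if the whole cell is an optional 'PB' plus 5 digits."""
    m = PADDED.match(text)
    return m.group(1) if m else None

print(code_from_cell("  PB12345 "))   # 12345
print(code_from_cell("12345"))        # 12345
print(code_from_cell("PB123456"))     # None
print(code_from_cell("see PB12345"))  # None
```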

2 solutions

Solution 1
3, accepted, 2018-08-27 08:10:30

Solution 2
1, 2018-08-27 07:57:01
