
Checking if an ISBN number is correct

I'm given some ISBN numbers, e.g. 3-528-03851 (not valid) and 3-528-16419-0 (valid). I'm supposed to write a program which tests if the ISBN number is valid.

Here's my code:

import re

def check(isbn):
    check_digit = int(isbn[-1])
    match = re.search(r'(\d)-(\d{3})-(\d{5})', isbn[:-1])

    if match:
        digits = match.group(1) + match.group(2) + match.group(3)
        result = 0

        for i, digit in enumerate(digits):
            result += (i + 1) * int(digit)

        return True if (result % 11) == check_digit else False

    return False

I've used a regular expression to a) check if the format is valid and b) extract the digits in the ISBN string. While it seems to work, being a Python beginner I'm eager to know how I could improve my code. Suggestions?

First, try to avoid code like this:

if Action():
    lots of code
    return True
return False

Flip it around, so the bulk of the code isn't nested. This gives us:

def check(isbn):
    check_digit = int(isbn[-1])
    match = re.search(r'(\d)-(\d{3})-(\d{5})', isbn[:-1])

    if not match:
        return False

    digits = match.group(1) + match.group(2) + match.group(3)
    result = 0

    for i, digit in enumerate(digits):
        result += (i + 1) * int(digit)

    return True if (result % 11) == check_digit else False

There are some bugs in the code:

  • If the check digit isn't an integer, this will raise ValueError instead of returning False: "0-123-12345-Q".
  • If the check digit is 10 ("X"), this will raise ValueError instead of returning True.
  • This assumes that the ISBN is always grouped as "1-123-12345-1". That's not the case; ISBNs are grouped arbitrarily. For example, the grouping "12-12345-12-1" is valid. See http://www.isbn.org/standards/home/isbn/international/html/usm4.htm .
  • This assumes the ISBN is grouped by hyphens. Spaces are also valid.
  • It doesn't check that there are no extra characters; '0-123-4567819' returns True, ignoring the extra 1 at the end.
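
The failure modes listed above can be reproduced with a short script. This is only a sketch: it copies the question's check function unchanged (plus the import it needs), and the helper name outcome is made up here so that an exception is reported instead of stopping the script.

```python
import re

def check(isbn):
    # The question's original function, reproduced unchanged.
    check_digit = int(isbn[-1])
    match = re.search(r'(\d)-(\d{3})-(\d{5})', isbn[:-1])
    if match:
        digits = match.group(1) + match.group(2) + match.group(3)
        result = 0
        for i, digit in enumerate(digits):
            result += (i + 1) * int(digit)
        return True if (result % 11) == check_digit else False
    return False

def outcome(isbn):
    # Hypothetical helper: return the exception name instead of crashing.
    try:
        return check(isbn)
    except ValueError:
        return "ValueError"

candidates = [
    "0-123-12345-Q",   # non-digit check character
    "0-473-07480-X",   # valid ISBN whose check digit is X
    "1 904310 16 8",   # valid ISBN, grouped differently and with spaces
    "0-123-4567819",   # extra character after the matched part
]
for candidate in candidates:
    print(repr(candidate), outcome(candidate))
```

The first two inputs end in ValueError, the third is rejected even though it is a valid ISBN, and the fourth is accepted despite the extra character.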

So, let's simplify this. First, remove all spaces and hyphens, and make sure the regex matches the whole line by wrapping it in '^...$'. That makes sure it rejects strings which are too long.

def check(isbn):
    isbn = isbn.replace("-", "").replace(" ", "")
    check_digit = int(isbn[-1])
    match = re.search(r'^(\d{9})$', isbn[:-1])
    if not match:
        return False

    digits = match.group(1)

    result = 0
    for i, digit in enumerate(digits):
        result += (i + 1) * int(digit)

    return True if (result % 11) == check_digit else False

Next, let's fix the "X" check digit problem. Match the check digit in the regex as well, so the entire string is validated by the regex, then convert the check digit correctly.

def check(isbn):
    isbn = isbn.replace("-", "").replace(" ", "").upper()
    match = re.search(r'^(\d{9})(\d|X)$', isbn)
    if not match:
        return False

    digits = match.group(1)
    check_digit = 10 if match.group(2) == 'X' else int(match.group(2))

    result = 0
    for i, digit in enumerate(digits):
        result += (i + 1) * int(digit)

    return True if (result % 11) == check_digit else False

Finally, using a generator expression and sum is a more idiomatic way of doing the final calculation in Python, and the final conditional can be simplified.

def check(isbn):
    isbn = isbn.replace("-", "").replace(" ", "").upper()
    match = re.search(r'^(\d{9})(\d|X)$', isbn)
    if not match:
        return False

    digits = match.group(1)
    check_digit = 10 if match.group(2) == 'X' else int(match.group(2))

    result = sum((i + 1) * int(digit) for i, digit in enumerate(digits))
    return (result % 11) == check_digit

Pointless improvement: replace return True if (result % 11) == check_digit else False with return (result % 11) == check_digit

Check this after you have finished, OK? :)

http://www.staff.ncl.ac.uk/djwilkinson/software/isbn.py

and

http://chrisrbennett.com/2006/11/isbn-check-methods.html

EDIT: Sorry about the confusion; I didn't see the homework tag. But maybe after finishing your homework you can see what others have done before. I think you can learn a lot from others' code. Sorry again :(

  • The check_digit initialization can raise a ValueError if the last character isn't a decimal digit. Why not pull out the check digit with your regex instead of using slicing?
  • Instead of search, you should probably use match, unless you want to allow arbitrary junk as the prefix. (Also, as a rule of thumb I'd anchor the end with $, though in your case that won't matter as your regex is fixed-width.)
  • Instead of manually listing the groups, you could just use ''.join(match.groups()), and pull the check_digit out afterwards. You might as well do the conversion to ints before pulling it out, as you want to convert all of them to ints anyway.
  • Your for loop could be replaced by a list comprehension or generator expression. Just use sum() to add up the elements.
  • True if (expression) else False can generally be replaced with simply expression. Likewise, False if (expression) else True can always be replaced with simply not expression.

Putting that all together:

def check(isbn):
    match = re.match(r'(\d)-(\d{3})-(\d{5})-(\d)$', isbn)
    if match:
        digits = [int(x) for x in ''.join(match.groups())]
        check_digit = digits.pop()
        return check_digit == sum([(i + 1) * digit
                                  for i, digit in enumerate(digits)]) % 11
    return False

The last line is arguably unnecessary, as the default behavior would be to return None (which is falsy), but explicit returns from some paths and not from others looks like a bug to me, so I think it's more readable to leave it in.
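
The difference between search, match and an end anchor is easy to see on a few inputs. The following is a small sketch, not part of the answer's code; the names PATTERN, junk_prefix, junk_suffix and clean are chosen here for illustration, and re.fullmatch (available since Python 3.4) is shown as one alternative to appending $.

```python
import re

# Same grouping as the answer's pattern, without the trailing '$'.
PATTERN = r'(\d)-(\d{3})-(\d{5})-(\d)'

clean = "3-528-16419-0"
junk_prefix = "xx" + clean
junk_suffix = clean + "xx"

# re.search scans the whole string, so leading junk is tolerated.
print(bool(re.search(PATTERN, junk_prefix)))

# re.match anchors only at the start: leading junk fails, trailing junk passes.
print(bool(re.match(PATTERN, junk_prefix)))
print(bool(re.match(PATTERN, junk_suffix)))

# Adding '$' (or using re.fullmatch) also rejects trailing junk.
print(bool(re.match(PATTERN + r'$', junk_suffix)))
print(bool(re.fullmatch(PATTERN, clean)))

# One subtlety: '$' still matches just before a final newline; fullmatch does not.
print(bool(re.match(PATTERN + r'$', clean + "\n")))
print(bool(re.fullmatch(PATTERN, clean + "\n")))
```

If the input may carry a trailing newline (for example, lines read from a file), stripping it first or using re.fullmatch avoids accepting it silently.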

All that regex stuff is great if you belong to the isbn.org compliance inspectorate.

However, if you want to know if what the potential customers type into their browser is worth pushing into a query of your database of books for sale, you don't want all that nice red uniform caper. Simply throw away everything but 0-9 and X ... oh yeah, nobody uses the shift key so we'd better allow x as well. Then if it's length 10 and passes the check-digit test, it's worth doing the query.

From http://www.isbn.org/standards/home/isbn/international/html/usm4.htm

The check digit is the last digit of an ISBN. It is calculated on a modulus 11 with weights 10-2, using X in lieu of 10 where ten would occur as a check digit.

This means that each of the first nine digits of the ISBN -- excluding the check digit itself -- is multiplied by a number ranging from 10 to 2 and that the resulting sum of the products, plus the check digit, must be divisible by 11 without a remainder.

which is a very long-winded way of saying "each of all the digits is multiplied by a number ranging from 10 to 1 and that the resulting sum of the products must be divisible by 11 without a remainder"
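
This rule and the "weights 1 to 9, compare with the check digit" rule used in the other answers accept exactly the same numbers, because the two weights at any position add up to 11 (10 and 1, 9 and 2, and so on), so the two sums differ only by sign modulo 11. The sketch below checks that on a few inputs; the function names are invented for this illustration and it assumes the input already contains exactly ten digit characters, with X allowed only as a stand-in for 10.

```python
def digit_values(isbn):
    # Keep digits and X/x only; X counts as 10. No position checks in this sketch.
    return [10 if c in 'Xx' else int(c) for c in isbn if c in '0123456789Xx']

def ok_weights_1_to_9(values):
    # First nine digits weighted 1..9; the remainder must equal the check digit.
    return sum((i + 1) * v for i, v in enumerate(values[:9])) % 11 == values[9]

def ok_weights_10_to_1(values):
    # All ten digits weighted 10..1; the sum must be divisible by 11.
    return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0

for isbn in ["3-528-16419-0", "0-473-07480-X", "0-473-07480-9", "1864425244"]:
    values = digit_values(isbn)
    print(isbn, ok_weights_1_to_9(values), ok_weights_10_to_1(values))
```

For 3-528-16419-0, for instance, the 1..9 sum is 209 = 19 × 11 (remainder 0, matching the check digit 0) and the 10..1 sum is 220 = 20 × 11.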

def isbn10_ok(s):
    data = [c for c in s if c in '0123456789Xx']
    if len(data) != 10: return False
    if data[-1] in 'Xx': data[-1] = 10
    try:
        return not sum((10 - i) * int(x) for i, x in enumerate(data)) % 11
    except ValueError:
        # rare case: 'X' or 'x' in first 9 "digits"
        return False


tests = """\
    3-528-03851
    3-528-16419-0
    ISBN 0-8436-1072-7
    0864425244
    1864425244
    0864X25244
    1 904310 16 8
    0-473-07480-x
    0-473-07480-X
    0-473-07480-9
    0-473-07480-0
    123456789
    12345678901
    1234567890
    0000000000
    """.splitlines()

for test in tests:
    test = test.strip()
    print(repr(test), isbn10_ok(test))

Output:

'3-528-03851' False
'3-528-16419-0' True
'ISBN 0-8436-1072-7' True
'0864425244' True
'1864425244' False
'0864X25244' False
'1 904310 16 8' True
'0-473-07480-x' True
'0-473-07480-X' True
'0-473-07480-9' False
'0-473-07480-0' False
'123456789' False
'12345678901' False
'1234567890' False
'0000000000' True
'' False

Aside: a large well-known bookselling site will accept 047307480x, 047307480X, and 0-473-07480-X but not 0-473-07480-x :-O

Your code is nice -- well done for writing idiomatic Python! Here are some minor things:

When you see the idiom

result = <initiator>
for elt in <iterable>:
    result += elt

you can replace it with sum() over a generator expression. In this case:

result = sum((i+1)*int(digit) for i, digit in enumerate(digits))

or even more concisely:

return sum((i+1)*int(digit) for i, digit in enumerate(digits)) % 11 == check_digit

Of course, it is a value judgement whether this is better than the original. I would personally consider the second of these to be best.

Also, the parentheses in (result % 11) == check_digit are extraneous, and I don't really think you need them for clarity. That leaves you overall with:

def validate(isbn):
    check_digit = int(isbn[-1])
    match = re.search(r'(\d)-(\d{3})-(\d{5})', isbn[:-1])

    if match:
        digits = match.group(1) + match.group(2) + match.group(3)
        parity = sum((i+1)*int(digit) for i, digit in enumerate(digits))
        return parity % 11 == check_digit
    else:
        return False

Note that you do still need the return False to catch the case that the ISBN is not even in the right format.

Don't forget (though this may be outside of the scope of your assignment) to calculate the check digit of the ISBN (the final digit), to determine if the ISBN is valid and not just seemingly valid.

There's some information about the implementation of the check digit on the ISBN.org website, and implementation should be fairly straightforward. Wikipedia offers one such example (presuming you've already converted any ASCII "X" to a decimal 10):

bool is_isbn_valid(char digits[10]) {
    int i, a = 0, b = 0;
    for (i = 0; i < 10; i++) {
        a += digits[i];  // Assumed already converted from ASCII to 0..10
        b += a;
    }
    return b % 11 == 0;
}

Applying this for your assignment is left, well, as an exercise for you.
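
For comparing against your own attempt afterwards, here is one possible Python rendering of the same running-sum trick. It is a sketch under the same assumption as the C version (ten integers, with X already converted to 10); the function name is chosen here and does not come from Wikipedia.

```python
def is_isbn10_valid(values):
    # values: ten integers in 0..10, with 'X' already converted to 10.
    a = b = 0
    for v in values:
        a += v  # running sum of the digits seen so far
        b += a  # sum of running sums: the first digit ends up counted
                # ten times, the second nine times, ..., the last once
    return b % 11 == 0

print(is_isbn10_valid([3, 5, 2, 8, 1, 6, 4, 1, 9, 0]))   # 3-528-16419-0
print(is_isbn10_valid([0, 4, 7, 3, 0, 7, 4, 8, 0, 10]))  # 0-473-07480-X
print(is_isbn10_valid([0, 4, 7, 3, 0, 7, 4, 8, 0, 9]))   # 0-473-07480-9
```

Accumulating b this way produces the weights 10 down to 1 without any multiplication, which is why the loop needs nothing but additions.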

Your check digit can take on the values 0-10, based on the fact that it's modulo-11. There's a problem with the line:

    check_digit = int(isbn[-1]) 

as this works only for the digits 0-9. You'll need something for the case when the digit is 'X', and also for the error condition when it isn't any of the above; otherwise your program will crash.
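
One way to cover both cases is a small conversion helper. This is a minimal sketch, not the only option: the name parse_check_digit and the choice to signal an invalid character with None (rather than raising) are assumptions made for this example.

```python
def parse_check_digit(ch):
    # Hypothetical helper: map the final ISBN-10 character to 0..10,
    # or return None when it is not a valid check character.
    if ch in ('X', 'x'):
        return 10
    # The length test matters: '' in '0123456789' is True in Python.
    if len(ch) == 1 and ch in '0123456789':
        return int(ch)
    return None

for isbn in ["3-528-16419-0", "0-473-07480-X", "0-123-12345-Q", ""]:
    # isbn[-1:] yields '' for an empty string, where isbn[-1] would raise IndexError.
    print(repr(isbn), parse_check_digit(isbn[-1:]))
```

The caller can then return False whenever the helper yields None, instead of letting int() raise ValueError.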
