How to fix ValueError: invalid literal for int() with base 10: ''?

Question

I am using a Python script with the re module to process two files and create a final output, but I am getting an error.

cat links.txt

https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/XXXXJD8C-32313922.mp4.m3u8?hdnts=exp=1596554537~acl=*/bGxpJD8C-32313922.mp4.m3u8~hmac=2ac95222f1693d11e7fd8758eb0a18d6d2ee187bb10e3c27311e627785687bd5
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/XXXXkxI1-32313922.mp4.m3u8?hdnts=exp=1596554733~acl=*/bM07kxI1-32313922.mp4.m3u8~hmac=dd0fc6f433a8ac74c9eaa2a376fa4324a65ae7c410cdcf8e869c6961f1a5b5ea
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/XXXXpGKZ-32313922.mp4.m3u8?hdnts=exp=1596554748~acl=*/onhIpGKZ-32313922.mp4.m3u8~hmac=d4030cf7813cef02a58ca17127a0bc6b19dc93cccd6add4edc72a2ee5154f236
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/XXXXLbgy-32313922.mp4.m3u8?hdnts=exp=1596554871~acl=*/xGXCLbgy-32313922.mp4.m3u8~hmac=7c515306c033c88d32072d54ba1d6aa4abf1be23070d1bb14d1311e4e74cc1d7

cat name.txt

Introduction Lecture 1
Questions Lecture 1B
Theory Lecture 2
Labour Costing Lecture 352 (Classroom Lecture)

Expected (final.txt)

https://cdn.jwplayer.com/videos/XXXXJD8C-32313922.mp4
  out=Lecture 001- Introduction.mp4
https://cdn.jwplayer.com/videos/XXXXkxI1-32313922.mp4
  out=Lecture 001B- Questions.mp4
https://cdn.jwplayer.com/videos/XXXXpGKZ-32313922.mp4
  out=Lecture 002- Theory.mp4
https://cdn.jwplayer.com/videos/XXXXLbgy-32313922.mp4
  out=Lecture 352- Labour Costing (Classroom Lecture).mp4

cat sort.py (my existing script)

import re

final = open('final.txt','w')
a = open('links.txt','r')
b = open('name.txt','r')
base = 'https://cdn.jwplayer.com/videos/'
kek = re.compile(r'(?<=\/)[\w\-\.]+(?=.m3u8)')
# find max lecture number
n = None
for line in b:
    b_n = int(''.join([c for c in line.rpartition(' ')[2] if c in '1234567890']))
    if n is None or b_n > n:
        n = b_n
n = len(str(n))  # string len of the max lecture number
    
b = open('name.txt','r')
for line in a:
    final.write(base + kek.search(line).group() + '\n')
    b_line = b.readline().rstrip()
    line_before_lecture, _, lecture = b_line.partition('Lecture')
    line_before_lecture = line_before_lecture.strip()
    lecture_no = lecture.rpartition(' ')[2]
    lecture_str = lecture_no.rjust(n, '0') + '-' + " " + line_before_lecture
    final.write('  out=' + 'Lecture ' + lecture_str + '.mp4\n')

Traceback

Traceback (most recent call last):
  File "sort.py", line 11, in <module>
    b_n = int(''.join([c for c in line.rpartition(' ')[2] if c in '1234567890']))
ValueError: invalid literal for int() with base 10: ''

Edit: It seems the error is caused by the last line in name.txt, because my script assumes every line in name.txt ends in the format Lecture X.

One way to fix it, I guess, is to edit the script and add an if condition as follows:

If any line in name.txt doesn't end in the format Lecture X, then move the text that follows Lecture X to before the word Lecture.

For example, the 4th line of name.txt, Labour Costing Lecture 352 (Classroom Lecture), could be converted to Labour Costing (Classroom Lecture) Lecture 352, and the line below in my script could be edited to match only the last occurrence of "Lecture" in each line of name.txt:

line_before_lecture, _, lecture = b_line.partition('Lecture')

I basically need the expected output (final.txt) from those 2 files (name.txt and links.txt) using the script. If there's a better or smarter way to do it, I would definitely be happy to use it. I only suggested one way of doing it in theory, and I have no clue how to implement it myself.

Answer 1

If you are using regular expressions anyway, why not use them to pull out this information, too?

import re

base = 'https://cdn.jwplayer.com/videos/'
# File name that precedes the ".m3u8" extension in each link
kek = re.compile(r'(?<=\/)[\w\-\.]+(?=\.m3u8)')
# Groups: (1) title, (2) lecture number, (3) anything after the number
nre = re.compile(r'(.*)\s+Lecture (\d+)(.*)')

with open('name.txt') as b:
  lecture = []
  for line in b:
    parsed = nre.match(line)
    if parsed:
      lecture.append((int(parsed.group(2)), parsed.group(3), parsed.group(1)))
    else:
      raise ValueError('Unable to parse %r' % line)

# Zero-padding width: number of digits in the largest lecture number
n = len(str(max(item[0] for item in lecture)))

with open('links.txt','r') as a:
  for idx, line in enumerate(a):
    print(base + kek.search(line).group())
    fmt='  out=Lecture {0:0' + str(n) + 'n}{1}- {2}.mp4'
    print(fmt.format(*lecture[idx]))

This traverses the contents of name.txt only once and stores the results in a list, lecture, where each element is a tuple of the pieces we pulled out (number, suffix, title).
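
To make the shape of those tuples concrete, here is a small sketch that runs the same pattern over the four sample lines from the question. The samples list and the parse helper are names introduced here for illustration only; they are not part of the original answer.

```python
import re

# Same pattern as in the answer: title, lecture number, then whatever follows.
nre = re.compile(r'(.*)\s+Lecture (\d+)(.*)')

# Sample lines copied from name.txt in the question
# (an in-memory stand-in for the file, assumed for this example).
samples = [
    'Introduction Lecture 1',
    'Questions Lecture 1B',
    'Theory Lecture 2',
    'Labour Costing Lecture 352 (Classroom Lecture)',
]

def parse(line):
    """Return (number, suffix, title), or raise ValueError if the line does not match."""
    parsed = nre.match(line)
    if not parsed:
        raise ValueError('Unable to parse %r' % line)
    return int(parsed.group(2)), parsed.group(3), parsed.group(1)

lecture = [parse(line) for line in samples]
for item in lecture:
    print(item)
```

Note that for the last sample line the suffix is the whole trailing text, including its leading space, not just a letter such as "B".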

I also changed this to write to standard output; redirect to a file if you like, or switch back to hard-coding the output file in the script itself.

The splat syntax *lecture[idx] is just shorthand to avoid having to write lecture[idx][0], lecture[idx][1], lecture[idx][2] explicitly.
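
As a quick illustration of that equivalence, the sketch below formats one tuple both ways. The item tuple and the width n = 3 are values assumed for the example; the format string is built the same way as in the script.

```python
# A (number, suffix, title) tuple like the ones stored in the lecture list.
item = (1, 'B', 'Questions')
n = 3  # zero-padding width, assumed here for the example

fmt = '  out=Lecture {0:0' + str(n) + 'n}{1}- {2}.mp4'

unpacked = fmt.format(*item)                      # splat: tuple spread into positional arguments
explicit = fmt.format(item[0], item[1], item[2])  # the same call written out by hand

print(unpacked)
print(unpacked == explicit)
```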

Demo: https://repl.it/repls/TatteredInexperiencedFibonacci#main.py
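
One caveat with the sample data: the format string prints the captured suffix right after the number, so the fourth entry comes out as "out=Lecture 352 (Classroom Lecture)- Labour Costing.mp4" rather than in the layout shown in the question. If the question's layout is wanted, one possible variant is sketched here. It splits the suffix into a letter part and the remaining text and appends the latter to the title. The build_lines helper, the extra regex group, and the in-memory lists are assumptions made for this example, and the sample links are shortened by dropping their query strings.

```python
import re

base = 'https://cdn.jwplayer.com/videos/'
# File name that precedes the ".m3u8" extension in each link.
kek = re.compile(r'(?<=/)[\w\-.]+(?=\.m3u8)')
# Groups: title, lecture number, optional letter suffix (e.g. "B"), remaining text.
nre = re.compile(r'(.*?)\s+Lecture\s+(\d+)([A-Za-z]*)(.*)')

def build_lines(links, names):
    """Pair each link with its name and return the lines of final.txt (hypothetical helper)."""
    parsed = []
    for name in names:
        m = nre.match(name.strip())
        if not m:
            raise ValueError('Unable to parse %r' % name)
        title, number, letter, rest = m.groups()
        # Text after the number is moved to the end of the title.
        parsed.append((int(number), letter, title + rest))
    # Pad every number to the width of the largest lecture number.
    width = len(str(max(p[0] for p in parsed)))
    out = []
    for link, (number, letter, title) in zip(links, parsed):
        out.append(base + kek.search(link).group())
        out.append('  out=Lecture {0:0{3}d}{1}- {2}.mp4'.format(number, letter, title, width))
    return out

# Sample data from the question; the links are shortened (query strings dropped).
prefix = 'https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/'
links = [
    prefix + 'XXXXJD8C-32313922.mp4.m3u8',
    prefix + 'XXXXkxI1-32313922.mp4.m3u8',
    prefix + 'XXXXpGKZ-32313922.mp4.m3u8',
    prefix + 'XXXXLbgy-32313922.mp4.m3u8',
]
names = [
    'Introduction Lecture 1',
    'Questions Lecture 1B',
    'Theory Lecture 2',
    'Labour Costing Lecture 352 (Classroom Lecture)',
]

result = build_lines(links, names)
print('\n'.join(result))
```

Using zip instead of indexing also means the loop simply stops at the shorter input if the two files have different lengths; whether that or an explicit error is preferable depends on how the files are produced.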

Answer 2

The issue is with the last line of name.txt.

>>> line = "Labour Costing Lecture 352 (Classroom Lecture)"
>>> [c for c in line.rpartition(' ')[2]]
['L', 'e', 'c', 't', 'u', 'r', 'e', ')']

Clearly not what you are intending to extract. Since none of these characters is a digit, the join produces an empty string, which cannot be converted to an int. If you are looking to extract the int, I would suggest looking at this question: How to extract numbers from a string in Python?
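
As a minimal sketch of the difference, the snippet below reproduces the failing step on that line and then pulls the number out with a regular expression anchored on the word "Lecture". The pattern and the variable names are assumptions chosen for this example, not something taken from the linked question.

```python
import re

line = "Labour Costing Lecture 352 (Classroom Lecture)"

# What the original script inspects: only the text after the last space.
last_word = line.rpartition(' ')[2]
digits = ''.join(c for c in last_word if c.isdigit())

# int('') is what raises the error seen in the traceback.
try:
    int(digits)
    error = None
except ValueError as exc:
    error = str(exc)

# Anchoring on "Lecture <digits>" finds the number wherever it sits in the line.
match = re.search(r'Lecture\s+(\d+)', line)
number = int(match.group(1)) if match else None

print(repr(last_word), repr(digits), error, number)
```

Checking match before calling int() keeps a line without a lecture number from raising the same ValueError again; here it simply yields None.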

Answer 1: score 1, accepted, 2020-08-13 09:41:44

Answer 2: score 0, 2020-08-12 17:31:21
