How to fix ValueError: invalid literal for int() with base 10: ''?

Question

I am using a Python script with the regular expression module to process two files and create the final output I need, but I am getting an error.

cat links.txt

https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/XXXXJD8C-32313922.mp4.m3u8?hdnts=exp=1596554537~acl=*/bGxpJD8C-32313922.mp4.m3u8~hmac=2ac95222f1693d11e7fd8758eb0a18d6d2ee187bb10e3c27311e627785687bd5
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/XXXXkxI1-32313922.mp4.m3u8?hdnts=exp=1596554733~acl=*/bM07kxI1-32313922.mp4.m3u8~hmac=dd0fc6f433a8ac74c9eaa2a376fa4324a65ae7c410cdcf8e869c6961f1a5b5ea
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/XXXXpGKZ-32313922.mp4.m3u8?hdnts=exp=1596554748~acl=*/onhIpGKZ-32313922.mp4.m3u8~hmac=d4030cf7813cef02a58ca17127a0bc6b19dc93cccd6add4edc72a2ee5154f236
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/XXXXLbgy-32313922.mp4.m3u8?hdnts=exp=1596554871~acl=*/xGXCLbgy-32313922.mp4.m3u8~hmac=7c515306c033c88d32072d54ba1d6aa4abf1be23070d1bb14d1311e4e74cc1d7

cat name.txt

Introduction Lecture 1
Questions Lecture 1B
Theory Lecture 2
Labour Costing Lecture 352 (Classroom Lecture)

Expected output (final.txt)

https://cdn.jwplayer.com/videos/XXXXJD8C-32313922.mp4
  out=Lecture 001- Introduction.mp4
https://cdn.jwplayer.com/videos/XXXXkxI1-32313922.mp4
  out=Lecture 001B- Questions.mp4
https://cdn.jwplayer.com/videos/XXXXpGKZ-32313922.mp4
  out=Lecture 002- Theory.mp4
https://cdn.jwplayer.com/videos/XXXXLbgy-32313922.mp4
  out=Lecture 352- Labour Costing (Classroom Lecture).mp4

cat sort.py (my existing script)

import re

final = open('final.txt','w')
a = open('links.txt','r')
b = open('name.txt','r')
base = 'https://cdn.jwplayer.com/videos/'
kek = re.compile(r'(?<=\/)[\w\-\.]+(?=.m3u8)')
# find max lecture number
n = None
for line in b:
    b_n = int(''.join([c for c in line.rpartition(' ')[2] if c in '1234567890']))
    if n is None or b_n > n:
        n = b_n
n = len(str(n))  # string len of the max lecture number
    
b = open('name.txt','r')
for line in a:
    final.write(base + kek.search(line).group() + '\n')
    b_line = b.readline().rstrip()
    line_before_lecture, _, lecture = b_line.partition('Lecture')
    line_before_lecture = line_before_lecture.strip()
    lecture_no = lecture.rpartition(' ')[2]
    lecture_str = lecture_no.rjust(n, '0') + '-' + " " + line_before_lecture
    final.write('  out=' + 'Lecture ' + lecture_str + '.mp4\n')

Traceback

Traceback (most recent call last):
  File "sort.py", line 11, in <module>
    b_n = int(''.join([c for c in line.rpartition(' ')[2] if c in '1234567890']))
ValueError: invalid literal for int() with base 10: ''

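Edit: The error seems to be caused by the last line of name.txt, because my script assumes that every line in name.txt ends in the format Lecture X.

One way I thought of to solve it is to edit the script and add an if condition like this:

If any line in name.txt does not end in the format Lecture X, move the text that follows Lecture X to before the word Lecture.

For example, line 4 of name.txt, Labour Costing Lecture 352 (Classroom Lecture), could be converted to Labour Costing (Classroom Lecture) Lecture 352, and the following line of my script could be edited to match only the last occurrence of "Lecture" in each line of name.txt: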

line_before_lecture, _, lecture = b_line.partition('Lecture')
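A minimal sketch of that idea, assuming the line has already been rearranged so that it ends in Lecture X. The helper name `split_on_last_lecture` is hypothetical and not part of the original script. It uses `str.rpartition`, which splits on the last occurrence of the separator; on the unmodified line 4 that last occurrence sits inside the parentheses, so the rearrangement has to happen first.

```python
def split_on_last_lecture(b_line):
    """Split a name line on the LAST occurrence of the word 'Lecture'.

    Hypothetical helper: returns (title, lecture_no) with whitespace stripped.
    """
    title, _, lecture_no = b_line.rpartition('Lecture')
    return title.strip(), lecture_no.strip()


# Rearranged line: the last "Lecture" is the one in front of the number.
rearranged = split_on_last_lecture('Labour Costing (Classroom Lecture) Lecture 352')

# Unmodified line 4: the last "Lecture" is the one inside the parentheses,
# so the split lands in the wrong place.
unmodified = split_on_last_lecture('Labour Costing Lecture 352 (Classroom Lecture)')
```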

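I basically need a script that produces the expected output (final.txt) from these two files (name.txt and links.txt). If there is a better or smarter way, I will gladly use it. I have only suggested an approach in theory, and I don't know how to implement it myself.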

Answer 1

If you are using regular expressions anyway, why not use them to extract this information as well?

import re

base = 'https://cdn.jwplayer.com/videos/'
kek = re.compile(r'(?<=\/)[\w\-\.]+(?=.m3u8)')
nre = re.compile(r'(.*)\s+Lecture (\d+)(.*)')

with open('name.txt') as b:
  lecture = []
  for line in b:
    parsed = nre.match(line)
    if parsed:
      lecture.append((int(parsed.group(2)), parsed.group(3), parsed.group(1)))
    else:
      raise ValueError('Unable to parse %r' % line)

# Pad to the width of the largest lecture number, not just the last one listed
n = len(str(max(entry[0] for entry in lecture)))

with open('links.txt','r') as a:
  for idx, line in enumerate(a):
    print(base + kek.search(line).group())
    fmt='  out=Lecture {0:0' + str(n) + 'n}{1}- {2}.mp4'
    print(fmt.format(*lecture[idx]))

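This iterates over the contents of name.txt only once and stores the result in a variable, lecture: a list of tuples holding the pieces we extracted from each line (number, suffix, title).

I also changed it to write to standard output; you can redirect that to a file if you like, or switch back to hard-coding the output file explicitly in the script itself.

The splat syntax *lecture[idx] is just shorthand that avoids having to write lecture[idx][0], lecture[idx][1], lecture[idx][2] explicitly.

Demo: https://repl.it/repls/TatteredInexperiencedFibonacci#main.py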
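To illustrate the unpacking, here is a small sketch with the padding width fixed at 3 for simplicity (the answer computes it at run time). The tuple `entry` is made-up sample data in the same (number, suffix, title) shape.

```python
fmt = '  out=Lecture {0:03n}{1}- {2}.mp4'
entry = (1, 'B', 'Questions')  # (number, suffix, title), sample data

# Unpacking the tuple with * passes its items as separate positional arguments...
unpacked = fmt.format(*entry)

# ...which is equivalent to indexing each item by hand.
explicit = fmt.format(entry[0], entry[1], entry[2])
```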
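One thing to watch: as far as I can tell, the pattern above places everything after the number directly behind it, so line 4 of name.txt would come out as Lecture 352 (Classroom Lecture)- Labour Costing.mp4 rather than the expected Lecture 352- Labour Costing (Classroom Lecture).mp4. The following is a sketch of one possible variation, not the answer's own code: it assumes a suffix is a run of letters glued to the number (such as the B in 1B) and that any remaining text belongs to the title. The names `LINK_RE`, `NAME_RE`, `parse_name` and `build_entries` are hypothetical.

```python
import re

BASE = 'https://cdn.jwplayer.com/videos/'
# Same idea as `kek` above, with the dot in ".m3u8" escaped.
LINK_RE = re.compile(r'(?<=/)[\w\-.]+(?=\.m3u8)')
# Title, lecture number, optional letter suffix (e.g. "B"), then trailing text.
NAME_RE = re.compile(
    r'^(?P<title>.*?)\s+Lecture\s+(?P<num>\d+)(?P<letter>[A-Za-z]*)\s*(?P<rest>.*)$'
)


def parse_name(line):
    """Return (number, letter_suffix, title) for one line of name.txt."""
    m = NAME_RE.match(line.strip())
    if m is None:
        raise ValueError('Unable to parse %r' % line)
    # Text after the number (if any) is appended to the title.
    title = ' '.join(part for part in (m.group('title'), m.group('rest')) if part)
    return int(m.group('num')), m.group('letter'), title


def build_entries(link_lines, name_lines):
    """Pair each link with its name and return the output lines, in order."""
    parsed = [parse_name(line) for line in name_lines]
    # Zero-pad to the width of the largest lecture number.
    width = len(str(max(num for num, _, _ in parsed)))
    out = []
    for link, (num, letter, title) in zip(link_lines, parsed):
        out.append(BASE + LINK_RE.search(link).group())
        out.append(f'  out=Lecture {num:0{width}d}{letter}- {title}.mp4')
    return out
```

Both arguments can be any iterables of lines, so the two open file objects for links.txt and name.txt can be passed in directly and the returned lines written to final.txt or printed.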

Answer 2

The problem is with the last line of name.txt.

>>> line = "Labour Costing Lecture 352 (Classroom Lecture)"
>>> [c for c in line.rpartition(' ')[2]]
['L', 'e', 'c', 't', 'u', 'r', 'e', ')']

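That is clearly not what you want to extract. Since none of these characters is a digit, the join produces an empty string, which cannot be converted to an int. If you want to extract an int, I suggest looking at this question: How to extract numbers from a string in Python?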
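As a rough sketch of what that could look like here, assuming the number you want is always the one that directly follows the word Lecture: search for that pattern instead of filtering the last word for digits. The function name `lecture_number` is hypothetical.

```python
import re


def lecture_number(line):
    """Return the integer that follows the word 'Lecture', or None if absent."""
    m = re.search(r'Lecture\s+(\d+)', line)
    return int(m.group(1)) if m else None


line = "Labour Costing Lecture 352 (Classroom Lecture)"

# What the original script feeds to int(): the digits of the last word only.
digits_in_last_word = ''.join(c for c in line.rpartition(' ')[2] if c.isdigit())

# The regex-based lookup finds the number wherever it sits in the line.
number = lecture_number(line)
```

Returning None for a line without a number leaves it to the caller to decide whether to skip the line or raise an error.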


Answer 1
1 Accepted 2020-08-13 09:41:44

Answer 2
0 2020-08-12 17:31:21
