How to fix ValueError: invalid literal for int() with base 10: ''?

Question

I am using a Python script with the regular expression module to process 2 files and build a final output from them, but I am getting an error.

cat links.txt

https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/XXXXJD8C-32313922.mp4.m3u8?hdnts=exp=1596554537~acl=*/bGxpJD8C-32313922.mp4.m3u8~hmac=2ac95222f1693d11e7fd8758eb0a18d6d2ee187bb10e3c27311e627785687bd5
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/XXXXkxI1-32313922.mp4.m3u8?hdnts=exp=1596554733~acl=*/bM07kxI1-32313922.mp4.m3u8~hmac=dd0fc6f433a8ac74c9eaa2a376fa4324a65ae7c410cdcf8e869c6961f1a5b5ea
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/XXXXpGKZ-32313922.mp4.m3u8?hdnts=exp=1596554748~acl=*/onhIpGKZ-32313922.mp4.m3u8~hmac=d4030cf7813cef02a58ca17127a0bc6b19dc93cccd6add4edc72a2ee5154f236
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/XXXXLbgy-32313922.mp4.m3u8?hdnts=exp=1596554871~acl=*/xGXCLbgy-32313922.mp4.m3u8~hmac=7c515306c033c88d32072d54ba1d6aa4abf1be23070d1bb14d1311e4e74cc1d7

cat name.txt

Introduction Lecture 1
Questions Lecture 1B
Theory Lecture 2
Labour Costing Lecture 352 (Classroom Lecture)

Expected (final.txt)

https://cdn.jwplayer.com/videos/XXXXJD8C-32313922.mp4
  out=Lecture 001- Introduction.mp4
https://cdn.jwplayer.com/videos/XXXXkxI1-32313922.mp4
  out=Lecture 001B- Questions.mp4
https://cdn.jwplayer.com/videos/XXXXpGKZ-32313922.mp4
  out=Lecture 002- Theory.mp4
https://cdn.jwplayer.com/videos/XXXXLbgy-32313922.mp4
  out=Lecture 352- Labour Costing (Classroom Lecture).mp4

cat sort.py (my existing script)

import re

final = open('final.txt','w')
a = open('links.txt','r')
b = open('name.txt','r')
base = 'https://cdn.jwplayer.com/videos/'
kek = re.compile(r'(?<=\/)[\w\-\.]+(?=.m3u8)')
# find max lecture number
n = None
for line in b:
    b_n = int(''.join([c for c in line.rpartition(' ')[2] if c in '1234567890']))
    if n is None or b_n > n:
        n = b_n
n = len(str(n))  # string len of the max lecture number
    
b = open('name.txt','r')
for line in a:
    final.write(base + kek.search(line).group() + '\n')
    b_line = b.readline().rstrip()
    line_before_lecture, _, lecture = b_line.partition('Lecture')
    line_before_lecture = line_before_lecture.strip()
    lecture_no = lecture.rpartition(' ')[2]
    lecture_str = lecture_no.rjust(n, '0') + '-' + " " + line_before_lecture
    final.write('  out=' + 'Lecture ' + lecture_str + '.mp4\n')

Traceback

Traceback (most recent call last):
  File "sort.py", line 11, in <module>
    b_n = int(''.join([c for c in line.rpartition(' ')[2] if c in '1234567890']))
ValueError: invalid literal for int() with base 10: ''

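Edit: the error seems to be caused by the last line of name.txt, because my script assumes that every line in name.txt ends in the form Lecture X.

One way I thought of solving it is to edit the script and add an if condition along these lines:

If any line in name.txt does not end in the form Lecture X, move the text that follows Lecture X to before the word Lecture.

For example, line 4 of name.txt, Labour Costing Lecture 352 (Classroom Lecture), could be converted to Labour Costing (Classroom Lecture) Lecture 352, and the following line of my script could be edited to match only the last occurrence of "Lecture" in each line of name.txt: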

line_before_lecture, _, lecture = b_line.partition('Lecture')

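Basically, I need a script that produces the expected output (final.txt) from these two files (name.txt and links.txt), and if there is a better or smarter way, I will gladly use it. The approach above is only a suggestion in theory; I don't know how to implement it myself.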

Answer 1

If you are using regular expressions anyway, why not use them to extract this information as well?

import re

base = 'https://cdn.jwplayer.com/videos/'
kek = re.compile(r'(?<=\/)[\w\-\.]+(?=.m3u8)')
nre = re.compile(r'(.*)\s+Lecture (\d+)(.*)')

with open('name.txt') as b:
  lecture = []
  for line in b:
    parsed = nre.match(line)
    if parsed:
      lecture.append((int(parsed.group(2)), parsed.group(3), parsed.group(1)))
    else:
      raise ValueError('Unable to parse %r' % line)

# pad width: number of digits in the largest lecture number
n = len(str(max(item[0] for item in lecture)))

with open('links.txt','r') as a:
  for idx, line in enumerate(a):
    print(base + kek.search(line).group())
    fmt='  out=Lecture {0:0' + str(n) + 'n}{1}- {2}.mp4'
    print(fmt.format(*lecture[idx]))

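This iterates over the contents of name.txt only once and stores the results in a variable lecture, a list of tuples holding the pieces we extracted (number, suffix, title).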
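To make the tuple layout concrete, here is a small sketch that runs the same pattern over the four sample lines from the question. The parse helper and the sample_lines list are names introduced only for this illustration; the pattern is the nre from the code above.

```python
import re

# Same pattern as in the answer's code.
nre = re.compile(r'(.*)\s+Lecture (\d+)(.*)')

# The four lines of name.txt from the question.
sample_lines = [
    'Introduction Lecture 1',
    'Questions Lecture 1B',
    'Theory Lecture 2',
    'Labour Costing Lecture 352 (Classroom Lecture)',
]

def parse(line):
    """Return (number, suffix, title); raise if there is no 'Lecture <digits>'."""
    parsed = nre.match(line)
    if not parsed:
        raise ValueError('Unable to parse %r' % line)
    return int(parsed.group(2)), parsed.group(3), parsed.group(1)

lecture = [parse(line) for line in sample_lines]
for item in lecture:
    print(item)
# (1, '', 'Introduction')
# (1, 'B', 'Questions')
# (2, '', 'Theory')
# (352, ' (Classroom Lecture)', 'Labour Costing')
```

Note that for the fourth line the trailing " (Classroom Lecture)" lands in the suffix slot, so the answer's format string prints it before the dash rather than after the title as in the question's expected output.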

I also changed it to write to standard output; you can redirect that to a file if you like, or switch back to hard-coding the output file in the script itself.

The splat syntax *lecture[idx] is just shorthand to avoid having to write lecture[idx][0], lecture[idx][1], lecture[idx][2] explicitly.
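As a quick illustration of the unpacking, the sketch below formats one tuple both ways and checks that the results agree. The pad width n = 3 and the tuple item are made-up values for this example only.

```python
n = 3  # assumed pad width for this example
fmt = '  out=Lecture {0:0' + str(n) + 'n}{1}- {2}.mp4'

item = (1, 'B', 'Questions')  # (number, suffix, title)

unpacked = fmt.format(*item)                       # splat form
explicit = fmt.format(item[0], item[1], item[2])   # spelled out

print(unpacked)
print(unpacked == explicit)
```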

Demo: https://repl.it/repls/TatteredInexperiencedFibonacci#main.py
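If the output has to match the question's final.txt exactly, one possible adjustment (a sketch, not part of the original answer) is to capture the letter suffix and any trailing text separately, then append the trailing text to the title. The pattern name_re and the helper out_name are hypothetical names, and treating the suffix as letters only is an assumption based on the four sample lines.

```python
import re

# Assumed layout: "<title> Lecture <digits><optional letters> <optional trailing text>"
name_re = re.compile(r'(.*?)\s+Lecture\s+(\d+)([A-Za-z]*)\s*(.*)')

def out_name(line, width):
    """Build the 'Lecture NNN- Title.mp4' file name for one line of name.txt."""
    m = name_re.match(line.strip())
    if m is None:
        raise ValueError('Unable to parse %r' % line)
    title, number, letter, extra = m.groups()
    if extra:
        # Text after the lecture number is moved behind the title.
        title = title + ' ' + extra
    return 'Lecture {0}{1}- {2}.mp4'.format(number.zfill(width), letter, title)

# The four lines of name.txt from the question.
names = [
    'Introduction Lecture 1',
    'Questions Lecture 1B',
    'Theory Lecture 2',
    'Labour Costing Lecture 352 (Classroom Lecture)',
]

# Pad width: digit count of the longest lecture number.
width = max(len(name_re.match(line).group(2)) for line in names)

results = [out_name(line, width) for line in names]
for name in results:
    print('  out=' + name)
```

The link half of each pair can still come from kek.search(line).group() as in the code above; only the name handling changes here.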

Answer 2

The problem is the last line of name.txt.

>>> line = "Labour Costing Lecture 352 (Classroom Lecture)"
>>> [c for c in line.rpartition(' ')[2]]
['L', 'e', 'c', 't', 'u', 'r', 'e', ')']

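That is clearly not what you want to extract. Since none of these characters are digits, the join produces an empty string, which cannot be converted to an int. If you want to extract an int, I suggest looking at this question: How to extract numbers from a string in Python?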
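One way to pull the number out without relying on where the last space falls is to search for digits that follow the word Lecture. This is only a sketch: the helper name lecture_number is hypothetical, and it assumes the number you want is the first run of digits after "Lecture" and whitespace.

```python
import re

def lecture_number(line):
    """Return the first number that follows 'Lecture', or None if there is none."""
    m = re.search(r'Lecture\s+(\d+)', line)
    # Returning None avoids calling int('') on lines without a number.
    return int(m.group(1)) if m else None

print(lecture_number('Labour Costing Lecture 352 (Classroom Lecture)'))  # 352
print(lecture_number('Questions Lecture 1B'))                            # 1
print(lecture_number('No number here'))                                  # None
```

Returning None leaves it to the caller to decide whether a line without a number should be skipped or treated as an error.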
