Print to text file contains only one line, but output contains three lines

Question

I have a very basic question about my code below:

# Python code to scrape the shipment URLs
from bs4 import BeautifulSoup
import urllib.request
import urllib.error
import urllib

# read urls of websites from text file > change it to where you stock the file
list_open = open(r"C:\Users\**\data.csv")
#skips the header
read_list  = list_open.readlines()[1:]

import os

file_path = os.path.join('c:\\**', 'ShipmentUpdates.txt')


for url in read_list:
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html5lib")
    # parse shipment info
    shipment = soup.find_all("span")
    Preparation = shipment[0]
    Sent = shipment[1]
    InTransit = shipment[2]
    Delivered = shipment[3]
    url = url.strip()

    line= f"{url} ; Preparation {Preparation.getText()}; Sent {Sent.getText()}; InTransit {InTransit.getText()}; Delivered {Delivered.getText()}"
    print (line)

file='c:\\**\ShipmentUpdates.txt'
with open(file, 'w') as filetowrite:
    filetowrite.write(line+'\n')

In my output, I have three lines:

http://carmoov.fr/CfQd ; Preparation on 06/01/2022 at 17:45; Sent on 06/01/2022 at 18:14; InTransit ; Delivered on 07/01/2022 at 10:31
http://carmoov.fr/CfQs ; Preparation on 06/01/2022 at 15:01; Sent on 06/01/2022 at 18:14; InTransit ; Delivered on 07/01/2022 at 11:27
http://carmoov.fr/CfQz ; Preparation on 06/01/2022 at 11:18; Sent on 06/01/2022 at 18:14; InTransit ; Delivered on 07/01/2022 at 11:56

But in my text file, there is only one line:

http://carmoov.fr/CfQz ; Preparation on 06/01/2022 at 11:18; Sent on 06/01/2022 at 18:14; InTransit ; Delivered on 07/01/2022 at 11:56

I need exactly the same three lines in the text file. Is anything wrong here? Thank you in advance!

Answer 1

This most likely happens because the URLs have a newline character \n at their end, since you read them as individual lines from list_open. To prevent this, replace {url} with {url.strip()} in your format string.
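To see the trailing newline for yourself, here is a minimal sketch. It assumes an in-memory stand-in for data.csv (the `fake_csv` object and its header row are made up for illustration) and shows that readlines() keeps the '\n' on each line until strip() removes it.

```python
import io

# Hypothetical stand-in for data.csv: a header row followed by three URLs.
fake_csv = io.StringIO(
    "url\n"
    "http://carmoov.fr/CfQd\n"
    "http://carmoov.fr/CfQs\n"
    "http://carmoov.fr/CfQz\n"
)

# Skip the header, as in the question.
raw_lines = fake_csv.readlines()[1:]
print(repr(raw_lines[0]))   # the trailing '\n' is still attached

# strip() removes leading and trailing whitespace, including the newline.
clean_urls = [line.strip() for line in raw_lines]
print(repr(clean_urls[0]))
```

Printing with repr() makes the otherwise invisible newline show up in the output.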

Answer 2

Just use url = url.strip(). This removes the newline character '\n'. The code below also writes each line inside the loop, with the file opened in append mode ("a"), so every line ends up in the text file:

from bs4 import BeautifulSoup
import urllib.request

# read urls of websites from text file > change it to where you stock the file
list_open = open(r"C:\***\data.csv")
# skip the header
read_list = list_open.readlines()[1:]
file_path = "ShipmentUpdates.txt"

for url in read_list:
    # remove the trailing newline before using the URL
    url = url.strip()
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html5lib")
    # parse shipment info
    shipment = soup.find_all("span")
    Preparation = shipment[0]
    Sent = shipment[1]
    InTransit = shipment[2]
    Delivered = shipment[3]
    # append one line per URL, so earlier lines are not overwritten
    with open(file_path, "a") as f:
        line = f"{url} ; Preparation {Preparation.getText()}; Sent {Sent.getText()}; InTransit {InTransit.getText()}; Delivered {Delivered.getText()}"
        print(line)
        f.write(line + '\n')
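The placement of the write matters here. In the question, write() runs once after the loop has finished, so only the last value of line reaches the file. As a rough sketch of an alternative to append mode, the file can be opened once and written to inside the loop. The sample lines and the temporary output path below are placeholders, not the real scraped data.

```python
import os
import tempfile

# Hypothetical sample data standing in for the scraped lines.
lines = ["line for CfQd", "line for CfQs", "line for CfQz"]

# Temporary location so the sketch does not touch any real file.
out_path = os.path.join(tempfile.mkdtemp(), "ShipmentUpdates.txt")

# Open the file once, then write inside the loop so every line is kept.
with open(out_path, "w") as out_file:
    for line in lines:
        print(line)
        out_file.write(line + "\n")

# Read the file back to confirm that all lines were saved.
with open(out_path) as check:
    saved = check.read().splitlines()
print(len(saved))
```

Opening with "w" once means each run starts from an empty file, whereas "a" keeps adding to whatever earlier runs wrote.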

Answer 3

An alternative is to strip each URL as you read it, using a generator expression:

list_open = open(r"C:\***\data.csv")
#skips the header
read_list = (url.strip() for url in list_open.readlines()[1:])

Note that doing this means you can only iterate over read_list once (which is what your code sample does), because the parenthesized expression creates a generator rather than a list.

If you want to access read_list again after the initial for loop, convert it to a tuple or list: read_list = tuple(url.strip() for url in list_open.readlines()[1:]) or read_list = list(url.strip() for url in list_open.readlines()[1:]).
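The single-use behaviour can be checked with a small sketch. The `raw_lines` list is a made-up stand-in for what readlines() would return.

```python
# Hypothetical stand-in for the lines read from data.csv.
raw_lines = ["http://carmoov.fr/CfQd\n", "http://carmoov.fr/CfQs\n"]

# Parentheses create a generator: it can be consumed only once.
gen_urls = (url.strip() for url in raw_lines)
first_pass = list(gen_urls)
second_pass = list(gen_urls)   # the generator is already exhausted
print(first_pass, second_pass)

# Square brackets create a list: it can be iterated as often as needed.
list_urls = [url.strip() for url in raw_lines]
print(list(list_urls) == list(list_urls))
```

A list keeps all the stripped URLs in memory, which is not a concern for a short CSV file like this one.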
