
Removing spaces between lines in HTML response, in Python

I'm writing a script to help update a small blog hosted on my website. When I request the page's HTML so that I can save it and modify it, the response seems to have a blank line inserted after every line:

Expected:

<html>
    <head>
        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->
        <title>foo</title>
        <meta name="description" content="bar" />

What my script receives:

<html>

    <head>

        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->

        <title>foo</title>

        <meta name="description" content="bar" />

I've tried stripping \n and \r characters from the response, but this doesn't seem to make any change whatsoever.

Edit: Sorry, I forgot to post the actual script itself. Here you go:

import neocities
import requests
import re
nc = neocities.NeoCities(api_key='[no]')

response = nc.info()
print(response)

htmlresponse = requests.get('https://thesite.com/index.html')

oldBlog = open('newindex.html', 'w')
oldBlog.write(str(htmlresponse.text).strip('\n').strip('\r'))
oldBlog.close()

with open('newindex.html', 'r') as blog:
    contents = blog.readlines()

contents.insert(39,'        <p class="header">test lol</p>\n'
                   '        <p class="logpost">foobar</p>\n')

with open('newindex.html', 'w') as blog:
    contents = "".join(contents)
    blog.write(contents)

I know the method I'm using to strip the characters is incredibly janky, but I'm just doing it to see if it works. If it ends up working, I'll make it cleaner.

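str.strip() only removes characters from the beginning and end of a string, so the line breaks inside the HTML are left untouched. Change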

oldBlog.write(str(htmlresponse.text).strip('\n').strip('\r'))

to

oldBlog.write(str(htmlresponse.text).replace('\n', ''))
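To make the difference concrete, here is a small sketch comparing the two calls on a made-up string (sample is a stand-in for htmlresponse.text, not the real response). It also shows that removing every '\n' puts the whole document on one line, which matters if the file is later read with readlines(); collapsing doubled newlines is one possible alternative, assuming the blank lines are plain '\n\n' pairs.

```python
# Hypothetical sample standing in for htmlresponse.text
sample = "\n<html>\n\n    <head>\n\n"

# strip() only trims the given characters from the start and end of the string
stripped = sample.strip("\n")

# replace('\n', '') removes every newline, so everything ends up on one line
flattened = sample.replace("\n", "")

# replacing doubled newlines keeps one line break per line of HTML
collapsed = sample.replace("\n\n", "\n")

print(repr(stripped))
print(repr(flattened))
print(repr(collapsed))
```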

Suppose your HTML is in a Python string (in your code, html_string would be str(htmlresponse.text)):

html_string = '''<html>

    <head>

        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->

        <title>foo</title>

        <meta name="description" content="bar" />
'''

Splitting it on newlines with html_string.split('\n') gives:

['<html>',
 '',
 '    <head>',
 '',
 '        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->',
 '',
 '        <title>foo</title>',
 '',
 '        <meta name="description" content="bar" />',
 '']

This list comprehension keeps each string from the list only if its length is greater than 0:

list1 = [line for line in html_string.split('\n') if len(line) > 0]

or, more compactly (an empty string is falsy):

list1 = [line for line in html_string.split('\n') if line]

which will give you:

['<html>',
 '    <head>',
 '        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->',
 '        <title>foo</title>',
 '        <meta name="description" content="bar" />']

But list1 is a list. To convert it back to a string, join it with newlines:

new_html_string = '\n'.join(list1)

Printing new_html_string will then give you:

<html>
    <head>
        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->
        <title>foo</title>
        <meta name="description" content="bar" />
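The split('\n') approach assumes Unix-style line endings and lines that are truly empty. If the response happens to use '\r\n', or the blank lines contain stray spaces, a variant built on str.splitlines() and str.strip() may be more forgiving. This is only a sketch: remove_blank_lines is a hypothetical helper name, and the sample input is made up.

```python
def remove_blank_lines(text):
    """Return text with empty and whitespace-only lines dropped."""
    # splitlines() splits on \n, \r\n and \r (among other line boundaries)
    # and does not keep the line terminators
    kept = [line for line in text.splitlines() if line.strip()]
    return "\n".join(kept)


# Hypothetical input with Windows-style line endings and a whitespace-only line
sample = "<html>\r\n\r\n    <head>\r\n   \r\n        <title>foo</title>\r\n"
print(remove_blank_lines(sample))
```

Indentation is preserved because line.strip() is only used for the test, not for the value that is kept.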

To sum it all up:

html_string = '''<html>

    <head>

        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->

        <title>foo</title>

        <meta name="description" content="bar" />
'''
list1 = [line for line in html_string.split('\n') if line]
new_html_string = '\n'.join(list1)
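To tie this back to the script in the question, the cleaned string can be written to disk and read back as a list of lines. The sketch below rests on an assumption the question does not confirm: that the response uses '\r\n' line endings and the script runs where text-mode writes translate '\n' again (as on Windows), which can produce doubled line breaks. Passing newline="" to open() turns that translation off. save_without_blank_lines, the sample text, and the temporary path are all hypothetical.

```python
import os
import tempfile


def save_without_blank_lines(html_text, path):
    """Write html_text to path, skipping empty lines."""
    lines = [line for line in html_text.splitlines() if line]
    # newline="" stops Python from translating "\n" on write,
    # so the file contains exactly the line endings joined here
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


# Stand-in for htmlresponse.text; a real response may differ
fetched = "<html>\r\n\r\n    <head>\r\n\r\n        <title>foo</title>\r\n"

path = os.path.join(tempfile.mkdtemp(), "newindex.html")
save_without_blank_lines(fetched, path)

# Read the file back the same way the original script does
with open(path, "r", encoding="utf-8") as blog:
    contents = blog.readlines()

print(contents)
```

With one entry per line of HTML in contents, an index passed to contents.insert() refers to an actual line of the page rather than to a blank one.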
