Removing spaces between lines in HTML response, in Python

Question

I'm writing a script to help update a small blog hosted on my website, but for some reason, when I request the HTML for the page, so that I can write it to memory and modify it, it seems to be spacing the lines out:

Expected:

<html>
    <head>
        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->
        <title>foo</title>
        <meta name="description" content="bar" />

What my script receives:

<html>

    <head>

        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->

        <title>foo</title>

        <meta name="description" content="bar" />

I've tried stripping \n and \r characters from the response, but this doesn't seem to make any change whatsoever.

Edit: Sorry, I forgot to post the actual script itself. Here you go:

import neocities
import requests
import re
nc = neocities.NeoCities(api_key='[no]')

response = nc.info()
print(response)

htmlresponse = requests.get('https://thesite.com/index.html')

oldBlog = open('newindex.html', 'w')
oldBlog.write(str(htmlresponse.text).strip('\n').strip('\r'))
oldBlog.close()

with open('newindex.html', 'r') as blog:
    contents = blog.readlines()

contents.insert(39,'        <p class="header">test lol</p>\n'
                   '        <p class="logpost">foobar</p>\n')

with open('newindex.html', 'w') as blog:
    contents = "".join(contents)
    blog.write(contents)

I know the method I'm using to strip the characters is incredibly janky, but I'm just doing it to see if it works. If it ends up working, I'll make it cleaner.

Answer 1

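str.strip() only removes characters from the start and end of a string, so the line breaks inside the HTML are left untouched. Change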

oldBlog.write(str(htmlresponse.text).strip('\n').strip('\r'))

to

oldBlog.write(str(htmlresponse.text).replace('\n', ''))
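
For what it's worth, here is a small sketch of one possible cause. It assumes (the question doesn't confirm this) that the server sends CRLF line endings and that the script runs on Windows, where a text-mode write turns each \n into \r\n. The body string and the round_trip helper are made up for illustration, and newline="\r\n" is only there to mimic the Windows default on any platform.

```python
import os
import tempfile

# Hypothetical response body with CRLF line endings (an assumption).
body = "<html>\r\n    <head>\r\n"


def round_trip(text, write_newline):
    """Write text with the given newline mode, then read the lines back."""
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)
    try:
        with open(path, "w", newline=write_newline) as f:
            f.write(text)
        # Default read mode translates "\r", "\n" and "\r\n" to "\n".
        with open(path, "r") as f:
            return f.readlines()
    finally:
        os.remove(path)


# Mimics Windows text mode: "\r\n" is written as "\r\r\n",
# which reads back as two line breaks.
doubled = round_trip(body, "\r\n")

# newline="" writes the text untranslated, so each CRLF stays one break.
clean = round_trip(body, "")

# strip() only trims the ends, so the interior CRLF survives.
stripped = body.strip("\n").strip("\r")

# replace("\n", "") leaves the "\r" characters, which still read as breaks.
replaced = round_trip(body.replace("\n", ""), "\r\n")

print(doubled)   # blank lines appear between the tags
print(clean)     # no blank lines
```

Under that assumption, replace('\n', '') works because the leftover \r characters are still read back as line breaks. If the response used bare \n endings instead, it would collapse the whole page onto one line, and the later insert at index 39 would no longer land where intended.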

Answer 2

Suppose your HTML is in a Python string (in your code, html_string is str(htmlresponse.text)):

html_string = '''<html>

    <head>

        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->

        <title>foo</title>

        <meta name="description" content="bar" />
'''

Splitting it on newlines with html_string.split('\n') gives:

['<html>',
 '',
 '    <head>',
 '',
 '        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->',
 '',
 '        <title>foo</title>',
 '',
 '        <meta name="description" content="bar" />',
 '']

The following list comprehension goes through each string in that list and keeps it only if its length is greater than 0:

list1 = [line for line in html_string.split('\n') if len(line) > 0]

Or, more compactly (an empty string is falsy):

list1 = [line for line in html_string.split('\n') if line]

which will give you:

['<html>',
 '    <head>',
 '        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->',
 '        <title>foo</title>',
 '        <meta name="description" content="bar" />']

But list1 is a list. To convert it back to a string, join the lines with newlines:

new_html_string = '\n'.join(list1)

Printing new_html_string will then give you:

<html>
    <head>
        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->
        <title>foo</title>
        <meta name="description" content="bar" />

To sum it all up:

html_string = '''<html>

    <head>

        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->

        <title>foo</title>

        <meta name="description" content="bar" />
'''
list1 = [line for line in html_string.split('\n') if line]
new_html_string = '\n'.join(list1)
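
A slightly more defensive variant of the summary above, as a sketch only: str.splitlines() splits on \r\n and \r as well as \n, and testing line.strip() also filters out lines that contain nothing but spaces or tabs. The function name drop_blank_lines and the sample string are hypothetical.

```python
def drop_blank_lines(html_string: str) -> str:
    """Return html_string without empty or whitespace-only lines."""
    # splitlines() handles "\n", "\r\n" and "\r" line endings alike.
    lines = [line for line in html_string.splitlines() if line.strip()]
    # Indentation of the kept lines is preserved; only the breaks change.
    return "\n".join(lines)


# Hypothetical input with CRLF endings and one whitespace-only line.
sample = "<html>\r\n\r\n    <head>\r\n   \r\n        <title>foo</title>\r\n"
result = drop_blank_lines(sample)
print(result)
```

Note that this drops the trailing newline and also removes blank lines inside <pre> blocks, where they may be significant.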

2 answers

solution1
1 ACCEPTED 2021-05-13 05:14:22

solution2
0 2021-05-13 05:21:37
