Python: reading a CSV file and removing the spaces from it

Question

I've created a script that reads a CSV file. It looks OK when I run it in PyCharm; however, when I select the output text, press Ctrl+C and paste it into Notepad, I get spaces between each letter.

For example, when I open the file in Excel, I get this:

30.11.2020 09:03    Torbj%C3%B8rn+%3CTorbj%C3%B8rn%3E   SPF+%3CSeksjon+for+Passord+og+Forebygging%3E    Vennligst+endre+passordet+mitt+til+PST%7Bfacb0950fb7a5c537cf7fa68b8894027%7D

When I copy the printed output from PyCharm, I get this:

2 0 2 0 - 1 1 - 3 0   0 9 : 0 3 : 5 1    T o r b j % C 3 % B 8 r n   % 3 C T o r b j % C 3 % B 8 r n % 3 E       S P F   % 3 C S e k s j o n   f o r   P a s s o r d   o g   F o r e b y g g i n g % 3 E         V e n n l i g s t   e n d r e   p a s s o r d e t   m i t t   t i l   P S T % 7 B f a c b 0 9 5 0 f b 7 a 5 c 5 3 7 c f 7 f a 6 8 b 8 8 9 4 0 2 7 % 7 D

How can I remove the spaces?

I've tried using line = line.strip(), with no luck.

My script:

class Day05:
    print('')
    print('~~~~~~~~~~~~~~~~~~~~~~~~ Day 05 ~~~~~~~~~~~~~~~~~~~~~~~~')
    print('')

    def printDataInLogFile():
        # Header
        print("Datetime\t", end='')
        print("Name\t", end='')
        print("Section\t", end='')
        print("Message")

        # Read and loop line by line
        file1 = open('./log.csv', 'r')
        lines = file1.readlines()
        for line in lines:
            line = line.strip()
            line = line.replace('+', ' ')
            line = line.replace('%C3%A6', 'æ')
            line = line.replace('%C3%B8', 'ø')
            line = line.replace('%C3%A5', 'å')
            line = line.replace('%7B', '{')
            line = line.replace('%7D', '}')
            date = ""
            name = ""
            section = ""
            message = ""

            for i, d in enumerate(line.split(";")):
                if(i == 0):
                    date = d
                elif(i == 1):
                    name = d
                elif(i == 2):
                    section = d
                elif(i == 3):
                    message = d

            # Body
            if(name != ""):
                print(str(date) + "\t", end='')
                print(str(name) + "\t\t", end='')
                print(str(section) + "\t\t", end='')
                print(str(message))


    """ Script start """
    printDataInLogFile()

Some lines from log.csv:

2020-10-01 07:00:04;Lisbeth+%3CLisbeth%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7Bb53250c991675c7b0c712e9bdc2c1216%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:02:22;Unni+%3CUnni%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Vennligst+endre+passordet+mitt+til+PST%7B5cdadc1037fa416f7d79186adc55f1ff%7D
2020-10-01 07:03:11;Jan+%3CJan%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7B1241512147283b40bfe8e2eac36ac2dd%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:04:26;Maria+%3CMaria%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Vennligst+endre+passordet+mitt+til+PST%7Bca1d9d8d4243c374cb14faa8363bc0dc%7D
2020-10-01 07:06:52;Mellomleder+%3CMellomleder%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Bytt+til+PST%7B99e12ae9d06336a7d9c644641388450a%7D
2020-10-01 07:09:00;Robert+%3CRobert%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7Bda52537925c86ac5d5352edd78e10350%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:11:13;H%C3%A5kon+%3CH%C3%A5kon%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Vennligst+endre+passordet+mitt+til+PST%7B2a6fa4d619a88882dbcf1df5dff8ff65%7D
2020-10-01 07:11:56;Terje+%3CTerje%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Jeg+%C3%B8nsker+%C3%A5+endre+passord+til+PST%7B4970a0cdd3f0eb19e9ec1d7423f26de8%7D
2020-10-01 07:14:33;Anette+%3CAnette%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7B1b956ee14848acccdc150db512b2084d%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:14:51;Daniel+%3CDaniel%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Bytt+til+PST%7B80f7c07f7d06bbcd38f3af5c90afe866%7D
2020-10-01 07:15:29;Systemeier+%3CSystemeier%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Bytt+til+PST%7Be905beda4ccdfaf8c7b3388d057e37c4%7D

Answer 1

When I open the file in Excel, I get this:
 30.11.2020 09:03
When I copy the printed output from PyCharm, I get this:
 2 0 2 0 - 1 1 - 3 0 0 9

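You've saved the file as Unicode in Excel, but you are not reading it as Unicode in Python. Excel's "Unicode" text format is UTF-16 little-endian, in which every ASCII character takes two bytes, the second one being a null byte. When the file is decoded with a single-byte default encoding instead, those null bytes end up between the letters, which is what shows up as spaces after pasting.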

# Read and loop line by line
with open('./log.csv', 'r', encoding='utf-16-le') as file1:
    for line in file1:
        print(line)
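A minimal sketch of the effect, assuming the file really is UTF-16 LE and that Python fell back to a single-byte default encoding such as cp1252 (typical on a Western European Windows install; the exact default on the asker's machine is an assumption). Encoding a fragment of the log line as UTF-16 LE and decoding it the wrong way reproduces the null characters between the letters:

```python
# Assumption: the log was saved by Excel as UTF-16 little-endian, and Python
# fell back to a single-byte default encoding (cp1252 here) when reading it.
text = "2020-11-30 09:03:51;Torbj%C3%B8rn"

raw = text.encode("utf-16-le")      # the bytes as they sit on disk
print(raw[:8])                      # b'2\x000\x002\x000\x00'

# Decoding with a single-byte codec turns every second byte into '\x00',
# presumably what appears as a "space" between the letters after copying.
wrong = raw.decode("cp1252")
print(repr(wrong[:8]))              # '2\x000\x002\x000\x00'

# Decoding with the matching codec restores the original text.
right = raw.decode("utf-16-le")
print(right == text)                # True
```

This is why str.strip() in the question has no effect: the extra characters are not whitespace, and they are not at the ends of the line.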

Notes

Use a context manager to open files (with open(...) as f:) instead of a bare open() call.
Always open text files with an explicitly specified encoding. If you don't know the encoding, you need to find out; relying on the defaults does not work here.
Use the csv module to read CSV files.
Use the urllib module to decode URL-encoded values instead of replacing strings by hand.

E.g. (for a single input that represents the "value" part of a key=value pair):

from urllib.parse import parse_qs

raw_value = "Torbj%C3%B8rn+%3CTorbj%C3%B8rn%3E"
parsed_value = parse_qs(f"temp={raw_value}")            # -> {'temp': ['Torbjørn <Torbjørn>']}
actual_value = parsed_value['temp'][0]                  # -> 'Torbjørn <Torbjørn>'

This can be turned into a function:

def decode_url_value(raw_value):
    parsed_value = parse_qs(f"temp={raw_value}")
    return parsed_value['temp'][0]

decode_url_value("Torbj%C3%B8rn+%3CTorbj%C3%B8rn%3E")   # -> 'Torbjørn <Torbjørn>'
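Putting the notes together, here is a sketch of the question's loop rewritten with a context manager, an explicit encoding, the csv module and urllib.parse.unquote_plus (used here instead of the parse_qs trick; it decodes '+' as a space as well as the %XX escapes). The helper names guess_encoding, parse_rows and read_log are hypothetical, made up for this example. guess_encoding only looks for a UTF-8 or UTF-16 byte-order mark and falls back to UTF-8 otherwise, which is an assumption, not real encoding detection.

```python
import codecs
import csv
from urllib.parse import unquote_plus


def guess_encoding(path):
    """Hypothetical helper: pick a codec from the file's byte-order mark.

    Only UTF-8 and UTF-16 BOMs are recognised (a UTF-32 LE BOM would be
    misread as UTF-16); files without a BOM are assumed to be UTF-8.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"          # strips the BOM while decoding
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"             # uses the BOM to pick the byte order
    return "utf-8"                  # assumption: no BOM means UTF-8


def parse_rows(lines):
    """Yield (date, name, section, message) from ';'-separated log lines.

    The fields after the date are URL-encoded ('+' for space, %XX escapes),
    as in the log.csv sample in the question.
    """
    for row in csv.reader(lines, delimiter=";"):
        if len(row) < 4:
            continue                # skip blank or short lines
        date, name, section, message = row[:4]
        yield date, unquote_plus(name), unquote_plus(section), unquote_plus(message)


def read_log(path, encoding=None):
    """Open the file with an explicit encoding and parse it."""
    encoding = encoding or guess_encoding(path)
    with open(path, "r", encoding=encoding, newline="") as f:
        yield from parse_rows(f)


# Usage (path taken from the question):
# for date, name, section, message in read_log("./log.csv"):
#     print(date, name, section, message, sep="\t")
```

Keeping the parsing in parse_rows, separate from the file handling, means it can be checked on a plain list of strings without creating a file.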

Answer 2

If you use the unidecode library and the urllib module, you can easily do this:

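from unidecode import unidecode
from urllib.parse import unquote

...
file1 = open('./log.csv', 'r')
lines = file1.readlines()
for line in lines:
    line = line.strip()
    # '+' stands for a space in URL form encoding; replace it before decoding
    # so that an encoded plus sign (%2B) is not turned into a space as well
    line = line.replace('+', ' ')
    # unquote() decodes the %XX escapes (e.g. %C3%B8 -> 'ø', %7B -> '{');
    # unidecode() then transliterates non-ASCII letters to ASCII ('ø' -> 'o')
    line = unidecode(unquote(line))
    # line = line.replace('%C3%A6', 'æ')
    # line = line.replace('%C3%B8', 'ø')
    # line = line.replace('%C3%A5', 'å')
    # line = line.replace('%7B', '{')
    # line = line.replace('%7D', '}')
...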

You no longer need to replace the special characters manually.
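When '+' is handled by hand, the order matters: '+' has to become a space before the %XX escapes are decoded, otherwise a literal plus sign sent as %2B is turned into a space too. The standard library's urllib.parse.unquote_plus() does both steps in the right order. A small sketch; the first input is taken from the question's log, the second is a made-up string containing %2B:

```python
from urllib.parse import unquote, unquote_plus

raw = "Jeg+%C3%B8nsker+%C3%A5+endre+passord+til+PST%7B4970a0cdd3f0eb19e9ec1d7423f26de8%7D"

# unquote() decodes the %XX escapes but leaves '+' as it is.
print(unquote(raw))       # Jeg+ønsker+å+endre+passord+til+PST{4970a0cdd3f0eb19e9ec1d7423f26de8}
# unquote_plus() also turns '+' into a space, as in HTML form encoding.
print(unquote_plus(raw))  # Jeg ønsker å endre passord til PST{4970a0cdd3f0eb19e9ec1d7423f26de8}

# Illustrative input containing an encoded plus sign (%2B).
tricky = "1%2B1+%3D+2"
decode_then_replace = unquote(tricky).replace("+", " ")   # '1 1 = 2'  (plus sign lost)
replace_then_decode = unquote(tricky.replace("+", " "))   # '1+1 = 2'
print(decode_then_replace, replace_then_decode, unquote_plus(tricky), sep=" | ")
```

Unlike unidecode, unquote_plus() keeps the Norwegian letters (æ, ø, å) as they are.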

Answer 3

str.strip() only removes leading and trailing whitespace; to remove all space characters, use str.replace(" ", "").
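A quick comparison of the two calls on ordinary spaces, plus one caveat: if the UTF-16 diagnosis in Answer 1 is right, the "spaces" in the question are null characters ('\x00') left over from reading a UTF-16 file with the wrong codec (cp1252 is assumed below), so neither call removes them.

```python
line = "  2020-11-30 09:03  \n"

stripped = line.strip()            # only the ends: '2020-11-30 09:03'
no_spaces = line.replace(" ", "")  # every ' ': '2020-11-3009:03\n'
print(repr(stripped), repr(no_spaces))

# Simulated mis-decoded UTF-16 text: the gaps are '\x00', not ' '.
garbled = "2020-11-30".encode("utf-16-le").decode("cp1252")
print(garbled.replace(" ", "") == garbled)   # True: nothing was removed
print(garbled.strip() == garbled)            # True: '\x00' is not whitespace either
```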

3 answers

Answer 1: score 2, accepted, 2020-12-05 08:54:06
Answer 2: score 1, 2020-12-05 09:00:36
Answer 3: score -1, 2020-12-05 08:54:02
