Python read csv file and strip spaces from it
I've created a script that reads a CSV file. The output looks fine when I run it in PyCharm, but when I select the output text, press CTRL+C, and paste it into Notepad, I get spaces between each letter.
For example, when I open the file in Excel I get this:
30.11.2020 09:03 Torbj%C3%B8rn+%3CTorbj%C3%B8rn%3E SPF+%3CSeksjon+for+Passord+og+Forebygging%3E Vennligst+endre+passordet+mitt+til+PST%7Bfacb0950fb7a5c537cf7fa68b8894027%7D
When I copy it from the PyCharm output I get this:
2 0 2 0 - 1 1 - 3 0 0 9 : 0 3 : 5 1 T o r b j % C 3 % B 8 r n % 3 C T o r b j % C 3 % B 8 r n % 3 E S P F % 3 C S e k s j o n f o r P a s s o r d o g F o r e b y g g i n g % 3 E V e n n l i g s t e n d r e p a s s o r d e t m i t t t i l P S T % 7 B f a c b 0 9 5 0 f b 7 a 5 c 5 3 7 c f 7 f a 6 8 b 8 8 9 4 0 2 7 % 7 D
How can I remove the white spaces? I've tried using line = line.strip() with no luck.
My script:
class Day05:
    print('')
    print('~~~~~~~~~~~~~~~~~~~~~~~~ Day 05 ~~~~~~~~~~~~~~~~~~~~~~~~')
    print('')

    def printDataInLogFile():
        # Header
        print("Datetime\t", end='')
        print("Name\t", end='')
        print("Section\t", end='')
        print("Message")

        # Read and loop line by line
        file1 = open('./log.csv', 'r')
        lines = file1.readlines()
        for line in lines:
            line = line.strip()
            line = line.replace('+', ' ')
            line = line.replace('%C3%A6', 'æ')
            line = line.replace('%C3%B8', 'ø')
            line = line.replace('%C3%A5', 'å')
            line = line.replace('%7B', '{')
            line = line.replace('%7D', '}')

            date = ""
            name = ""
            section = ""
            message = ""
            for i, d in enumerate(line.split(";")):
                if(i == 0):
                    date = d
                elif(i == 1):
                    name = d
                elif(i == 2):
                    section = d
                elif(i == 3):
                    message = d

            # Body
            if(name != ""):
                print(str(date) + "\t", end='')
                print(str(name) + "\t\t", end='')
                print(str(section) + "\t\t", end='')
                print(str(message))

    """ Script start """
    printDataInLogFile()
Some lines from log.csv:
2020-10-01 07:00:04;Lisbeth+%3CLisbeth%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7Bb53250c991675c7b0c712e9bdc2c1216%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:02:22;Unni+%3CUnni%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Vennligst+endre+passordet+mitt+til+PST%7B5cdadc1037fa416f7d79186adc55f1ff%7D
2020-10-01 07:03:11;Jan+%3CJan%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7B1241512147283b40bfe8e2eac36ac2dd%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:04:26;Maria+%3CMaria%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Vennligst+endre+passordet+mitt+til+PST%7Bca1d9d8d4243c374cb14faa8363bc0dc%7D
2020-10-01 07:06:52;Mellomleder+%3CMellomleder%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Bytt+til+PST%7B99e12ae9d06336a7d9c644641388450a%7D
2020-10-01 07:09:00;Robert+%3CRobert%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7Bda52537925c86ac5d5352edd78e10350%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:11:13;H%C3%A5kon+%3CH%C3%A5kon%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Vennligst+endre+passordet+mitt+til+PST%7B2a6fa4d619a88882dbcf1df5dff8ff65%7D
2020-10-01 07:11:56;Terje+%3CTerje%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Jeg+%C3%B8nsker+%C3%A5+endre+passord+til+PST%7B4970a0cdd3f0eb19e9ec1d7423f26de8%7D
2020-10-01 07:14:33;Anette+%3CAnette%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7B1b956ee14848acccdc150db512b2084d%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:14:51;Daniel+%3CDaniel%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Bytt+til+PST%7B80f7c07f7d06bbcd38f3af5c90afe866%7D
2020-10-01 07:15:29;Systemeier+%3CSystemeier%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Bytt+til+PST%7Be905beda4ccdfaf8c7b3388d057e37c4%7D
When I open the file in Excel I get this:
30.11.2020 09:03
When I copy it from the PyCharm output I get this:
2 0 2 0 - 1 1 - 3 0 0 9
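You've saved the file as Unicode in Excel, but you are not reading the file as Unicode in Python. In UTF-16, each ASCII character takes two bytes, the second of which is a NUL byte; read with a single-byte encoding, those NUL bytes show up as the gaps between the letters. Open the file with the matching encoding instead: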
# Read and loop line by line
with open('./log.csv', 'r', encoding='utf-16-le') as file1:
    for line in file1:
        print(line)
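To see where the gaps come from, here is a minimal sketch that simulates the mismatch in memory. It assumes the file is UTF-16 little-endian, as in the fix above, and uses latin-1 as a stand-in for whatever single-byte default encoding the original script picked up; the sample string is made up for illustration.

```python
# Simulate a line that was saved as UTF-16 (little-endian).
original = "2020-11-30 09:03"
raw_bytes = original.encode("utf-16-le")

# Decoding with a single-byte codec (latin-1 is an assumed stand-in)
# keeps the second byte of every character as a NUL character.
wrong = raw_bytes.decode("latin-1")

# Decoding with the matching codec restores the text.
right = raw_bytes.decode("utf-16-le")

print(repr(wrong[:6]))                       # every character is followed by '\x00'
print(len(original), len(wrong), len(right))
print(right == original)
```

If the file starts with a byte order mark, encoding='utf-16' detects the byte order and drops the mark, whereas 'utf-16-le' leaves it in the first line as '\ufeff'.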
Notes
- Use a context manager (with open(...) as f:) instead of naked open() calls.
- Specify an encoding when opening text files. If you don't know the encoding, you need to find out.
- Use the csv module to read CSV files.
- Use the urllib module to decode URL-encoded values, instead of trying to do manual string replacements. E.g. (for a single input that represents the "value" part in a key=value pair):
from urllib.parse import parse_qs
raw_value = "Torbj%C3%B8rn+%3CTorbj%C3%B8rn%3E"
parsed_value = parse_qs(f"temp={raw_value}") # -> {'temp': ['Torbjørn <Torbjørn>']}
actual_value = parsed_value['temp'][0] # -> 'Torbjørn <Torbjørn>'
This can be turned into a function:
def decode_url_value(raw_value):
    parsed_value = parse_qs(f"temp={raw_value}")
    return parsed_value['temp'][0]

decode_url_value("Torbj%C3%B8rn+%3CTorbj%C3%B8rn%3E") # -> 'Torbjørn <Torbjørn>'
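Putting the notes together, a rough sketch of the whole read loop might look like the following. The function name read_log is hypothetical, and urllib.parse.unquote_plus is swapped in for the parse_qs helper because it decodes both '+' and percent escapes in one call. An io.StringIO with two rows copied from the question stands in for the real file, whose encoding is an assumption you would need to confirm.

```python
import csv
import io
from urllib.parse import unquote_plus

# Two sample rows in the same shape as log.csv (semicolon-separated, URL-encoded).
sample = (
    "2020-10-01 07:02:22;Unni+%3CUnni%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;"
    "Vennligst+endre+passordet+mitt+til+PST%7B5cdadc1037fa416f7d79186adc55f1ff%7D\n"
    "2020-10-01 07:11:13;H%C3%A5kon+%3CH%C3%A5kon%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;"
    "Vennligst+endre+passordet+mitt+til+PST%7B2a6fa4d619a88882dbcf1df5dff8ff65%7D\n"
)


def read_log(file_obj):
    """Yield (date, name, section, message) tuples with URL-decoded fields."""
    for row in csv.reader(file_obj, delimiter=";"):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        yield tuple(unquote_plus(field) for field in row)


# io.StringIO stands in for something like:
#   open('./log.csv', newline='', encoding='utf-16')   # encoding is an assumption
rows = list(read_log(io.StringIO(sample)))
for date, name, section, message in rows:
    print(date, name, section, message, sep="\t")
```

Letting the csv module do the splitting removes the index-based if/elif chain, and decoding each field after splitting means an encoded semicolon inside a value cannot be mistaken for a delimiter.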
If you use the libraries unidecode and urllib, you can easily do this:
from unidecode import unidecode
from urllib.parse import unquote
...
file1 = open('./log.csv', 'r')
lines = file1.readlines()
for line in lines:
    line = unidecode(unquote(line))
    line = line.strip()
    line = line.replace('+', ' ')
    # line = line.replace('%C3%A6', 'æ')
    # line = line.replace('%C3%B8', 'ø')
    # line = line.replace('%C3%A5', 'å')
    # line = line.replace('%7B', '{')
    # line = line.replace('%7D', '}')
...
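You'd no longer need to replace special characters manually. Note that unidecode transliterates non-ASCII characters to ASCII (for example, ø becomes o), so leave it out if you want to keep the Norwegian letters.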
str.strip() only removes leading and trailing whitespace. To remove all space characters, use str.replace(" ", "").
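A small sketch of the difference, using a made-up sample line. Keep in mind that if the gaps are really NUL characters from an encoding mismatch, as the other answer suggests, none of these calls will remove them, and replace(" ", "") would also delete the legitimate spaces inside the messages.

```python
line = "  2 0 2 0 - 1 1 - 3 0  \n"

stripped = line.strip()                 # only the ends are trimmed
no_spaces = line.replace(" ", "")       # every space removed, newline kept
no_whitespace = "".join(line.split())   # all whitespace removed, including the newline

print(repr(stripped))
print(repr(no_spaces))
print(repr(no_whitespace))
```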