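Python: read a CSV file and strip spaces from it
I created a script that reads a CSV file. It looks fine when I run it in PyCharm, but when I select the output text, press CTRL+C, and paste it into Notepad, I see a space between every letter.
For example, when I open the file in Excel I get this: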
30.11.2020 09:03 Torbj%C3%B8rn+%3CTorbj%C3%B8rn%3E SPF+%3CSeksjon+for+Passord+og+Forebygging%3E Vennligst+endre+passordet+mitt+til+PST%7Bfacb0950fb7a5c537cf7fa68b8894027%7D
When I copy the printed output from PyCharm, I get this:
2 0 2 0 - 1 1 - 3 0 0 9 : 0 3 : 5 1 T o r b j % C 3 % B 8 r n % 3 C T o r b j % C 3 % B 8 r n % 3 E S P F % 3 C S e k s j o n f o r P a s s o r d o g F o r e b y g g i n g % 3 E V e n n l i g s t e n d r e p a s s o r d e t m i t t t i l P S T % 7 B f a c b 0 9 5 0 f b 7 a 5 c 5 3 7 c f 7 f a 6 8 b 8 8 9 4 0 2 7 % 7 D
How can I remove the spaces?
I tried using line = line.strip()
but with no luck.
My script:
class Day05:
    print('')
    print('~~~~~~~~~~~~~~~~~~~~~~~~ Day 05 ~~~~~~~~~~~~~~~~~~~~~~~~')
    print('')

    def printDataInLogFile():
        # Header
        print("Datetime\t", end='')
        print("Name\t", end='')
        print("Section\t", end='')
        print("Message")

        # Read and loop line by line
        file1 = open('./log.csv', 'r')
        lines = file1.readlines()
        for line in lines:
            line = line.strip()
            line = line.replace('+', ' ')
            line = line.replace('%C3%A6', 'æ')
            line = line.replace('%C3%B8', 'ø')
            line = line.replace('%C3%A5', 'å')
            line = line.replace('%7B', '{')
            line = line.replace('%7D', '}')

            date = ""
            name = ""
            section = ""
            message = ""
            for i, d in enumerate(line.split(";")):
                if(i == 0):
                    date = d
                elif(i == 1):
                    name = d
                elif(i == 2):
                    section = d
                elif(i == 3):
                    message = d

            # Body
            if(name != ""):
                print(str(date) + "\t", end='')
                print(str(name) + "\t\t", end='')
                print(str(section) + "\t\t", end='')
                print(str(message))

    """ Script start """
    printDataInLogFile()
Some lines matching the content of log.csv:
2020-10-01 07:00:04;Lisbeth+%3CLisbeth%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7Bb53250c991675c7b0c712e9bdc2c1216%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:02:22;Unni+%3CUnni%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Vennligst+endre+passordet+mitt+til+PST%7B5cdadc1037fa416f7d79186adc55f1ff%7D
2020-10-01 07:03:11;Jan+%3CJan%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7B1241512147283b40bfe8e2eac36ac2dd%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:04:26;Maria+%3CMaria%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Vennligst+endre+passordet+mitt+til+PST%7Bca1d9d8d4243c374cb14faa8363bc0dc%7D
2020-10-01 07:06:52;Mellomleder+%3CMellomleder%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Bytt+til+PST%7B99e12ae9d06336a7d9c644641388450a%7D
2020-10-01 07:09:00;Robert+%3CRobert%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7Bda52537925c86ac5d5352edd78e10350%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:11:13;H%C3%A5kon+%3CH%C3%A5kon%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Vennligst+endre+passordet+mitt+til+PST%7B2a6fa4d619a88882dbcf1df5dff8ff65%7D
2020-10-01 07:11:56;Terje+%3CTerje%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Jeg+%C3%B8nsker+%C3%A5+endre+passord+til+PST%7B4970a0cdd3f0eb19e9ec1d7423f26de8%7D
2020-10-01 07:14:33;Anette+%3CAnette%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7B1b956ee14848acccdc150db512b2084d%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:14:51;Daniel+%3CDaniel%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Bytt+til+PST%7B80f7c07f7d06bbcd38f3af5c90afe866%7D
2020-10-01 07:15:29;Systemeier+%3CSystemeier%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Bytt+til+PST%7Be905beda4ccdfaf8c7b3388d057e37c4%7D
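You saved the file as Unicode in Excel, but you are not reading it as Unicode in Python. In UTF-16, every ASCII character is followed by a zero byte, and that byte is what shows up as a gap between letters when the file is decoded with a single-byte encoding.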
# Read and loop line by line
with open('./log.csv', 'r', encoding='utf-16-le') as file1:
    for line in file1:
        print(line)
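To see where the gaps come from, the following sketch encodes a short sample string as UTF-16-LE and decodes the bytes again with a single-byte codec. The codec `cp1252` is an assumption standing in for a typical Windows default; the question does not say which default was in effect.

```python
# Illustration only: simulate reading UTF-16-LE bytes with the wrong codec.
text = "2020-11-30"

# In UTF-16-LE each ASCII character becomes two bytes: the character, then 0x00.
raw = text.encode("utf-16-le")

wrong = raw.decode("cp1252")     # assumed single-byte default codec
right = raw.decode("utf-16-le")  # the codec the file was actually written in

print(repr(wrong))  # every character is followed by a NUL character
print(repr(right))  # the original text
```

The interleaved characters are NUL (`\x00`), not real spaces; many consoles and editors simply render them as blanks.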
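Notes
- Use a context manager (with open(...) as f:) instead of a bare open() call.
- Always specify an encoding when opening text files. If you don't know the encoding, you need to find out. Trusting the default does not work here.
- Use the csv module to read CSV files.
- Use the urllib module to decode URL-encoded values instead of trying manual string replacements. For example (for a single input representing the "value" part of a key=value pair):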
from urllib.parse import parse_qs
raw_value = "Torbj%C3%B8rn+%3CTorbj%C3%B8rn%3E"
parsed_value = parse_qs(f"temp={raw_value}") # -> {'temp': ['Torbjørn <Torbjørn>']}
actual_value = parsed_value['temp'][0] # -> 'Torbjørn <Torbjørn>'
This can be turned into a function:
def decode_url_value(raw_value):
    parsed_value = parse_qs(f"temp={raw_value}")
    return parsed_value['temp'][0]

decode_url_value("Torbj%C3%B8rn+%3CTorbj%C3%B8rn%3E") # -> 'Torbjørn <Torbjørn>'
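The notes about the csv and urllib modules can be combined roughly as follows. This is a minimal sketch, not the answerer's code: `read_log` is a hypothetical helper, `io.StringIO` stands in for the opened file, and `unquote_plus` is used in place of the `parse_qs` approach shown above because each field here is a bare value rather than a key=value pair.

```python
import csv
import io
from urllib.parse import unquote_plus

# Two sample rows in the same shape as log.csv; StringIO stands in for the real file.
sample = (
    "2020-10-01 07:02:22;Unni+%3CUnni%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;"
    "Vennligst+endre+passordet+mitt+til+PST%7B5cdadc1037fa416f7d79186adc55f1ff%7D\n"
    "2020-10-01 07:11:13;H%C3%A5kon+%3CH%C3%A5kon%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;"
    "Vennligst+endre+passordet+mitt+til+PST%7B2a6fa4d619a88882dbcf1df5dff8ff65%7D\n"
)


def read_log(fileobj):
    """Yield (date, name, section, message) with the URL-encoded fields decoded."""
    for row in csv.reader(fileobj, delimiter=";"):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        date, name, section, message = row
        yield date, unquote_plus(name), unquote_plus(section), unquote_plus(message)


rows = list(read_log(io.StringIO(sample)))
for row in rows:
    print("\t".join(row))
```

With the real file, the same helper could be fed from `with open('./log.csv', newline='', encoding='utf-16') as f:`. The exact encoding depends on how the file was saved. Note that plain `'utf-16'` consumes a leading byte order mark, whereas `'utf-16-le'` leaves it in the first field as `'\ufeff'`.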
If you use the libraries unidecode and urllib, you can do this easily:
from unidecode import unidecode
from urllib.parse import unquote
...
file1 = open('./log.csv', 'r')
lines = file1.readlines()
for line in lines:
    line = unidecode(unquote(line))
    line = line.strip()
    line = line.replace('+', ' ')
    # line = line.replace('%C3%A6', 'æ')
    # line = line.replace('%C3%B8', 'ø')
    # line = line.replace('%C3%A5', 'å')
    # line = line.replace('%7B', '{')
    # line = line.replace('%7D', '}')
...
You no longer need to replace the special characters by hand.
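As a side note, the standard library alone may be enough here. The sketch below compares `unquote` followed by a manual `+` replacement with `unquote_plus`, which does both in one call. It leaves out unidecode deliberately, since that library transliterates text to ASCII and would therefore not preserve letters such as ø. The `tricky` string is a made-up input, not taken from log.csv.

```python
from urllib.parse import unquote, unquote_plus

raw = "Jeg+%C3%B8nsker+%C3%A5+endre+passord"

# Two steps: decode the percent escapes, then turn '+' into a space.
two_step = unquote(raw).replace("+", " ")

# One step: unquote_plus handles '+' and the percent escapes together.
one_step = unquote_plus(raw)

print(two_step)
print(one_step)

# The approaches differ only when the data contains an encoded plus sign (%2B).
tricky = "1%2B1+er+2"  # hypothetical input
lost_plus = unquote(tricky).replace("+", " ")  # the literal plus becomes a space
kept_plus = unquote_plus(tricky)               # the literal plus survives

print(lost_plus)
print(kept_plus)
```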
str.strip() only removes leading and trailing whitespace; to remove all space characters, use str.replace(" ", "").
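A short sketch of the difference between the two calls. The last part assumes that the gaps in the pasted output are NUL characters from a mis-decoded UTF-16 file rather than real spaces; under that assumption neither call removes them.

```python
padded = "  2020-11-30 09:03  "

stripped = padded.strip()           # only the leading and trailing spaces go
no_spaces = padded.replace(" ", "") # every space goes, including the inner one

print(repr(stripped))
print(repr(no_spaces))

# Assumed case: the "gaps" are NUL characters, which are not whitespace.
nul_gapped = "2\x000\x002\x000\x00"
print(nul_gapped.strip() == nul_gapped)            # strip() leaves it unchanged
print(nul_gapped.replace(" ", "") == nul_gapped)   # so does replacing spaces
```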