Python: How to import a CSV-like .dat file with a control character delimiter
I have a data file that uses the DC4 control character as its delimiter. This is the code I have right now (copied from someone else, not my own code).
import csv

with open('Test.dat') as csv_file:
    # The delimiter between the quotes is a literal DC4 control character;
    # it is invisible (or drawn as a box) in most editors and browsers.
    csv_reader = csv.reader(csv_file, quotechar='þ', delimiter='')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # The first row holds the column names.
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}.')
            line_count += 1
    print(f'Processed {line_count} lines.')
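As you can see, the character is displayed as a box, and so far only Notepad++ has been able to read it. I found curses.ascii.isctrl(c), which appears to be able to recognize the character from Python and then read it in caret notation? ( https://docs.python.org/3.2/library/curses.ascii.html )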
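For reference, a control character can be spotted without curses, which is typically not available in Windows builds of Python. The following is only a minimal sketch: the helper name `find_control_chars` and the sample line are made up for illustration, and it assumes the file has already been decoded to text.

```python
import unicodedata

def find_control_chars(text):
    """Return {character: count} for control characters, ignoring CR, LF and TAB."""
    counts = {}
    for ch in text:
        # Unicode general category "Cc" covers the ASCII control characters.
        if unicodedata.category(ch) == "Cc" and ch not in "\r\n\t":
            counts[ch] = counts.get(ch, 0) + 1
    return counts

# Hypothetical line shaped like the sample data, with DC4 (0x14) between fields.
sample_line = "þIdentifierþ\x14þColumn 2þ\x14þColumn 3þ\n"
found = find_control_chars(sample_line)
for ch, n in found.items():
    print(f"U+{ord(ch):04X} appears {n} time(s)")
```

Running this kind of scan over the first few lines of a file is one way to confirm which code point is actually acting as the separator.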
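I'm new to coding and not sure how to implement this, or whether it would even work for me. Below is a sample of the dat file I'm trying to read, as text and as a screenshot.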
þIdentifierþþColumn 2þþColumn 3þ
þXX_0012345þþRandom Data 1þþRandom Data 1þ
þXX_0012346þþRandom Data 6þþRandom Data 2þ
þXX_0012347þþRandom Data 1þþRandom Data 3þ
þXX_0012348þþRandom Data 8þþRandom Data 4þ
þXX_0012349þþRandom Data 1þþRandom Data 5þ
þXX_0012345þþRandom Data 9þþRandom Data 1þ
This is the output when running this code on Python 3.6.1. Everything looks fine except for the ¾ character (which is how the DC4 character is read).
Column names are þIdentifierþ, þColumn 2þ, þColumn 3þ
þXX_0012345þ works in the þRandom Data 1þ department, and was born in þRandom Data 1þ.
þXX_0012346þ works in the þRandom Data 6þ department, and was born in þRandom Data 2þ.
þXX_0012347þ works in the þRandom Data 1þ department, and was born in þRandom Data 3þ.
þXX_0012348þ works in the þRandom Data 8þ department, and was born in þRandom Data 4þ.
þXX_0012349þ works in the þRandom Data 1þ department, and was born in þRandom Data 5þ.
þXX_0012345þ works in the þRandom Data 9þ department, and was born in þRandom Data 1þ.
Processed 7 lines.
Any help with this would be greatly appreciated. Thanks!
You can use an escape sequence for this. DC4 is ASCII 20 (0x14):
csv_reader = csv.reader(csv_file, quotechar='þ', delimiter='\x14')
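To show the escape in context, here is a minimal self-contained sketch. The in-memory sample and the helper name `read_dc4_rows` are assumptions for illustration; they stand in for the real Test.dat, whose exact bytes and encoding are not known here.

```python
import csv
import io

DC4 = "\x14"  # ASCII 20, Device Control 4

# Hypothetical sample in the question's layout: fields wrapped in þ, separated by DC4.
sample = (
    "þIdentifierþ\x14þColumn 2þ\x14þColumn 3þ\n"
    "þXX_0012345þ\x14þRandom Data 1þ\x14þRandom Data 1þ\n"
)

def read_dc4_rows(text_stream):
    """Parse a DC4-delimited, þ-quoted text stream into a list of rows."""
    reader = csv.reader(text_stream, delimiter=DC4, quotechar="þ")
    return list(reader)

# newline="" leaves line endings untouched, as the csv module recommends.
rows = read_dc4_rows(io.StringIO(sample, newline=""))
for row in rows:
    print(row)
```

For a real file, the same helper could be handed `open('Test.dat', newline='', encoding=...)`. The right encoding depends on how the file was produced, so it is left open here.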
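It turns out this was a problem with my computer rather than with Python. Apparently I just can't see the character; it only shows up as a white box. Is there a way to configure Windows 10 to display it?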
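This does not change how Windows draws the character, but as a possible workaround the control characters can be made visible from Python itself. The sketch below maps each C0 control character to its counterpart in the Unicode Control Pictures block (U+2400 to U+241F). The helper name `show_controls` is hypothetical, and whether the symbols render depends on the console font.

```python
def show_controls(text):
    """Replace C0 control characters (0x00-0x1F) with Unicode Control Pictures."""
    return "".join(chr(0x2400 + ord(ch)) if ord(ch) < 0x20 else ch for ch in text)

# Hypothetical fragment of one line, with DC4 between the two fields.
line = "þIdentifierþ\x14þColumn 2þ"
visible = show_controls(line)
print(visible)     # DC4 is shown as U+2414, if the font supports it
print(repr(line))  # repr() shows the same character as the escape \x14
```

If the font lacks those symbols, `repr()` is a font-independent fallback because it prints the escape sequence instead of the raw character.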