Read CSV data into columns, same as in the CSV file, using Python
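I am trying to open this CSV file and then parse the data into columns. The problem is that the way the data is laid out is causing me trouble. When I run my Python script, every line comes back as a single string, like ['DATA DATA']. I want to parse the data into columns such as "Account#", "Service Address", "City" and so on, matching the column names that already appear in the file. As I said, the data is structured oddly because its column headers are stacked on top of each other. For example, the column header "Account#" has a second column header, "Rate Code", underneath it. I'm not sure of the best way to do this and would appreciate some advice from the experts.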
Python script
import csv
with open('C:/Users/DEMO/Documents/statement-9-28-18.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        print(line)
Result
[' XYZ COMPANY DATE : 09/28/18 ']
[' PAGE : 1 ']
[' ELECTRIC BILL STATEMENT ']
[' ']
[' CUSTOMER NAME: XYZ CUSTOMER SUMMARY BILL NUMBER: 12345-67890 IF YOU HAVE ANY QUESTIONS, ']
[' CUSTOMER NUMBER: 1111111 PLEASE CONTACT: ']
[' MAILING ADDRESS: 4122 RICHARDSON ST ']
[' BILLING DATE: 09/28/18 SUMB@XYZ.COM45 ']
[' SANFORD FL 32771 PAST DUE DATE: 10/09/18 (305)333-3333 ']
[' ']
[' ']
[' READ SVC B MAXIMUM TOTAL DUE METER NO REMARKS ']
[' ACCOUNT # SERVICE ADDRESS CITY DATE DAY C KWH KWD AMOUNT ']
[' RATE CODE CY CUSTOMER NAME MAILING ADDRESS ']
[' ---------------------------------------------------------------------------------------------------------------------------------- ']
[' 11111-22222 485 JOHNSON AVE APT 1405 MIAMI 09/26/18 28 C 140 29.11 BAT0123 ']
[' RS-1 XYZ COMPANY 485 JOHNSON AVE ']
[' ']
[' 22222-33333 485 JOHNSON AVE APT 3541 MIAMI 09/26/18 28 C 130 28.08 BAT0123 ']
[' RS-1 XYZ COMPANY 485 JOHNSON AVE ']
[' ']
[' 33333-44444 485 JOHNSON AVE APT 4544 MIAMI 09/26/18 28 C 172 32.42 BAT0123 ']
[' RS-1 XYZ COMPANY 485 JOHNSON AVE ']
[' ']
[' 55555-66666 485 JOHNSON ST AVE APT 1111 MIAMI 09/26/18 28 C 243 39.81 BAT0123 ']
[' RS-1 XYZ COMPANY 485 JOHNSON AVE ']
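Every row printed above is a one-element list. That is what csv.reader produces when a line contains no commas, which suggests (an assumption based only on this output) that the file is a space-aligned text report rather than comma-separated data. A minimal check, using two lines copied from the output in place of the real file:

```python
import csv
import io

# Two lines copied from the printed output; the real file is assumed to look like this,
# i.e. space-aligned text with no commas in it.
sample = (
    " 11111-22222 485 JOHNSON AVE APT 1405 MIAMI 09/26/18 28 C 140 29.11 BAT0123 \n"
    " RS-1 XYZ COMPANY 485 JOHNSON AVE \n"
)

# csv.reader splits on commas by default, so each line comes back as a single field.
rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    print(len(row), row)
```

So changing csv.reader options alone is unlikely to help; the lines have to be tokenized or matched some other way, as the answer below does with a regex.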
Question: I want to parse the data into columns
Note: A simple regex will also split on - and /. This can be avoided by extending the regex as needed.
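import re

# \w+ matches runs of letters, digits and underscores, so '-', '/', '.' and '#' act as separators
rc = re.compile(r'(\w+)')
with open('C:/Users/DEMO/Documents/statement-9-28-18.csv', 'r') as itxt:
    # Iterate over the file object itself; itxt.readline() would return only the first line
    for n, line in enumerate(itxt, 1):
        # Rows 13 and 14 hold the two header lines
        if n in [13, 14]:
            print(rc.findall(line))
        # Data starts at row 16; every third row (18, 21, ...) is blank and is skipped
        if n >= 16 and n % 3 > 0:
            print(rc.findall(line))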
Output:
['ACCOUNT', 'SERVICE', 'ADDRESS', 'CITY', 'DATE', 'DAY', 'C', 'KWH', 'KWD', 'AMOUNT']
['RATE', 'CODE', 'CY', 'CUSTOMER', 'NAME', 'MAILING', 'ADDRESS']
['11111', '22222', '485', 'JOHNSON', 'AVE', 'APT', '1405', 'MIAMI', '09', '26', '18', '28', 'C', '140', '29', '11', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
['22222', '33333', '485', 'JOHNSON', 'AVE', 'APT', '3541', 'MIAMI', '09', '26', '18', '28', 'C', '130', '28', '08', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
['33333', '44444', '485', 'JOHNSON', 'AVE', 'APT', '4544', 'MIAMI', '09', '26', '18', '28', 'C', '172', '32', '42', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
['55555', '66666', '485', 'JOHNSON', 'ST', 'AVE', 'APT', '1111', 'MIAMI', '09', '26', '18', '28', 'C', '243', '39', '81', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
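The output shows the effect described in the note: '11111-22222', '09/26/18', '29.11' and 'RS-1' are each broken into pieces. One possible extension (a sketch, not the answer author's code) is to also allow '.', '/' and '-' inside a token, tested here on one data line copied from the question:

```python
import re

# Hypothetical extended pattern: word characters plus '.', '/' and '-' form one token,
# so account numbers, dates, amounts and rate codes stay in one piece.
token_re = re.compile(r'[\w./-]+')

line = ' 11111-22222 485 JOHNSON AVE APT 1405 MIAMI 09/26/18 28 C 140 29.11 BAT0123 '
tokens = token_re.findall(line)
print(tokens)
```

Placing '-' last inside the character class makes it a literal hyphen rather than a range. Multi-word fields such as the service address are still split into separate tokens, so this only fixes the numeric fields.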
Tested with Python 3.4.2
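To get actual named columns, the two physical lines of each record (ACCOUNT # line, then RATE CODE line) can be merged into one row. The sketch below does this with named-group regexes instead of hard-coded row numbers. It rests on several assumptions inferred only from the printed sample: the city is a single word just before the read date, the one number before the amount is KWH (the KWD column appears empty, but collapsed whitespace makes that impossible to confirm), and the mailing address is the first part of the second line that starts with a digit. The field names are hypothetical, loosely based on the headers.

```python
import re

# Assumed first line of a record: ACCOUNT #, SERVICE ADDRESS + CITY, READ DATE,
# SVC DAY, B C, KWH, TOTAL DUE AMOUNT, METER NO.
FIRST = re.compile(
    r'(?P<account>\d{5}-\d{5})\s+'
    r'(?P<address_city>.+?)\s+'
    r'(?P<read_date>\d{2}/\d{2}/\d{2})\s+'
    r'(?P<svc_days>\d+)\s+'
    r'(?P<b_c>[A-Z])\s+'
    r'(?P<kwh>\d+)\s+'
    r'(?P<total_due>\d+\.\d{2})\s+'
    r'(?P<meter_no>\S+)$'
)
# Assumed second line: RATE CODE, CUSTOMER NAME, MAILING ADDRESS (starts with a digit).
SECOND = re.compile(
    r'(?P<rate_code>[A-Z]+-\d+)\s+'
    r'(?P<customer_name>.+?)\s+'
    r'(?P<mailing_address>\d.*)$'
)


def parse_records(lines):
    """Yield one dict per two-line record; banner, header and blank lines are skipped."""
    record = None
    for raw in lines:
        line = raw.strip()
        m = FIRST.match(line)
        if m:
            if record is not None:
                yield record
            record = m.groupdict()
            # Assumption: the city is the last word before the read date.
            address, city = record.pop('address_city').rsplit(None, 1)
            record['service_address'] = address
            record['city'] = city
            continue
        m = SECOND.match(line)
        if m and record is not None:
            record.update(m.groupdict())
    if record is not None:
        yield record


sample = [
    ' 11111-22222 485 JOHNSON AVE APT 1405 MIAMI 09/26/18 28 C 140 29.11 BAT0123 \n',
    ' RS-1 XYZ COMPANY 485 JOHNSON AVE \n',
    ' \n',
    ' 22222-33333 485 JOHNSON AVE APT 3541 MIAMI 09/26/18 28 C 130 28.08 BAT0123 \n',
    ' RS-1 XYZ COMPANY 485 JOHNSON AVE \n',
]
records = list(parse_records(sample))
for rec in records:
    print(rec['account'], rec['city'], rec['total_due'], rec['rate_code'])
```

With the real file, passing the open file object instead of `sample` should work the same way, and the resulting dicts could then be written out with csv.DictWriter. Any city name with more than one word, or a customer name that contains a digit-led word, would break these patterns, so they would need adjusting against the full report.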