How can I read a CSV file only after finding a certain pattern with Python?
I have several CSV files that hold some data, and each file may start with a different number of initial comment lines:
table_doi: 10.17182/hepdata.52402.v1/t7
name: Table 7
...
ABS(YRAP), < 0.1
SQRT(S) [GeV], 1960
PT [GEV], PT [GEV] LOW, PT [GEV] HIGH, D2(SIG)/DYRAP/DPT [NB/GEV]
67, 62, 72, 6.68
...
613.5, 527, 700, 1.81E-07
I only want to read in the relevant data and its header, starting from the line
PT [GEV], PT [GEV] LOW, PT [GEV] HIGH, D2(SIG)/DYRAP/DPT [NB/GEV]
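So the strategy I came up with is to find the pattern PT [GEV] and start reading from there.
However, I am not sure how to implement this in Python. Could anyone help me?
Thanks in advance!
By the way, the function I currently have is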
import os
import glob
import csv

def read_multicolumn_csv_files_into_dictionary(folderpath, dictionary):
    filepath = folderpath + '*.csv'
    files = sorted(glob.glob(filepath))
    for file in files:
        data_set = file.replace(folderpath, '').replace('.csv', '')
        dictionary[data_set] = {}
        with open(file, 'r') as data_file:
            data_pipe = csv.DictReader(data_file)
            dictionary[data_set]['pt'] = []
            dictionary[data_set]['sigma'] = []
            for row in data_pipe:
                dictionary[data_set]['pt'].append(float(row['PT [GEV]']))
                dictionary[data_set]['sigma'].append(float(row['D2(SIG)/DYRAP/DPT [NB/GEV]']))
    return dictionary
It only works if I manually delete those initial comment lines from the CSV files.
Check out startswith. You can also find a detailed explanation here: https://cmdlinetips.com/2018/01/3-ways-to-read-a-file-and-skip-initial-comments-in-python/
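A minimal sketch of the startswith idea using only the standard library: itertools.dropwhile discards lines until one starts with the header text, and everything after that goes to csv.DictReader. The helper name read_after_header and its marker parameter are hypothetical, and skipinitialspace=True is an assumption based on the space after each comma in the sample shown in the question.

```python
import csv
import io
from itertools import dropwhile


def read_after_header(lines, marker="PT [GEV]"):
    """Return the rows found from the first line starting with `marker` onward.

    `lines` is any iterable of text lines, e.g. an open file object.
    (Hypothetical helper; the name and default marker are illustrative.)
    """
    # Drop every leading line that does not start with the marker.
    data_lines = dropwhile(lambda line: not line.startswith(marker), lines)
    # skipinitialspace strips the blank after each comma in "a, b, c".
    reader = csv.DictReader(data_lines, skipinitialspace=True)
    return list(reader)


# Sample lines taken from the question.
sample = io.StringIO(
    "name: Table 7\n"
    "SQRT(S) [GeV], 1960\n"
    "PT [GEV], PT [GEV] LOW, PT [GEV] HIGH, D2(SIG)/DYRAP/DPT [NB/GEV]\n"
    "67, 62, 72, 6.68\n"
    "613.5, 527, 700, 1.81E-07\n"
)
rows = read_after_header(sample)
pt = [float(row["PT [GEV]"]) for row in rows]
sigma = [float(row["D2(SIG)/DYRAP/DPT [NB/GEV]"]) for row in rows]
```

With a real file, the open file object can be passed in place of the StringIO sample, since both yield text lines.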
Assuming every file has a line that starts with PT [GEV]:
import os
import pandas as pd
...
csvs = []
for file in files:
    with open(file) as f:
        for i, l in enumerate(f):
            if l.startswith('PT [GEV]'):
                csvs.append(pd.read_csv(file, skiprows = i))
                break
df = pd.concat(csvs)
Try this. It searches for a row containing PT [GEV]; once it finds one, it sets m to True and starts appending that row and the rest of the data to the list:
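import csv

contain = 'PT [GEV]'
rows = []
m = False  # becomes True once the header row has been seen
with open('Users.csv', 'rt') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        for field in row:
            if field == contain:
                m = True
        # the header row itself is appended too, followed by every later row
        if m:
            rows.append(row)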
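You can use the file.tell method to save the file pointer position while reading and skip lines until the header line is found. At that point, use the file.seek method to reset the file pointer to the beginning of the header line, so that csv.DictReader can parse the rest of the file as valid CSV: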
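with open(file, 'r') as data_file:
    while True:
        position = data_file.tell()
        # readline() rather than next(): in Python 3, next() on a text file disables tell()
        line = data_file.readline()
        if not line:  # end of file reached without finding a header line
            break
        if line.count(',') == 3:  # or whatever condition your header line satisfies
            data_file.seek(position)  # reset file pointer to the beginning of the header line
            break
    data_pipe = csv.DictReader(data_file)
    ...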
I would just create a helper function that advances your csv reader to the first record:
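import csv

def remove_comments_from_file():
    file_name = "super_secret_file.csv"
    # the 'rU' mode was removed in Python 3.11; the csv docs recommend newline=''
    file = open(file_name, 'r', newline='')
    csv_read_file = csv.reader(file)
    for row in csv_read_file:
        # stop once the header row has been consumed; skip blank lines safely
        if row and row[0] == "PT [GEV]":
            break
    return csv_read_file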
Something along these lines. When the csv reader is returned, it will start at your first record (in this example: 67, 62, 72, 6.68).
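Since the question also asks for the header, the same loop could hand the header row back instead of discarding it. This is a rough sketch under that assumption; skip_to_header and its first_field parameter are hypothetical names, not part of the csv module, and the strip() calls assume fields are separated by a comma plus a space as in the sample from the question.

```python
import csv
import io


def skip_to_header(reader, first_field="PT [GEV]"):
    """Advance `reader` to the header row and return that row.

    After the call, `reader` is positioned at the first data record.
    Raises ValueError if no row starts with `first_field`.
    (Hypothetical helper; names are illustrative.)
    """
    for row in reader:
        # Blank lines come back as empty lists, so check `row` first.
        if row and row[0].strip() == first_field:
            return [field.strip() for field in row]
    raise ValueError(f"no row starting with {first_field!r}")


# Sample lines taken from the question.
sample = io.StringIO(
    "ABS(YRAP), < 0.1\n"
    "SQRT(S) [GeV], 1960\n"
    "PT [GEV], PT [GEV] LOW, PT [GEV] HIGH, D2(SIG)/DYRAP/DPT [NB/GEV]\n"
    "67, 62, 72, 6.68\n"
)
reader = csv.reader(sample)
header = skip_to_header(reader)
# Pair each remaining record with the header names.
records = [dict(zip(header, (f.strip() for f in row))) for row in reader]
```

Raising an error when no header is found is a design choice made here so that a file without the expected line fails loudly instead of yielding no data.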