How can I read a CSV file only after finding a certain pattern with Python?

I have several CSV files that represent some data, each of which may have a different number of initial comment lines:

table_doi: 10.17182/hepdata.52402.v1/t7
name: Table 7
...
ABS(YRAP), < 0.1
SQRT(S) [GeV], 1960
PT [GEV], PT [GEV] LOW, PT [GEV] HIGH, D2(SIG)/DYRAP/DPT [NB/GEV]
67, 62, 72, 6.68
...
613.5, 527, 700, 1.81E-07

I would like to read in only the relevant data together with its header, which starts at the line

PT [GEV], PT [GEV] LOW, PT [GEV] HIGH, D2(SIG)/DYRAP/DPT [NB/GEV]

The strategy I have in mind is therefore to find the pattern PT [GEV] and start reading from there.

However, I am not sure how to achieve this in Python. Could anyone help me with that?

Thank you in advance!


By the way, the function I currently have is

import os
import glob
import csv

def read_multicolumn_csv_files_into_dictionary(folderpath, dictionary):
    filepath = folderpath + '*.csv'
    files = sorted(glob.glob(filepath))
    for file in files:
        data_set = file.replace(folderpath, '').replace('.csv', '')
        dictionary[data_set] = {}
        with open(file, 'r') as data_file:
            data_pipe = csv.DictReader(data_file)
            dictionary[data_set]['pt'] = []
            dictionary[data_set]['sigma'] = []
            for row in data_pipe:
                dictionary[data_set]['pt'].append(float(row['PT [GEV]']))
                dictionary[data_set]['sigma'].append(float(row['D2(SIG)/DYRAP/DPT [NB/GEV]']))
    return dictionary

which only works if I manually delete those initial comments from the CSV files.

Check out str.startswith. You can also find a detailed explanation here: https://cmdlinetips.com/2018/01/3-ways-to-read-a-file-and-skip-initial-comments-in-python/
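A minimal sketch of the startswith idea, using only the standard library. The function name open_dict_reader and its header_prefix parameter are made up for illustration, and skipinitialspace=True is an assumption that the files put a space after each comma, as in the sample above:

```python
import csv
import itertools

def open_dict_reader(lines, header_prefix='PT [GEV]'):
    """Return a csv.DictReader that starts at the first line beginning with header_prefix.

    `lines` can be an open text file or any iterable of strings.
    """
    lines = iter(lines)
    for line in lines:
        if line.startswith(header_prefix):
            # Put the header line back in front of the lines not yet consumed.
            remaining = itertools.chain([line], lines)
            # skipinitialspace drops the blank after each comma in "a, b, c".
            return csv.DictReader(remaining, skipinitialspace=True)
    raise ValueError('no line starts with %r' % header_prefix)
```

With a real file this would be called as `with open(path, newline='') as f: reader = open_dict_reader(f)`, after which `row['PT [GEV]']` works as in the question's function.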

Assuming every file has a line that starts with PT [GEV]:

import os
import pandas as pd

...
csvs = []
for file in files:
    with open(file) as f:
        for i, l in enumerate(f):
            if l.startswith('PT [GEV]'):
                csvs.append(pd.read_csv(file, skiprows = i))
                break
df = pd.concat(csvs)

Try this: it searches for the row that contains PT [GEV]; once it finds one, it sets the flag m to true and appends that row and the rest of the data to the list:

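import csv

contain = 'PT [GEV]'
rows = []
m = False  # becomes True once the header row has been seen
with open('Users.csv', 'rt', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        # A row matches when any of its fields equals the pattern exactly.
        if contain in row:
            m = True
        if m:
            rows.append(row)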

You can use the file.tell method to save the file pointer position while you read and skip lines until you find the header line, at which point you can use the file.seek method to reset the file pointer back to the beginning of the header line so that csv.DictReader can parse the rest of the file as valid CSV:

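import csv

with open(file, 'r', newline='') as data_file:
    while True:
        position = data_file.tell()
        # readline() rather than next(): iterating a text file disables tell() in Python 3
        line = data_file.readline()
        if not line:  # end of file reached without finding the header
            raise ValueError('header line not found')
        if line.count(',') == 3: # or whatever condition your header line satisfies
            data_file.seek(position) # reset file pointer to the beginning of the header line
            break
    data_pipe = csv.DictReader(data_file)
    ...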
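The same tell/seek idea can be wrapped in a small reusable function. This is only a sketch: seek_to_header and is_header are hypothetical names, and an io.StringIO object stands in for a real file so the example is self-contained:

```python
import csv
import io

def seek_to_header(data_file, is_header):
    """Position data_file at the start of the first line for which is_header(line) is true."""
    while True:
        position = data_file.tell()
        line = data_file.readline()  # readline() keeps tell() usable
        if not line:
            raise ValueError('header line not found')
        if is_header(line):
            data_file.seek(position)  # rewind to the start of the header line
            return data_file

# Stand-in for an open file; the contents mimic the sample in the question.
demo = io.StringIO(
    'name: Table 7\n'
    'SQRT(S) [GeV], 1960\n'
    'PT [GEV],PT [GEV] LOW,PT [GEV] HIGH,D2(SIG)/DYRAP/DPT [NB/GEV]\n'
    '67,62,72,6.68\n'
)
reader = csv.DictReader(seek_to_header(demo, lambda l: l.startswith('PT [GEV]')))
first = next(reader)
```

Passing the header test as a function keeps the choice of condition (a prefix, a comma count, or anything else) with the caller.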

I would just create a helper function to advance your csv reader to the first record:

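import csv

def remove_comments_from_file():

    file_name = "super_secret_file.csv"
    # Left open on purpose: the returned reader still reads from it.
    file = open(file_name, 'r', newline='')

    csv_read_file = csv.reader(file)

    # Consume rows up to and including the header row.
    for row in csv_read_file:
        if row and row[0] == "PT [GEV]":
            break

    return csv_read_file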

Something along those lines. When the csv reader is returned, it will start at your first record (in this example: 67, 62, 72, 6.68).
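Since the loop consumes the header row, the column names are no longer available from the reader afterwards. One possible way around that, sketched here with the hypothetical function read_pt_and_sigma, is to keep the header row and look up column positions in it. The sketch again assumes a space after each comma, hence skipinitialspace=True:

```python
import csv

def read_pt_and_sigma(lines, header_start='PT [GEV]'):
    """Skip leading comment rows, then collect the pt and sigma columns as floats."""
    reader = csv.reader(lines, skipinitialspace=True)
    header = None
    for row in reader:
        if row and row[0] == header_start:
            header = row  # remember the header so columns can be found by name
            break
    if header is None:
        raise ValueError('header row not found')
    pt_index = header.index('PT [GEV]')
    sigma_index = header.index('D2(SIG)/DYRAP/DPT [NB/GEV]')
    pt, sigma = [], []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        pt.append(float(row[pt_index]))
        sigma.append(float(row[sigma_index]))
    return pt, sigma
```

The two returned lists correspond to the 'pt' and 'sigma' entries that the function in the question stores per data set.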
