How to read a CSV file up to a certain row and store it in a variable

I want to read in a CSV file and then store the data under each header in its own variable.

My CSV file:

multiplicity  
4.123  
lattice parameters  
1,0,0  
0,1,0  
0,0,1  
atom sites  
0,0,0  
0.5,0.5,0.5  
occupancy  
1,0  
0,1  

I want to write code that automatically stores the line under multiplicity as the data for that variable, and so on for the rest of the CSV. I can't hard-code positions (for example, "multiplicity is line[2]") because the number of lines under each header will change. I would like a loop that stores the data between headers in a variable, but I am not sure how to write it.

Ideally, the code would find the first and second headers and save the values between them as the multiplicity variable. It would then do the same for the second and third headers (lattice parameters) and for the third and fourth headers (atom sites). Finally, it would save everything between the fourth header and the end of the CSV as occupancy.

You could try collecting your rows in a collections.defaultdict().

As for grouping lines under their respective headers, it seems you can treat a row as a header if csv.reader() returns it as a single item that contains only letters and spaces. It's difficult to say for sure, since you've only shown a snapshot of your data, so I've made these assumptions in the example below. Once you have decided how to identify the headers, you can simply add all the following rows to the current header until another header is found.

I've also assumed that your normal rows contain integers and floats. You can convert them directly to their proper types with ast.literal_eval().
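As a quick illustration of the conversion step on its own, here is a minimal sketch. The row below is made up for the example; it only assumes that csv.reader() hands back each row as a list of strings.

```python
from ast import literal_eval

# A made-up row, in the form csv.reader() would return it: a list of strings
row = ["0.5", "1", "-2"]

# literal_eval turns each string into the Python literal it spells out,
# so "1" becomes an int and "0.5" becomes a float
values = [literal_eval(item) for item in row]

print(values)
```

Unlike float(), this keeps integers as integers. A cell that is not a valid Python literal (a stray word, for instance) raises an exception, so this only fits rows that are known to be numeric.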

Demo:

from csv import reader
from collections import defaultdict
from ast import literal_eval
from pprint import pprint

# Create a dictionary of lists
data = defaultdict(list)

# Open your file
with open('data.csv', newline='') as f:

    # Get the csv reader
    csv_reader = reader(f)

    # Initialise current header
    # Rows that appear before any header are stored under the key None
    current_header = None

    # Go over each row in the csv file
    for line in csv_reader:

        # Skip blank lines, which csv.reader returns as empty lists
        if not line:
            continue

        # Found a header: a single field made up of letters and spaces
        if len(line) == 1 and all(item.isalpha() or item.isspace() for item in line[0]):
            current_header = line[0]
            continue

        # If we get here, it is a normal row of ints and floats
        data[current_header].append(list(map(literal_eval, line)))

pprint(data)

Output:

defaultdict(<class 'list'>,
            {'atom sites': [[0, 0, 0], [0.5, 0.5, 0.5]],
             'lattice parameters': [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
             'multiplicity': [[4.123]],
             'occupancy': [[1, 0], [0, 1]]})

And now you have a dictionary that stores each header with its respective rows. This can be manipulated later, and added to if needed.
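Since the question asks for separate variables, one way to get them is to pull each entry out of the dictionary by its header. This is only a sketch: the dictionary is written out literally here so the snippet runs on its own, and the variable names are my own choice.

```python
# Same structure as the dictionary printed above, written out literally
data = {
    'multiplicity': [[4.123]],
    'lattice parameters': [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    'atom sites': [[0, 0, 0], [0.5, 0.5, 0.5]],
    'occupancy': [[1, 0], [0, 1]],
}

# Each header maps to a list of rows; multiplicity is one row with one value
multiplicity = data['multiplicity'][0][0]
lattice_parameters = data['lattice parameters']
atom_sites = data['atom sites']
occupancy = data['occupancy']

print(multiplicity)
print(len(atom_sites), "atom sites")
```

Because each header holds a list of rows, this keeps working when the number of lines under a header changes.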

Here is an example of printing each header and its rows (a nested list):

for header, rows in data.items():
    print("Header: %s, Rows: [%s]" % (header, ",".join(map(str, rows))))

# Header: multiplicity, Rows: [[4.123]]
# Header: lattice parameters, Rows: [[1, 0, 0],[0, 1, 0],[0, 0, 1]]
# Header: atom sites, Rows: [[0, 0, 0],[0.5, 0.5, 0.5]]
# Header: occupancy, Rows: [[1, 0],[0, 1]]

You can also have a look at How to use dictionaries in Python to understand more about dictionaries and how to manipulate them.

My $0.02:

  • The approach described in the question is unnecessarily complex. You don't need to locate the first and second headings and collect the data between them. You need:
    1. A way to identify if you have hit a header
    2. Code that will deal appropriately with the values after the header
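Those two pieces could be sketched as a pair of small helpers. The names is_header and parse_values are hypothetical, and the sketch assumes every row is non-empty and that a header never starts with something float() can parse.

```python
def is_header(row):
    """Return True if the first cell of the row cannot be parsed as a number."""
    try:
        float(row[0])
    except ValueError:
        return True
    return False


def parse_values(row):
    """Convert every cell of a data row to a float."""
    return [float(cell) for cell in row]


print(is_header(["atom sites"]))
print(parse_values(["0.5", "0.5", "0.5"]))
```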

This is only a rough sketch, but you probably want something from the Python csv module, which might look like this (RoadRunner's code is more complete, but I think we're both thinking along the same lines and would end up with nearly identical output).

import csv

data_dict = {}

with open('file_name.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    # Data that appears before any header ends up under this key
    curr_header = "IF THIS IN DICT, SOMETHING IS WRONG"
    for row in csvreader:
        if not row:  # skip blank lines
            continue
        try:  # a row whose first cell parses as a number is data
            float(row[0])
        except ValueError:  # otherwise we found a header
            curr_header = row[0]
            data_dict[curr_header] = []
        else:
            # setdefault avoids a KeyError when no header has been seen yet
            data_dict.setdefault(curr_header, []).append([float(x) for x in row])

print(data_dict)
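
Another option is to test the first field of each line against a regular expression:

import re

data = {}
key = None  # data lines before the first header are stored under None

with open("data.csv") as f:
    for line in f:
        fields = [field.strip() for field in line.split(",")]
        if not fields[0]:  # skip blank lines
            continue
        # A plain number in the first field means this is a data line
        if re.match(r"^-?\d+\.?\d*$", fields[0]):
            data.setdefault(key, []).append([float(x) for x in fields])
        else:
            key = fields[0]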

The data dict then looks like this:

{'atom sites': [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
 'lattice parameters': [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
 'multiplicity': [[4.123]],
 'occupancy': [[1.0, 0.0], [0.0, 1.0]]}
