
Importing a CSV file with the following structure into SQLite

I made a post earlier ( Getting excel data into Database - beginner ) about getting data into SQLite.

I have done some further research and now understand the basics, therefore I have created the following code:

import csv
import sqlite3

conn = sqlite3.connect('financials.db')

cur = conn.cursor()

cur.execute('DROP TABLE IF EXISTS financials')
cur.execute('''
CREATE TABLE "financials"(
    "Mkt_Cap" REAL,
    "EV" REAL,
    "PE" REAL,
    "Yield" REAL
)
''')

fname = input('Enter the name of the csv file: ')
if len(fname) < 1:
    fname = 'data.csv'

with open(fname) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        print(row)

Below is how my CSV data is currently formatted (it just gets scraped and put into a CSV file):

[Image: screenshot of the CSV data]

Given that, would I be able to extract the values of the table rows using something like this:

Mkt_cap = row[0]
EV = row[1]

I would then write an INSERT statement and commit to get the data into the database.

Or do I need to reformat my CSV data?

This is a bit tricky because the data in the CSV is transposed. Usually each row would represent one fiscal period (year), with the columns holding capitalization, EV, and so on.

You could transpose the data yourself, but I would use pandas for that. Assuming your CSV looks like this, based on your screenshot:

Valuation,,,,,,
Fiscal Period: December,2017,2018,2019,2020,2021,2022
Capitalization,270120,215323,248119,237119,-,-
Entreprise Value (EV),262351,208330,232655,204634,200604,196917
P/E ratio,25.7x,16.0x,19.1x,67.1x,19.6x,15.3x
Yield,0.94%,1.83%,1.59%,0.83%,1.54%,1.74%

Here is some example code:

import pandas as pd

# header=None keeps every line as data; '-' is read as a missing value (NaN)
df = pd.read_csv('data.csv', header=None, na_values='-')

# first row does not mean much so let us remove it
df = df.drop(df.index[0])

# transpose the data to get it back in shape
df = df.transpose()

# use first row as header
df.columns = df.iloc[0]
# remove first row from data
df = df.drop(df.index[0])

# iterate over each row
for _, row in df.iterrows():
    print(f'cap: {row["Capitalization"]}\t'
          f'EV: {row["Entreprise Value (EV)"]}\t'
          f'PE: {row["P/E ratio"]}\t'
          f'Yield: {row["Yield"]}')

result:

cap: 270120 EV: 262351  PE: 25.7x   Yield: 0.94%
cap: 215323 EV: 208330  PE: 16.0x   Yield: 1.83%
cap: 248119 EV: 232655  PE: 19.1x   Yield: 1.59%
cap: 237119 EV: 204634  PE: 67.1x   Yield: 0.83%
cap: nan    EV: 200604  PE: 19.6x   Yield: 1.54%
cap: nan    EV: 196917  PE: 15.3x   Yield: 1.74%
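
The values printed above are still text ("25.7x", "0.94%"), so they need converting to numbers before they go into the REAL columns of the financials table from the question. Below is a minimal sketch of that step, using only the standard library. The helper to_float and the hard-coded rows list are assumptions made for illustration; in practice the rows would come from the transposed DataFrame. Note that the yield is stored as a percentage (0.94), not as a fraction.

```python
import sqlite3


def to_float(value):
    """Convert scraped text such as '25.7x' or '0.94%' to a float (None for gaps)."""
    text = str(value).strip()
    if text in ("", "-", "nan", "None"):
        return None  # stored as NULL in SQLite
    return float(text.rstrip("x%"))


# Hypothetical rows in (Capitalization, EV, P/E ratio, Yield) order.
rows = [
    ("270120", "262351", "25.7x", "0.94%"),
    ("215323", "208330", "16.0x", "1.83%"),
    (float("nan"), "196917", "15.3x", "1.74%"),
]

conn = sqlite3.connect(":memory:")  # use 'financials.db' for a file on disk
conn.execute(
    'CREATE TABLE financials ("Mkt_Cap" REAL, "EV" REAL, "PE" REAL, "Yield" REAL)'
)
# Parameter placeholders (?) let sqlite3 handle quoting and NULLs.
conn.executemany(
    "INSERT INTO financials VALUES (?, ?, ?, ?)",
    [tuple(to_float(v) for v in row) for row in rows],
)
conn.commit()

stored = conn.execute("SELECT * FROM financials").fetchall()
```

Using executemany with placeholders avoids building SQL strings by hand, and a single commit after the batch keeps the insert atomic.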

You may want to change your format first.

Currently your labels run down the left-hand side. The machine looks for labels from left to right.

Also think about sorting and index lookups: would it be easier to retrieve the year as a column, or to step from index to index until you hit a year?
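
If you would rather change the format without extra dependencies, the rows and columns can be swapped with the csv module and zip_longest. This is only a sketch under assumptions: the function name transpose_csv is made up for illustration, the sample text is a shortened version of the data shown earlier, and the first line is assumed to be a title line that can be dropped.

```python
import csv
import io
from itertools import zip_longest


def transpose_csv(text):
    """Return a table where each row is one fiscal year and each column one metric."""
    rows = list(csv.reader(io.StringIO(text)))
    rows = rows[1:]  # drop the 'Valuation' title line
    # zip_longest pads short rows with None so trailing years are not dropped
    return [list(column) for column in zip_longest(*rows)]


sample = (
    "Valuation,,,,,,\n"
    "Fiscal Period: December,2017,2018,2019\n"
    "Capitalization,270120,215323,248119\n"
    "Yield,0.94%,1.83%,1.59%\n"
)

table = transpose_csv(sample)
header, records = table[0], table[1:]
for record in records:
    # Each record now reads left to right: year, capitalization, yield
    print(dict(zip(header, record)))
```

After this step the labels form a header row, so row[0] is the year and row[1] the capitalization, which matches the row-by-row access the question had in mind.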
