简体   繁体   中英

How do I prevent SQL injections in Flask-SQLAlchemy? Is there a better way to load data from CSV?

I am trying to quickly load data from the internet into a table using Flask, SQLAlchemy with PostgreSQL/psycopg2. I am in a slight argument with a co-worker. We will call him "Dad." Dad is arguing that we cannot execute a raw SQL query because of the possibility SQL injection. I am arguing that we can, if it is probably formatted, which is tough to do and the ORM should generally be used. The following example seems to me to be a simple enough issue.

# Flask-SQLAlachemy
from flask import Flask, render_template
from flask_sqlalchemy import SQLAlchemy

server = Flask(__name__)
server.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
server.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql+psycopg2://...'
db = SQLAlchemy(server)

@server.route('/<column>')
def index(column):
    result = db.session.execute("SELECT" + column + "from item_profit")
    return render_template('index.html', data=result)

Someone could essentially insert any Raw SQL into column and return that table. See this link for other simple injections. I saw this SO answer (see second answer) that seems to imply a properly formatted string will not lead to an SQL injection. Their code looks like so:

result = db.session.execute('SELECT * FROM my_table WHERE my_column = :val', {'val': 5})

Dad says that DBA at quite a few companies prevent other departments from having execute authority ie they would not be able to use

    result = db.session.execute("...")

I think that is because they are worried a regular user will not know to properly format it. Is that correct? It is not that executing a SQL query can cause an SQL injection but that an improperly formatted execute query can cause one. This is why people say to use the ORM because it ensures properly formatted queries all the time. Is this correct thinking? ORM systems are just performing the execute behind the scenes. If executing an query was an injection issue, then even ORMs could not be used. Heck, even SQL could not be used, correct? I hope to use raw SQL in certain cases, as we might be dealing with huge datasets, which processing speed is a major factor and the ORM is much, much slower.

My followup question then, what is the best way to load a CSV into a SQL database while minimizing raw SQL? Dad imported the CSV into postgres using a PgAdmin import (saves a lot of time having to write out the import in psql). I know another way would be to use pandas pandas.Dataframe.to_sql() method, but I do not see bringing pandas in for this seems like the best design here. I typically use the ORM, but I am not sure if there is a quick way of getting the CSV this way. Let's say I had a csv of a table, Product , with two columns, Name , and Cost . To make this in the ORM, I would do the following:

...

db = SQLAlchemy(app)

class Product(db.Model):
    __tablename__ = 'product'

    name = db.Column(db.String)
    cost = db.Column(db.Double)

However, I am back to the issue of getting the CSV into python to load into the table. There is pandas from_csv(''). I guess I could also open the file itself and loop through it. Something like

for record in csv_file:
    product = Product(name=..., cost=...)
    db.session.add(product)
    db.session.commit()

Is there a better way? Does the ORM have a way to load a CSV while allowing me to avoid raw SQL for most of the time?

It is a lot of questions.

@server.route('/<column>')
def index(column):
    result = db.session.execute("SELECT" + column + "from item_profit")
    return render_template('index.html', data=result)

This looks fool-proof enough if you limit what the column variable can contain. But the trouble is that you never know what happens with the code in future. All possible checks can go away and only this "SELECT " + column + " from..." will stay and then there might be a problem. That's why I would always argue against building sql and sending it into the database.

Dad says that DBA at quite a few companies prevent other departments from having execute authority ie they would not be able to use

I am not sure what exactly is meant by this. Some companies or some applications may prevent the database from running anything else than stored procedures. Therefore the database cannot be misused, and permissions are easier to check. But the database cannot prevent sqlalchemy from doing db.session.execute. db.session.execute can execute select, update, delete, stored procedure, any sql in general, and RDBMS will apply premissions to that sql. Whether you use raw sql, or your raw sql is built by ORM or any other builder does not matter.

Now, even SQLAlchemy has got 2-3 levels. ORM is the highest level. Then there is QL (Query Language) that will allow you build and execute SQL. Finally you can execute a literal SQL.

In the literal SQL you can use parameters, for example: https://docs.sqlalchemy.org/en/13/core/tutorial.html#specifying-bound-parameter-behaviors

In the QL you can easily build a query like:

q = table_name.insert({
    "id": 3321, "name": python_var_name, "age": python_var_age,
})
conn.execute(q)
# or even session.execute(q)  but consult the docs

In ORM you already know what to do. But as you want to insert a lot of records, you might be thinking about using bulk_insert_mapping, https://stackoverflow.com/a/35495327/3939992 but it will use all the ORM mechanism, which you probably do not want.

You will be better off with QL. Read the tutorial, it is all in there: https://docs.sqlalchemy.org/en/13/core/tutorial.html https://docs.sqlalchemy.org/en/13/core/tutorial.html#executing-multiple-statements

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM