Basic Python: a ' in a text variable stops my script (psycopg2 and tweepy; Python, Postgres and Twitter)

I have a script that mines tweets and inserts them into my Postgres database. It works for most messages.

With the following line I can return the text of a message:

tweet.text.encode('utf-8')

Whenever a tweet has a ' in its text, my script stops. I could write a function that extracts the tweet and wraps it in two ". But I figure I would get the same problem when a tweet contains a ". Then I could write a function that checks tweets for a ' or " and handles those cases separately. But that seems like way too much work for this simple problem.

So I'd like to know how to overcome this problem without too much scripting effort.

I am not an expert in Python, and one of my problems is that I try to fix things the hard way when there is often a much simpler way. The current problem made me think this is one of those scenarios. Hence my question here.

*** UPDATE ***

My error does indeed pop up when inserting the message into my Postgres table.

I just tried repr() but still got a similar error message.

Traceback (most recent call last):
  File "...python.py", line 28, in <module>
    cur.execute("INSERT INTO Test(userid, created, retweets, message) VALUES('{0}', '{1}', '{2}', '{3}')".format(tweet.user.id, tweet.created_at, tweet.retweet_count, ber))
psycopg2.ProgrammingError: syntax error at or near "E19"
LINE 1: ...LUES('1251822199', '2016-02-27 10:23:40', '0', 'b'E19 (A1) M...

The 4th parameter is the text of the tweet and starts with 'b'E19. It fails here.

The line I use to insert the data into Postgres is the following:

cur.execute("INSERT INTO Test(message) VALUES('{0}')".format(repr(tweet.text.encode('utf-8'))))

Because you are building the query manually with string operations, you would need to escape the quotes in the query yourself.
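To see why the statement breaks, it can help to print the SQL string that format() actually builds. The sketch below assumes Python 3 (the b'...' prefix in the error message suggests it) and uses made-up tweet texts; no database is needed.

```python
# Hypothetical tweet texts, used only for illustration.
plain_text = "E19 (A1)"
quoted_text = "it's raining"

# Same pattern as in the question: repr() of the encoded bytes is
# pasted between the single quotes of the SQL literal.
encoded = plain_text.encode("utf-8")
query = "INSERT INTO Test(message) VALUES('{0}')".format(repr(encoded))
print(query)
# INSERT INTO Test(message) VALUES('b'E19 (A1)'')

# Without repr() or encoding, an apostrophe in the text ends the
# SQL literal early in the same way.
query_with_quote = "INSERT INTO Test(message) VALUES('{0}')".format(quoted_text)
print(query_with_quote)
# INSERT INTO Test(message) VALUES('it's raining')
```

In the first query the SQL string literal is just 'b', so the parser trips over E19 right after it, which matches the reported error position.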

But a better way is to use parameterised queries and let psycopg2 escape special characters for you. This also makes your code less vulnerable to SQL injection attacks if some of the parameters come from untrusted sources, e.g. a user.

cur.execute("INSERT INTO Test(message) VALUES(%s)", (tweet.text.encode('utf-8'),))

or

cur.execute("INSERT INTO Test(userid, created, retweets, message) VALUES(%s, %s, %s, %s)", (tweet.user.id, tweet.created_at, tweet.retweet_count, ber))

Now the DB layer will perform escaping for you.
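A minimal runnable sketch of the same idea follows. It uses the standard-library sqlite3 module as a stand-in so it runs without a Postgres server; that substitution is an assumption made for illustration. Note that sqlite3 uses ? as its placeholder where psycopg2 uses %s. The helper insert_message and the sample text are hypothetical.

```python
import sqlite3


def insert_message(conn, message):
    # The value travels separately from the SQL text, so quotes inside
    # `message` are never interpreted as SQL syntax.
    # With psycopg2 the placeholder would be %s instead of ?.
    conn.execute("INSERT INTO Test(message) VALUES (?)", (message,))


# In-memory database standing in for the Postgres table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test(message TEXT)")

# Hypothetical tweet text containing both kinds of quotes.
tricky = 'He said "it\'s fine"'
insert_message(conn, tricky)

stored = conn.execute("SELECT message FROM Test").fetchone()[0]
print(stored == tricky)
```

On Python 3 it is probably simpler to pass tweet.text (a str) rather than the encoded bytes, since psycopg2 adapts bytes to the bytea type; whether that matters depends on the column type, which the question does not state.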

