Character Encoding Issue in Python/MySQL Based Pipeline
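I can't get encoding/decoding to work properly in an application I'm developing with the following components:
When the data reaches the front end (Alexa), in some cases it contains Unicode escape sequences (e.g. \u00e2\u0080\u0099). Any help would be greatly appreciated!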
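For context, the sequence \u00e2\u0080\u0099 has a recognisable shape: E2 80 99 are the three UTF-8 bytes of U+2019 (the right single quotation mark, a curly apostrophe), so the string looks like UTF-8 text that was decoded as Latin-1 somewhere along the way. The sketch below only demonstrates that relationship; where in this pipeline the wrong decoding happens is not something it can show.

```python
# The three code points quoted in the question, as one Python string.
garbled = "\u00e2\u0080\u0099"

# Latin-1 maps code points 0-255 straight to bytes, so this recovers
# the raw bytes that were misread: E2 80 99.
raw = garbled.encode("latin-1")

# Decoding those bytes as UTF-8 yields the character that was intended.
repaired = raw.decode("utf-8")

print(raw.hex())              # e28099
print(repaired == "\u2019")   # True
```

If the damaged titles all follow this pattern, the stored text was probably valid UTF-8 at some point and one step read it with a single-byte charset.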
Below are code snippets from across the pipeline:
The original web page: document.characterSet as checked in Chrome Developer Tools
I'm using this Python/BeautifulSoup code:
from bs4 import BeautifulSoup
import pymysql

if page_response.status_code == 200:
    # Parse the raw response bytes and let BeautifulSoup detect the encoding
    page_content = BeautifulSoup(page_response.content, "html.parser")
    if str(page_content.find(attrs={'id': 'main'})).find(page_test) != -1:
        for table_row in page_content.select("div#page_filling_chart center table tr"):
            cells = table_row.find_all('td')
            if cells:
                records += 1
                bo_entry.title = cells[2].text.strip()
The data is written to the database with:
connection = pymysql.connect(
    host=rds_host,
    user=name,
    password=password,
    db=db_name
)
try:
    with connection.cursor() as cursor:
        # UPSERT: https://chartio.com/resources/tutorials/how-to-insert-if-row-does-not-exist-upsert-in-mysql/
        sql = (
            "REPLACE INTO weekend_box_office(weekend_date, market, title_id, title, gross, total_gross, rank_order, previous_rank, distributor, distributor_id, change_pct, theaters, per_theater, week_in_release, gross_num, total_gross_num) "
            "VALUE(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s);"
        )
        data = (
            bo_entry.weekend, bo_entry.market, bo_entry.title_id, bo_entry.title, bo_entry.gross, bo_entry.total_gross,
            bo_entry.rank, bo_entry.previous_rank, bo_entry.distributor, bo_entry.distributor_id, bo_entry.change_pct, bo_entry.theaters,
            bo_entry.per_theater, bo_entry.weeks_in_release, bo_entry.gross_num, bo_entry.total_gross_num
        )
        # print(sql)
I fetch the data from the database with the following Python 3.6 code:
connection = pymysql.connect(
    host=rds_host,
    user=name,
    password=password,
    db=db_name
)
with connection.cursor() as cursor:
    sql = (
        f"select weekend_date, title_id, title, gross, gross_num, total_gross, total_gross_num, CONCAT(cast(ROUND(gross_num / total_gross_num * 100,1) as CHAR),'%') as weekend_pct, week_in_release "
        "from weekend_box_office "
        f"where weekend_date = '{weekend_text}' "
        f"order by gross_num desc limit {limit_row_no}; "
    )
    try:
        cursor.execute(sql)
        result = cursor.fetchall()
        for row in result:
            title = row[2]
This is how it looks when I set a breakpoint and inspect it in Spyder's Variable Explorer.
Using the following code:
response_text += (f"Led by {title}, bringing in ${SpeechUtils.spoken_human_format(gross_num)}. ")
return response_text
When I return it from Lambda using the Python json library, it looks like this:
return {
    'statusCode': 200,
    'body': json.dumps(speak_top5(BoxOffice.get_previous_friday())),
    'headers': {'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*'},
}
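One thing that may help separate two different effects here: by default, json.dumps escapes every non-ASCII character as \uXXXX, so seeing escapes in the Lambda body is not an error in itself. A correctly decoded curly apostrophe shows up as the single escape \u2019, while the three escapes \u00e2\u0080\u0099 mean the string was already garbled before serialisation. A small sketch, where sample_title is a made-up placeholder and not a value from the real table:

```python
import json

# Hypothetical title containing a correctly decoded curly apostrophe (U+2019).
sample_title = "Sample\u2019s Title"

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps({"body": sample_title})
print(escaped)    # {"body": "Sample\u2019s Title"}

# ensure_ascii=False keeps the character itself in the output.
readable = json.dumps({"body": sample_title}, ensure_ascii=False)
print(readable)   # {"body": "Sample’s Title"}

# Both forms parse back to the same data.
same_after_parse = json.loads(escaped) == json.loads(readable)

# A title that was already mis-decoded serialises to three escapes instead.
garbled_title = "Sample\u00e2\u0080\u0099s Title"
garbled_json = json.dumps(garbled_title)
print(garbled_json)   # "Sample\u00e2\u0080\u0099s Title"
```

So the escaping done by json.dumps is harmless to any client that parses the JSON; the part worth chasing is why the title holds three characters where one was expected.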
Try changing the MySQL connection charset to charset='utf8':
connection = pymysql.connect(
    host=rds_host,
    user=name,
    password=password,
    db=db_name,
    charset='utf8'
)
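Setting the charset affects how new reads and writes are handled; any value that has already reached Python in garbled form might still need cleaning. The helper below is a minimal sketch of one way to do that, assuming the damage is exactly "UTF-8 bytes decoded as Latin-1". The name repair_mojibake is hypothetical and not part of pymysql or any other library.

```python
def repair_mojibake(text: str) -> str:
    """Undo 'UTF-8 read as Latin-1' garbling; return text unchanged otherwise."""
    try:
        # Turn the characters back into the original bytes, then decode
        # those bytes with the encoding they were actually written in.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 (so it was not
        # garbled this way) or the bytes are not valid UTF-8: leave it alone.
        return text


print(repair_mojibake("Sample\u00e2\u0080\u0099s Title") == "Sample\u2019s Title")  # True
print(repair_mojibake("Sample\u2019s Title") == "Sample\u2019s Title")              # True
print(repair_mojibake("caf\u00e9") == "caf\u00e9")                                  # True
```

This is a heuristic: a string that legitimately contains a Latin-1 sequence that also happens to be valid UTF-8 would be altered, so it is safer as a one-off cleanup than as a permanent step. Separately, MySQL's utf8 charset stores at most three bytes per character, so utf8mb4 may be worth considering if titles can ever include characters outside the Basic Multilingual Plane, such as emoji.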
See here for details.