Character Encoding Issue in Python/MySQL Based Pipeline

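I am having trouble getting encoding/decoding to work correctly in an application built with the following components:

  1. Python 3.6
  2. BeautifulSoup
  3. A scraped web page that uses UTF-8
  4. MySQL
  5. json
  6. Lambda

When I get the data to the front end (Alexa), in some cases it contains Unicode escape sequences (e.g. \u00e2\u0080\u0099). Any help would be greatly appreciated!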
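For context, the sequence \u00e2\u0080\u0099 is what the UTF-8 bytes of a right single quotation mark (U+2019) look like when they are decoded as Latin-1 at some point. The sketch below only reproduces that pattern; the sample title is made up for illustration, and it does not show where in this pipeline the misdecoding actually happens.

```python
# Hypothetical sample title containing a right single quotation mark (U+2019).
original = "Ocean\u2019s 8"

# UTF-8 encodes U+2019 as the three bytes E2 80 99.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as Latin-1 maps each byte to the code point with the
# same number, giving the three characters U+00E2, U+0080 and U+0099.
garbled = utf8_bytes.decode("latin-1")

print(utf8_bytes)
print(ascii(garbled))
```

Seeing exactly these three code points in place of one apostrophe is a strong hint that UTF-8 data was read with a single-byte charset somewhere between the web page and the response.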

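Here are code snippets from across the pipeline:

The original web page: I checked document.characterSet in Chrome Developer Tools (screenshot: web page character encoding).

I am using this Python / BeautifulSoup code: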

from bs4 import BeautifulSoup
import pymysql
    if page_response.status_code == 200:
        page_content = BeautifulSoup(page_response.content, "html.parser")    
        if str(page_content.find(attrs={'id': 'main'})).find(page_test) != -1:
            for table_row in page_content.select("div#page_filling_chart center table tr"):
                cells = table_row.findAll('td')
                if cells:
                    records += 1
                    bo_entry.title = cells[2].text.strip()

The data is put into the database with the following:

connection = pymysql.connect(
        host=rds_host,
        user=name,
        password=password,
        db=db_name
        )
    try:
        with connection.cursor() as cursor:
            # UPSERT: https://chartio.com/resources/tutorials/how-to-insert-if-row-does-not-exist-upsert-in-mysql/
            sql = (
                    f"REPLACE INTO weekend_box_office(weekend_date, market, title_id, title,gross,total_gross,rank_order, previous_rank, distributor, distributor_id, change_pct, theaters, per_theater, week_in_release, gross_num, total_gross_num)"
                    f"VALUE(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s);"
                )
            data = (
                    bo_entry.weekend, bo_entry.market, bo_entry.title_id, bo_entry.title, bo_entry.gross, bo_entry.total_gross, 
                    bo_entry.rank, bo_entry.previous_rank, bo_entry.distributor, bo_entry.distributor_id, bo_entry.change_pct, bo_entry.theaters,
                    bo_entry.per_theater, bo_entry.weeks_in_release, bo_entry.gross_num, bo_entry.total_gross_num
                    )
#            print(sql)

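The current database collation and character set are set to: (screenshot: database character set)

The collation of the MySQL table that stores the data is: (screenshot: table character encoding)

I use the following Python 3.6 code to fetch the data from the database: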

connection = pymysql.connect(
        host=rds_host,
        user=name,
        password=password,
        db=db_name
        )

        with connection.cursor() as cursor:
            sql = (
                    f"select weekend_date, title_id, title, gross, gross_num, total_gross, total_gross_num, CONCAT(cast(ROUND(gross_num / total_gross_num * 100,1) as CHAR),'%') as weekend_pct, week_in_release "
                    "from weekend_box_office "
                    f"where weekend_date = '{weekend_text}' "
                    f"order by gross_num desc limit {limit_row_no}; "
                )
            try:
                cursor.execute(sql)
                result = cursor.fetchall()              
                for row in result:
                    title = row[2]

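This is what the value looks like when I set a breakpoint and inspect it in Spyder's Variable Explorer: (screenshot: Variable Explorer)

When I return it, it looks like this: (screenshot: returned as str)

With the following code:

response_text += (f"Led by {title}, bringing in ${SpeechUtils.spoken_human_format(gross_num)}. ")
return response_text

When I return it from Lambda using the json Python library, it looks like this: (screenshot: output of the json module)

return {
    'statusCode': 200,
    'body': json.dumps(speak_top5(BoxOffice.get_previous_friday())),
    'headers': {
        'Content-Type': 'application/json',
        'Access-Control-Allow-Origin': '*'
    },
}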
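One thing worth separating out here: json.dumps escapes every non-ASCII character as \uXXXX by default, so escape sequences in the body are expected on their own. The sketch below (sample strings are made up) suggests how to tell the two cases apart: a correctly decoded apostrophe would serialize as a single \u2019, while three escapes indicate the string was already garbled before serialization.

```python
import json

clean = "Ocean\u2019s 8"                 # correctly decoded text
garbled = "Ocean\u00e2\u0080\u0099s 8"   # UTF-8 bytes misread as Latin-1

# Default behaviour (ensure_ascii=True): non-ASCII becomes \uXXXX escapes.
escaped_clean = json.dumps(clean)
escaped_garbled = json.dumps(garbled)

# ensure_ascii=False keeps the characters as-is in the JSON text.
readable = json.dumps(clean, ensure_ascii=False)

print(escaped_clean)
print(escaped_garbled)
```

Either form of the clean string is valid JSON and decodes back to the same text on the client, so ensure_ascii is a readability choice and not a fix for the garbled case.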

Try again after setting the MySQL connection charset with charset='utf8'.

connection = pymysql.connect(
    host=rds_host,
    user=name,
    password=password,
    db=db_name,
    charset='utf8'
    )
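The connection charset applies to data written and read from then on. If strings read back still contain sequences like \u00e2\u0080\u0099, they may have been garbled earlier, and a one-off repair along these lines might help. This is only a sketch: the helper name repair_mojibake is hypothetical, and it assumes the misdecoding was Latin-1, which matches the code points shown in the question.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Latin-1."""
    try:
        # Turn each character back into the byte it came from, then decode
        # the resulting byte string as the UTF-8 it originally was.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the bytes are
        # not valid UTF-8: assume it was not garbled and leave it untouched.
        return text


fixed = repair_mojibake("Ocean\u00e2\u0080\u0099s 8")
untouched = repair_mojibake("Ocean\u2019s 8")
```

Strings that are already correct fall through the except branch unchanged, so the helper can be applied to a whole column without damaging clean rows in the common cases.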

See here for details.
