Unicode issues with a list; unable to resolve them in Python

I am using pandas to extract a table from a website into a DataFrame as follows:

import pandas as pd

jockeys = 'https://race.kra.co.kr/globalEn/jockeysBusan.do'
jdf = pd.read_html(jockeys)[0]

jdf_list = jdf.values.tolist()
print(jdf_list)

The result I get looks like this (only the first few rows are shown):

[[1,
  'Chae Sang Hyun',
  'FREE',
  '2014/06/05',
  '262 (16/19/22)',
  '1789 (130/153/162)'],
 [2,
  'Choi Eun Gyeong',
  'FREE',
  '2016/06/18',
  '317 (19/22/38)',
  '1522 (90/120/140)'],
 [3,
  'Choi Si Dae',
  'FREE',
  '2007/05/18',
  '409 (58/34/34)',
  '5649 (750/658/594)'],
 [4,
  'Francisco Da Silva',
  'FREE',
  '2016/09/02',
  '375 (61/45/42)',
  '2255 (309/300/261)'],
 [5,
  '(-4)\xa0Gwon O Chan',
  'FREE',
  '2021/07/15',
  '154 (4/12/10)',
  '200 (4/14/10)']]

I keep getting "(-4)\xa0" in front of the name. I tried the following techniques without success:

jdf_list_new =  jdf_list.encode('ascii', 'ignore').decode('utf-8')

jdf_list_new = unicodedata.normalize("NFKC", jdf_list)

Any help would be appreciated!
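
Both attempts above fail before touching any text, because they are called on the list itself: a list has no `encode` method (AttributeError), and `unicodedata.normalize` accepts only a `str` (TypeError). A minimal sketch of applying the normalization per cell instead, using two rows copied from the output above; the helper name `clean_cell` is made up for illustration:

```python
import unicodedata

# Sample rows copied from the output above; the real list comes from jdf.values.tolist().
jdf_list = [
    [4, 'Francisco Da Silva', 'FREE', '2016/09/02', '375 (61/45/42)', '2255 (309/300/261)'],
    [5, '(-4)\xa0Gwon O Chan', 'FREE', '2021/07/15', '154 (4/12/10)', '200 (4/14/10)'],
]

def clean_cell(value):
    """Normalize strings with NFKC; leave non-string cells (e.g. the int index) alone."""
    if isinstance(value, str):
        return unicodedata.normalize("NFKC", value)
    return value

# Walk the nested list and clean every cell individually.
jdf_list_new = [[clean_cell(cell) for cell in row] for row in jdf_list]
print(jdf_list_new[1][1])  # (-4) Gwon O Chan
```

NFKC maps U+00A0 to an ordinary space, so the "(-4)" prefix stays but the `\xa0` is gone.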

\xa0 is the Unicode character 'NO-BREAK SPACE'. Before building the list, encode and decode the column in the DataFrame (the "(-4)" is part of the table on the website):

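jdf = pd.read_html(jockeys)[0]
# Encoding to ASCII with errors='ignore' drops the non-ASCII \xa0; decoding turns the bytes back into str
jdf['(allowance)Name'] = jdf['(allowance)Name'].str.encode('ascii', 'ignore').str.decode('utf-8')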

Output:

[1, 'Chae Sang Hyun', 'FREE', '2014/06/05', '262 (16/19/22)', '1789 (130/153/162)']
[2, 'Choi Eun Gyeong', 'FREE', '2016/06/18', '317 (19/22/38)', '1522 (90/120/140)']
[3, 'Choi Si Dae', 'FREE', '2007/05/18', '409 (58/34/34)', '5649 (750/658/594)']
[4, 'Francisco Da Silva', 'FREE', '2016/09/02', '375 (61/45/42)', '2255 (309/300/261)']
[5, '(-4)Gwon O Chan', 'FREE', '2021/07/15', '154 (4/12/10)', '200 (4/14/10)']
[6, 'Jeon Jin Gu', 'FREE', '2017/06/02', '183 (2/12/7)', '914 (47/64/52)']
[7, 'Jeong Dong Cheol', 'FREE', '2011/08/24', '141 (6/3/4)', '2724 (169/183/195)']
[8, 'Jeong Woo Ju', 'FREE', '2018/06/14', '143 (2/6/8)', '987 (48/50/68)']
[9, 'Jo In Kwon', 'FREE', '2008/06/18', '355 (37/53/40)', '4592 (649/533/491)']
[10, 'Jung Do Yun', 'FREE', '2016/06/18', '260 (29/28/25)', '1921 (162/157/194)']
[11, 'Kim Cheol Ho', 'FREE', '2008/06/18', '164 (8/8/14)', '2640 (217/219/240)']
[12, 'Kim Eu Soo', 'FREE', '2005/05/04', '270 (11/13/14)', '4102 (243/306/344)']
[13, 'Kim Hye Sun', 'FREE', '2009/06/01', '415 (46/57/44)', '4275 (350/374/363)']
[14, '(-4)Lee Hong Rag', 'FREE', '2022/07/01', '91 (6/9/8)', '91 (6/9/8)']
[15, 'Lee Sung Jae', 'FREE', '2008/05/14', '396 (34/23/35)', '4244 (327/333/398)']
[16, 'Lim Sung Sil', 'FREE', '2002/09/13', '94 (5/8/14)', '2648 (353/296/279)']
[17, 'Mo Jun Ho', 'FREE', '2020/07/15', '340 (17/17/26)', '755 (45/54/64)']
[18, 'Park Jae I', 'FREE', '2015/06/17', '390 (62/52/50)', '2239 (167/223/227)']
[19, '(-4)Park Jong Ho', 'FREE', '2020/07/15', '74 (1/2/5)', '282 (8/7/14)']
[20, '(-2)Seo Gang Ju', 'FREE', '2021/07/15', '342 (28/41/40)', '385 (28/44/46)']
[21, 'Seo Seung Un', 'FREE', '2011/08/24', '368 (61/55/46)', '3973 (620/540/491)']
[22, '(-2)Shin Yun Seob', 'FREE', '2021/07/15', '313 (16/22/28)', '407 (24/26/38)']
[23, 'Song Kyeong Yun', 'FREE', '2007/05/18', '391 (39/34/40)', '4765 (361/450/461)']
[24, '(-3)Yoon Hyung Seok', 'FREE', '2021/07/15', '268 (13/19/23)', '317 (14/24/24)']
[25, 'You Hyun Myung', 'FREE', '2002/09/13', '387 (73/49/42)', '7104 (1199/940/750)']
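
Note that `encode('ascii', 'ignore')` discards every non-ASCII character, so an accented letter in a name would be dropped as well. The no-break space is also removed rather than replaced, which is why the output shows "(-4)Gwon O Chan" with no gap. A possible alternative, sketched under the assumption that only `\xa0` needs handling, replaces it with a plain space and optionally splits the allowance from the name. The small DataFrame stands in for the scraped table, and the `allowance` and `name` column names are hypothetical:

```python
import pandas as pd

# Small stand-in for the scraped table; the column name matches the one used above.
jdf = pd.DataFrame({
    '(allowance)Name': ['Chae Sang Hyun', '(-4)\xa0Gwon O Chan', '(-2)\xa0Seo Gang Ju'],
})

col = '(allowance)Name'

# Replace only the no-break space, so any other non-ASCII characters survive.
jdf[col] = jdf[col].str.replace('\xa0', ' ', regex=False)

# Optionally split the "(-N)" allowance prefix from the name.
# Rows without a prefix get a missing value in the allowance column.
parts = jdf[col].str.extract(r'^(?:\((?P<allowance>-?\d+)\)\s*)?(?P<name>.+)$')
jdf['allowance'] = parts['allowance']
jdf['name'] = parts['name']

print(jdf[['allowance', 'name']].values.tolist())
```

Keeping the allowance in its own column leaves the name clean for matching against other tables, while the original combined column is still available.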
