
Python: searching for Russian substrings in Excel

I want to read an Excel file and extract some information about certain people.

Here is what I am doing:

import os
import xlrd

dir = './schfiles'
files = os.listdir(dir)
f = files[0]
book = xlrd.open_workbook(dir + "/" + files[0])
sh = book.sheet_by_index(0)
# xlr2i and xlc2i are helpers defined elsewhere (not shown here)
t = sh.cell_value(rowx=xlr2i(35), colx=xlc2i('F'))
t.find(u"Усманов")

The string stored in the variable t is u'\u0434\u043e\u0446. \u0423\u0441\u043c\u0430\u043d\u043e\u0432 \u0411.\u0428.', which displays as "доц. Усманов Б.Ш."

u"Усманов" is represented as u'\xd3\xf1\xec\xe0\xed\xee\xe2'

I tried encoding both strings to 'utf8', decoding them, and using external libraries, but nothing helped.

Does anyone know how to find a particular substring here?

Use # -*- coding: utf-8 -*- as the first line of your script to tell the interpreter which encoding the source file uses. The file must actually be saved in that encoding, otherwise the literal u"Усманов" is still decoded incorrectly.
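To see why the declared encoding matters, here is a minimal sketch of the mismatch described in the question. It assumes (this is a guess, the question does not confirm it) that the script's bytes were Windows-1251 and were read as Latin-1 because no encoding was declared. The names cell_text, needle and mangled are illustrative only.

```python
# -*- coding: utf-8 -*-
# Assumption: the search literal was stored as cp1251 bytes and then
# misread as Latin-1, which is one way to end up with u'\xd3\xf1...'.

# The cell value as xlrd returns it: proper Unicode code points.
cell_text = u"\u0434\u043e\u0446. \u0423\u0441\u043c\u0430\u043d\u043e\u0432 \u0411.\u0428."

# The surname written with explicit escapes, so it does not depend
# on how this file is saved.
needle = u"\u0423\u0441\u043c\u0430\u043d\u043e\u0432"

# Reproduce the mis-decoding: cp1251 bytes interpreted as Latin-1.
mangled = needle.encode("cp1251").decode("latin-1")

mangled_index = cell_text.find(mangled)   # the mangled literal is not in the cell
correct_index = cell_text.find(needle)    # the correctly decoded literal is

print(mangled_index)   # -1
print(correct_index)   # 5
```

Under that assumption, mangled equals the u'\xd3\xf1\xec\xe0\xed\xee\xe2' value shown in the question, so the cell text was fine all along and only the search literal was wrong.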

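# -*- coding: utf-8 -*-

import os
import xlrd

# Directory holding the Excel files (renamed from `dir` to avoid shadowing the built-in).
schedule_dir = './schfiles'
files = os.listdir(schedule_dir)

workbook_path = os.path.join(schedule_dir, files[0])
book = xlrd.open_workbook(workbook_path)

sh = book.sheet_by_index(0)
# xlr2i and xlc2i are the asker's own helpers (not shown in the question).
t = sh.cell_value(rowx=xlr2i(35), colx=xlc2i('F'))
# find returns the index of the first match, or -1 if the substring is absent.
print(t.find(u"Усманов"))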
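The helpers xlr2i and xlc2i are not defined anywhere in the thread. Judging only by their names and how they are called, they probably convert an Excel row number and column letter into the zero-based indexes that xlrd expects. The following is a hypothetical reconstruction under that assumption, not the asker's actual code.

```python
def xlr2i(row_number):
    """Hypothetical helper: 1-based Excel row number -> 0-based row index."""
    return row_number - 1


def xlc2i(column_letters):
    """Hypothetical helper: Excel column letters ('A', 'F', 'AA') -> 0-based index."""
    index = 0
    for ch in column_letters.upper():
        # Treat the letters as a base-26 number where A=1 ... Z=26.
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index - 1


print(xlr2i(35))    # 34
print(xlc2i('F'))   # 5
```

With these definitions, the call sh.cell_value(rowx=xlr2i(35), colx=xlc2i('F')) would read cell F35 of the sheet.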
