
List with special characters

Using Python 2, I am reading strings from a variable (taken from an XML tag) and storing them in a list.

First: the strings contain special characters, and when I print them they don't show up correctly, even though I am using encode("ISO-8859-1").

Second: each string shows up in its own list, and I want them all to be in the same list.

import lxml.objectify
from lxml import etree
import codecs
import xml.etree.cElementTree as ET
file_path = "C:\Users\HP\Downloads\Morphalou-2.0.xml"
for event, elem in ET.iterparse(file_path, events=("start", "end")):
    if elem.tag == 'orthography' and event =='start':
        data = elem.text
        my_list = []
        if data is not None :
            for i in data.split('\n'):
                my_list.append(i.encode("ISO-8859-1"))
            print (my_list)

This is what I am getting:

['abiotique']
['abiotiques']
[u'abi\xe9tac\xe9e']
[u'abi\xe9tac\xe9e']
[u'abi\xe9tac\xe9es']
[u'abi\xe9tin']
[u'abi\xe9tin']
[u'abi\xe9tins']
[u'abi\xe9tine']
[u'abi\xe9tines']

This is what I am expecting:

['abiotique','abiotiques','abiétacée',...]

Does anyone know how to fix this? Thanks.

Python 3 handles this automatically: strings are Unicode by default, so you don't need to call encode. (In Python 2, printing a list shows the repr of each element, which is why non-ASCII characters appear as escapes such as \xe9.)
As for the list, you are creating a new one on each iteration. Create it once above the loop, and print it after the iteration over the XML elements has finished.
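
To make the encoding point concrete, here is a small Python 3 sketch. The word list is made-up sample data standing in for the element text, not taken from the real file. It shows that a plain str prints readably inside a list, while encode() produces bytes whose repr uses \x escapes.

```python
# Hypothetical sample data standing in for the text of the XML elements.
words = ["abiotique", "abiétacée"]

# In Python 3, str is already Unicode, so the list's repr stays readable.
as_text = repr(words)

# encode() returns bytes; their repr shows non-ASCII bytes as \x escapes.
as_bytes = repr([w.encode("ISO-8859-1") for w in words])

print(as_text)   # ['abiotique', 'abiétacée']
print(as_bytes)  # [b'abiotique', b'abi\xe9tac\xe9e']
```

So the call to encode is what brings the escapes back; leaving the strings as str avoids them.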

Working example (I added the word abiétacée to an XML file several times to reproduce your situation):

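import xml.etree.ElementTree as ET

file_path = r"C:\Users\HP\Downloads\Morphalou-2.0.xml"

# Create the list once, before the loop, so every word ends up in the same list.
my_list = []
# Listen for "end" events: elem.text is only guaranteed to be complete then.
for event, elem in ET.iterparse(file_path, events=("end",)):
    if elem.tag == 'orthography':
        data = elem.text
        if data is not None:
            for i in data.split('\n'):
                my_list.append(i)
# Print once, after the whole file has been processed.
print(my_list)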

Output:

['abiétacée', 'abiétacée', 'abiétacée', 'abiétacée']
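
If you don't have the Morphalou file at hand, the same approach can be tried on an in-memory document. This is only a sketch: the sample XML and the helper name collect_orthography are assumptions made for illustration. It listens for "end" events, because ElementTree only guarantees that elem.text is complete by then.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real Morphalou file is much larger.
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<lexicon>"
    "<orthography>abiotique</orthography>"
    "<orthography>abiétacée</orthography>"
    "<orthography/>"
    "</lexicon>"
).encode("utf-8")


def collect_orthography(source):
    """Return the text of every <orthography> element in a single list."""
    words = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Skip empty elements, whose text is None.
        if elem.tag == "orthography" and elem.text:
            words.append(elem.text.strip())
        # Release the element's content to keep memory use low on big files.
        elem.clear()
    return words


result = collect_orthography(io.BytesIO(SAMPLE_XML))
print(result)  # ['abiotique', 'abiétacée']
```

The list is built once inside the function and returned after the loop, so all words end up together instead of one list per element.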


 