How to convert a .txt file into an XML file using Python?

Latitude :23.1100348
Longitude:72.5364922
date&time :30:August:2014 05:04:31 PM
gsm cell id: 4993
Neighboring List- Lac : Cid : RSSI
15000     :    7072     :    25 dBm
15000     :    7073     :    23 dBm
15000     :    6102     :    24 dBm
15000     :    6101     :    24 dBm
15000     :    6103     :    17 dBm

Latitude :23.1120549
Longitude:72.5397988
date&time :30:August:2014 05:04:34 PM
gsm cell id: 4993
Neighboring List- Lac : Cid : RSSI
15000     :    7072     :    24 dBm
15000     :    7073     :    22 dBm
15000     :    6102     :    23 dBm
15000     :    6101     :    23 dBm
15000     :    2552     :    16 dBm

This is my .txt file. I want to convert it into an XML file like this:

<celldata>
<time>        </time>
<latitude>    </latitude>
<longitude>   </longitude>

</celldata>

I tried to make a list of all the components, but I did not get the output I wanted. I want to store all values of latitude, longitude, gsm cell id, and time in lists and then add them to the XML file, something like the above. I wrote the code below.

import re

pa = 'Longitude|Latitude|gsm cell id|Neighboring List- Lac : Cid : RSSI'

with open('cell.txt','rw') as file:
    for line in file:
        line.strip()    
        if re.search(pa, line):
            lineInfo = line.split(':')
            title = lineInfo[0]
            value = lineInfo[1]

Try the following code as a starter:

#!python3

import re
import xml.etree.ElementTree as ET

rex = re.compile(r'''(?P<title>Longitude
                       |Latitude
                       |date&time
                       |gsm\s+cell\s+id
                     )
                     \s*:?\s*
                     (?P<value>.*)
                     ''', re.VERBOSE)

root = ET.Element('root')
root.text = '\n'    # newline before the celldata element

with open('cell.txt') as f:
    celldata = ET.SubElement(root, 'celldata')
    celldata.text = '\n'    # newline before the collected element
    celldata.tail = '\n\n'  # empty line after the celldata element
    for line in f:
        # Empty line starts new celldata element (hack style, ugly)
        if line.isspace():
            celldata = ET.SubElement(root, 'celldata')
            celldata.text = '\n'
            celldata.tail = '\n\n'

        # If the line contains the wanted data, process it.
        m = rex.search(line)
        if m:
            # Fix some problems with the title as it will be used
            # as the tag name.
            title = m.group('title')
            title = title.replace('&', '')
            title = title.replace(' ', '')

            e = ET.SubElement(celldata, title.lower())
            e.text = m.group('value')
            e.tail = '\n'

# Display for debugging            
ET.dump(root)

# Include the root element to the tree and write the tree
# to the file.
tree = ET.ElementTree(root)
tree.write('cell.xml', encoding='utf-8', xml_declaration=True)

For your example data it displays:

<root>
<celldata>
<latitude>23.1100348</latitude>
<longitude>72.5364922</longitude>
<datetime>30:August:2014 05:04:31 PM</datetime>
<gsmcellid>4993</gsmcellid>
</celldata>

<celldata>
<latitude>23.1120549</latitude>
<longitude>72.5397988</longitude>
<datetime>30:August:2014 05:04:34 PM</datetime>
<gsmcellid>4993</gsmcellid>
</celldata>

</root>
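
The `text` and `tail` newlines in the code above exist only to make the output readable. As a possible alternative, assuming Python 3.9 or newer, `ET.indent` can add the whitespace for you. This is a minimal sketch with two of the sample values hard-coded for illustration; it is not part of the original answer:

```python
import xml.etree.ElementTree as ET

# Build a tiny tree by hand (values copied from the sample data).
root = ET.Element('root')
celldata = ET.SubElement(root, 'celldata')
for tag, value in [('latitude', '23.1100348'), ('longitude', '72.5364922')]:
    ET.SubElement(celldata, tag).text = value

# Insert indentation whitespace in place (requires Python 3.9+).
ET.indent(root, space='  ')

xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```

With this approach the parsing loop no longer has to set `text` and `tail` on every element, at the cost of requiring a newer Python.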

Update for the wanted neighbour list:

#!python3

import re
import xml.etree.ElementTree as ET

rex = re.compile(r'''(?P<title>Longitude
                       |Latitude
                       |date&time
                       |gsm\s+cell\s+id
                       |Neighboring\s+List-\s+Lac\s+:\s+Cid\s+:\s+RSSI
                     )
                     \s*:?\s*
                     (?P<value>.*)
                     ''', re.VERBOSE)

root = ET.Element('root')
root.text = '\n'    # newline before the celldata element

with open('cell.txt') as f:
    celldata = ET.SubElement(root, 'celldata')
    celldata.text = '\n'    # newline before the collected element
    celldata.tail = '\n\n'  # empty line after the celldata element
    for line in f:
        # Empty line starts new celldata element (hack style, ugly)
        if line.isspace():
            celldata = ET.SubElement(root, 'celldata')
            celldata.text = '\n'
            celldata.tail = '\n\n'
        else:
            # If the line contains the wanted data, process it.
            m = rex.search(line)
            if m:
                # Fix some problems with the title as it will be used
                # as the tag name.
                title = m.group('title')
                title = title.replace('&', '')
                title = title.replace(' ', '')

                if line.startswith('Neighboring'):
                    neighbours = ET.SubElement(celldata, 'neighbours')
                    neighbours.text = '\n'
                    neighbours.tail = '\n'
                else:
                    e = ET.SubElement(celldata, title.lower())
                    e.text = m.group('value')
                    e.tail = '\n'
            else:
                # This is the neighbour item. Split it by colon,
                # and set the attributes of the item element.
                item = ET.SubElement(neighbours, 'item')
                item.tail = '\n'

                lac, cid, rssi = (a.strip() for a in line.split(':'))
                item.attrib['lac'] = lac
                item.attrib['cid'] = cid
                item.attrib['rssi'] = rssi.split()[0] # dBm removed

# Include the root element to the tree and write the tree
# to the file.
tree = ET.ElementTree(root)
tree.write('cell.xml', encoding='utf-8', xml_declaration=True)
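
To see what the neighbour branch does with a single line, the splitting step can be tried in isolation. The helper name `parse_neighbour` is hypothetical (it does not appear in the code above); the logic is the same split-by-colon and strip used there:

```python
def parse_neighbour(line):
    """Split a 'lac : cid : rssi dBm' line into a dict of strings.

    Hypothetical helper, shown only to illustrate the splitting step.
    """
    lac, cid, rssi = (part.strip() for part in line.split(':'))
    # rssi looks like '25 dBm'; keep only the number.
    return {'lac': lac, 'cid': cid, 'rssi': rssi.split()[0]}

item = parse_neighbour('15000     :    7072     :    25 dBm\n')
print(item)
```

Note that the tuple unpacking raises `ValueError` if a line does not contain exactly two colons, so unexpected lines in the input will stop the script rather than be skipped silently.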

Update for accepting an empty line before the neighbours -- also a better implementation for general purposes:

#!python3

import re
import xml.etree.ElementTree as ET

rex = re.compile(r'''(?P<title>Longitude
                       |Latitude
                       |date&time
                       |gsm\s+cell\s+id
                       |Neighboring\s+List-\s+Lac\s+:\s+Cid\s+:\s+RSSI
                     )
                     \s*:?\s*
                     (?P<value>.*)
                     ''', re.VERBOSE)

root = ET.Element('root')
root.text = '\n'    # newline before the celldata element

with open('cell.txt') as f:
    celldata = ET.SubElement(root, 'celldata')
    celldata.text = '\n'    # newline before the collected element
    celldata.tail = '\n\n'  # empty line after the celldata element
    status = 0              # init status of the finite automaton
    for line in f:
        if status == 0:     # lines of the heading expected
            # If the line contains the wanted data, process it.
            m = rex.search(line)
            if m:
                # Fix some problems with the title as it will be used
                # as the tag name.
                title = m.group('title')
                title = title.replace('&', '')
                title = title.replace(' ', '')

                if line.startswith('Neighboring'):
                    neighbours = ET.SubElement(celldata, 'neighbours')
                    neighbours.text = '\n'
                    neighbours.tail = '\n'
                    status = 1  # empty line and then list of neighbours expected
                else:
                    e = ET.SubElement(celldata, title.lower())
                    e.text = m.group('value')
                    e.tail = '\n'
                    # keep the same status

        elif status == 1:   # empty line expected
            if line.isspace():
                status = 2  # list of neighbours must follow
            else:
                # Raising stops the loop, so no separate error status is needed.
                raise RuntimeError('Empty line expected. (status == {})'.format(status))

        elif status == 2:   # neighbour or the empty line as final separator

            if line.isspace():
                celldata = ET.SubElement(root, 'celldata')
                celldata.text = '\n'
                celldata.tail = '\n\n'
                status = 0  # go to the initial status
            else:
                # This is the neighbour item. Split it by colon,
                # and set the attributes of the item element.
                item = ET.SubElement(neighbours, 'item')
                item.tail = '\n'

                lac, cid, rssi = (a.strip() for a in line.split(':'))
                item.attrib['lac'] = lac
                item.attrib['cid'] = cid
                item.attrib['rssi'] = rssi.split()[0] # dBm removed
                # keep the same status

        else:
            # Should never happen; LogicError is not a built-in exception,
            # so RuntimeError is raised instead.
            raise RuntimeError('Unexpected status {}.'.format(status))

# Display for debugging
ET.dump(root)

# Include the root element to the tree and write the tree
# to the file.
tree = ET.ElementTree(root)
tree.write('cell.xml', encoding='utf-8', xml_declaration=True)

The code implements a so-called finite automaton, where the status variable represents its current state. You can visualize it with pencil and paper -- draw small circles with the status numbers inside (called nodes in graph theory). In a given status, only certain kinds of input (line) are allowed. When the input is recognized, draw an arrow (a directed edge in graph theory) to another status (possibly to the same status, as a loop returning to the same node). Each arrow is annotated `condition | action`.

The result may look complex at first; however, it is easy in the sense that you can always focus only on the part of the code that belongs to a certain status. The code can also be modified easily. Finite automata have limited power, but they are a good fit for this kind of problem.
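
The same automaton can be sketched with named states instead of the numbers 0, 1 and 2. This is a simplified illustration, not a replacement for the code above: the `State` enum, the `parse` function, and the record layout are all assumptions made for the example, and the lines are collected into plain dicts rather than XML elements.

```python
from enum import Enum, auto

class State(Enum):
    HEADING = auto()     # status 0: heading lines expected
    BLANK = auto()       # status 1: empty line expected
    NEIGHBOURS = auto()  # status 2: neighbour rows or the closing empty line

def parse(lines):
    """Group lines into records using the three-state automaton."""
    records = [{'fields': [], 'neighbours': []}]
    state = State.HEADING
    for line in lines:
        blank = not line.strip()
        if state is State.HEADING:
            if line.startswith('Neighboring'):
                state = State.BLANK
            elif not blank:
                records[-1]['fields'].append(line.strip())
        elif state is State.BLANK:
            if not blank:
                raise ValueError('Empty line expected.')
            state = State.NEIGHBOURS
        elif state is State.NEIGHBOURS:
            if blank:
                # The closing empty line starts a new record.
                records.append({'fields': [], 'neighbours': []})
                state = State.HEADING
            else:
                records[-1]['neighbours'].append(line.strip())
    return records

sample = [
    'Latitude :23.1100348\n',
    'gsm cell id: 4993\n',
    'Neighboring List- Lac : Cid : RSSI\n',
    '\n',
    '15000 : 7072 : 25 dBm\n',
    '\n',
    'Latitude :23.1120549\n',
]
records = parse(sample)
print(records)
```

Each `if`/`elif` branch corresponds to one circle in the pencil-and-paper drawing, and each assignment to `state` corresponds to one arrow.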

