How do I convert a .txt file to an XML file using Python?

Posted on 2021-01-29 16:19:24

Latitude :23.1100348
Longitude:72.5364922
date&time :30:August:2014 05:04:31 PM
gsm cell id: 4993
Neighboring List- Lac : Cid : RSSI
15000     :    7072     :    25 dBm
15000     :    7073     :    23 dBm
15000     :    6102     :    24 dBm
15000     :    6101     :    24 dBm
15000     :    6103     :    17 dBm

Latitude :23.1120549
Longitude:72.5397988
date&time :30:August:2014 05:04:34 PM
gsm cell id: 4993
Neighboring List- Lac : Cid : RSSI
15000     :    7072     :    24 dBm
15000     :    7073     :    22 dBm
15000     :    6102     :    23 dBm
15000     :    6101     :    23 dBm
15000     :    2552     :    16 dBm

This is the my.txt file. I want to convert it to an XML file, for example:

<celldata>
<time>        </time>
<latitude>    </latitude>
<longitude>   </longitude>

</celldata>

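I tried to list all the components, but I am not getting any output. I want to store all the values of latitude, longitude, gsm cell id, and time in lists, which will then be added to the XML file in a form like the one above. I wrote the code below.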

import re

pa = 'Longitude|Latitude|gsm cell id|Neighboring List- Lac : Cid : RSSI'

with open('cell.txt','rw') as file:
    for line in file:
        line.strip()    
        if re.search(pa, line):
            lineInfo = line.split(':')
            title = lineInfo[0]
            value = lineInfo[1]
1 answer
  • 面试哥
    面试哥 2021-01-29

    Try the following code as a starting point:

    #!python3
    
    import re
    import xml.etree.ElementTree as ET
    
    rex = re.compile(r'''(?P<title>Longitude
                           |Latitude
                           |date&time
                           |gsm\s+cell\s+id
                         )
                         \s*:?\s*
                         (?P<value>.*)
                         ''', re.VERBOSE)
    
    root = ET.Element('root')
    root.text = '\n'    # newline before the celldata element
    
    with open('cell.txt') as f:
        celldata = ET.SubElement(root, 'celldata')
        celldata.text = '\n'    # newline before the collected element
        celldata.tail = '\n\n'  # empty line after the celldata element
        for line in f:
            # Empty line starts new celldata element (hack style, ugly)
            if line.isspace():
                celldata = ET.SubElement(root, 'celldata')
                celldata.text = '\n'
                celldata.tail = '\n\n'
    
            # If the line contains the wanted data, process it.
            m = rex.search(line)
            if m:
                # Fix some problems with the title as it will be used
                # as the tag name.
                title = m.group('title')
                title = title.replace('&', '')
                title = title.replace(' ', '')
    
                e = ET.SubElement(celldata, title.lower())
                e.text = m.group('value')
                e.tail = '\n'
    
    # Display for debugging            
    ET.dump(root)
    
    # Include the root element to the tree and write the tree
    # to the file.
    tree = ET.ElementTree(root)
    tree.write('cell.xml', encoding='utf-8', xml_declaration=True)
    

    For your sample data it displays:

    <root>
    <celldata>
    <latitude>23.1100348</latitude>
    <longitude>72.5364922</longitude>
    <datetime>30:August:2014 05:04:31 PM</datetime>
    <gsmcellid>4993</gsmcellid>
    </celldata>
    
    <celldata>
    <latitude>23.1120549</latitude>
    <longitude>72.5397988</longitude>
    <datetime>30:August:2014 05:04:34 PM</datetime>
    <gsmcellid>4993</gsmcellid>
    </celldata>
    
    </root>
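
To see what the regular expression captures, the matching step can be tried on single lines in isolation. This is a minimal sketch that reuses the pattern from the code above; the helper name `parse_line` is hypothetical and not part of the original answer, and the extra `strip()` on the value is an added assumption to drop stray trailing whitespace.

```python
import re

# Same pattern as in the answer: a known title, an optional colon, then the value.
rex = re.compile(r'''(?P<title>Longitude
                       |Latitude
                       |date&time
                       |gsm\s+cell\s+id
                     )
                     \s*:?\s*
                     (?P<value>.*)
                     ''', re.VERBOSE)


def parse_line(line):
    """Return (tag, value) for a recognised line, or None otherwise."""
    m = rex.search(line)
    if m is None:
        return None
    # XML tag names may not contain '&' or spaces, so remove them.
    tag = m.group('title').replace('&', '').replace(' ', '').lower()
    return tag, m.group('value').strip()


print(parse_line('Latitude :23.1100348'))
print(parse_line('date&time :30:August:2014 05:04:31 PM'))
print(parse_line('gsm cell id: 4993'))
print(parse_line('15000     :    7072     :    25 dBm'))
```

Only the first colon after the title is consumed as a separator, which is why the colons inside the date&time value survive. The neighbour line matches none of the titles, so the helper returns None for it.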
    

    Update for the requested neighbour list:

    #!python3
    
    import re
    import xml.etree.ElementTree as ET
    
    rex = re.compile(r'''(?P<title>Longitude
                           |Latitude
                           |date&time
                           |gsm\s+cell\s+id
                           |Neighboring\s+List-\s+Lac\s+:\s+Cid\s+:\s+RSSI
                         )
                         \s*:?\s*
                         (?P<value>.*)
                         ''', re.VERBOSE)
    
    root = ET.Element('root')
    root.text = '\n'    # newline before the celldata element
    
    with open('cell.txt') as f:
        celldata = ET.SubElement(root, 'celldata')
        celldata.text = '\n'    # newline before the collected element
        celldata.tail = '\n\n'  # empty line after the celldata element
        for line in f:
            # Empty line starts new celldata element (hack style, ugly)
            if line.isspace():
                celldata = ET.SubElement(root, 'celldata')
                celldata.text = '\n'
                celldata.tail = '\n\n'
            else:
                # If the line contains the wanted data, process it.
                m = rex.search(line)
                if m:
                    # Fix some problems with the title as it will be used
                    # as the tag name.
                    title = m.group('title')
                    title = title.replace('&', '')
                    title = title.replace(' ', '')
    
                    if line.startswith('Neighboring'):
                        neighbours = ET.SubElement(celldata, 'neighbours')
                        neighbours.text = '\n'
                        neighbours.tail = '\n'
                    else:
                        e = ET.SubElement(celldata, title.lower())
                        e.text = m.group('value')
                        e.tail = '\n'
                else:
                    # This is the neighbour item. Split it by colon,
                    # and set the attributes of the item element.
                    item = ET.SubElement(neighbours, 'item')
                    item.tail = '\n'
    
                    lac, cid, rssi = (a.strip() for a in line.split(':'))
                    item.attrib['lac'] = lac
                    item.attrib['cid'] = cid
                    item.attrib['rssi'] = rssi.split()[0] # dBm removed
    
    # Include the root element to the tree and write the tree
    # to the file.
    tree = ET.ElementTree(root)
    tree.write('cell.xml', encoding='utf-8', xml_declaration=True)
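
The neighbour handling can also be looked at on its own. The sketch below isolates the split-by-colon step from the code above into a helper; the name `add_neighbour` is hypothetical, and `Element.set` is used in place of direct `attrib` assignment, which has the same effect.

```python
import xml.etree.ElementTree as ET


def add_neighbour(parent, line):
    """Append an <item> element built from a 'lac : cid : rssi dBm' line."""
    lac, cid, rssi = (part.strip() for part in line.split(':'))
    item = ET.SubElement(parent, 'item')
    item.set('lac', lac)
    item.set('cid', cid)
    item.set('rssi', rssi.split()[0])  # keep the number, drop the 'dBm' unit
    return item


neighbours = ET.Element('neighbours')
add_neighbour(neighbours, '15000     :    7072     :    25 dBm\n')
add_neighbour(neighbours, '15000     :    7073     :    23 dBm\n')
print(ET.tostring(neighbours, encoding='unicode'))
```

A line that does not split into exactly three parts raises a ValueError at the unpacking step, so malformed input is reported rather than silently skipped.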
    

    Update to accept an empty line before the neighbours, a better and more general implementation:

    #!python3
    
    import re
    import xml.etree.ElementTree as ET
    
    rex = re.compile(r'''(?P<title>Longitude
                           |Latitude
                           |date&time
                           |gsm\s+cell\s+id
                           |Neighboring\s+List-\s+Lac\s+:\s+Cid\s+:\s+RSSI
                         )
                         \s*:?\s*
                         (?P<value>.*)
                         ''', re.VERBOSE)
    
    root = ET.Element('root')
    root.text = '\n'    # newline before the celldata element
    
    with open('cell.txt') as f:
        celldata = ET.SubElement(root, 'celldata')
        celldata.text = '\n'    # newline before the collected element
        celldata.tail = '\n\n'  # empty line after the celldata element
        status = 0              # init status of the finite automaton
        for line in f:
            if status == 0:     # lines of the heading expected
                # If the line contains the wanted data, process it.
                m = rex.search(line)
                if m:
                    # Fix some problems with the title as it will be used
                    # as the tag name.
                    title = m.group('title')
                    title = title.replace('&', '')
                    title = title.replace(' ', '')
    
                    if line.startswith('Neighboring'):
                        neighbours = ET.SubElement(celldata, 'neighbours')
                        neighbours.text = '\n'
                        neighbours.tail = '\n'
                        status = 1  # empty line and then list of neighbours expected
                    else:
                        e = ET.SubElement(celldata, title.lower())
                        e.text = m.group('value')
                        e.tail = '\n'
                        # keep the same status
    
            elif status == 1:   # empty line expected
                if line.isspace():
                    status = 2  # list of neighbours must follow
                else:
                    raise RuntimeError('Empty line expected. (status == {})'.format(status))
    
            elif status == 2:   # neighbour or the empty line as final separator
    
                if line.isspace():
                    celldata = ET.SubElement(root, 'celldata')
                    celldata.text = '\n'
                    celldata.tail = '\n\n'
                    status = 0  # go to the initial status
                else:
                    # This is the neighbour item. Split it by colon,
                    # and set the attributes of the item element.
                    item = ET.SubElement(neighbours, 'item')
                    item.tail = '\n'
    
                    lac, cid, rssi = (a.strip() for a in line.split(':'))
                    item.attrib['lac'] = lac
                    item.attrib['cid'] = cid
                    item.attrib['rssi'] = rssi.split()[0] # dBm removed
                    # keep the same status
    
            else:
                # Every valid status is handled above, so reaching this is a bug.
                raise RuntimeError('Unexpected status {}.'.format(status))
    
    # Display for debugging
    ET.dump(root)
    
    # Include the root element to the tree and write the tree
    # to the file.
    tree = ET.ElementTree(root)
    tree.write('cell.xml', encoding='utf-8', xml_declaration=True)
    

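    The code implements a so-called finite automaton, in which the status variable represents the current state. You can visualize it with pencil and paper: draw a small circle with the status number inside for each state (a node, in graph theory). While in a given state, you accept only certain kinds of input (line). When the input is recognized, you draw an arrow (a directed edge, in graph theory) to another state, possibly to the same state, as a loop returning to the same node. Each arrow is labelled 'condition | action'.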

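    The result may look complicated at first. It is easy, however, in the sense that you can always concentrate on just the part of the code that belongs to a particular state, and the code is also easy to modify. Finite automata have limited power, but they are ideal for this kind of problem.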
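
The idea of concentrating on one state at a time can be made explicit by giving each state its own handler function and dispatching through a table. This is a minimal sketch rather than a rewrite of the code above: the names `State`, `HANDLERS`, and `parse`, and the choice to collect plain dictionaries instead of XML elements, are assumptions made for illustration. Like the last version above, it expects an empty line between the 'Neighboring List' heading and the list itself.

```python
from enum import Enum


class State(Enum):
    HEADING = 0      # heading lines such as 'Latitude :...' expected
    EMPTY_LINE = 1   # empty line after the 'Neighboring List' heading expected
    NEIGHBOURS = 2   # neighbour lines, or an empty line closing the record


def on_heading(line, records):
    if line.startswith('Neighboring'):
        return State.EMPTY_LINE
    key, sep, value = line.partition(':')   # split at the first colon only
    if sep:
        records[-1][key.strip()] = value.strip()
    return State.HEADING


def on_empty_line(line, records):
    if not line.isspace():
        raise RuntimeError('Empty line expected.')
    return State.NEIGHBOURS


def on_neighbours(line, records):
    if line.isspace():
        records.append({'neighbours': []})   # start the next record
        return State.HEADING
    lac, cid, rssi = (part.strip() for part in line.split(':'))
    records[-1]['neighbours'].append((lac, cid, rssi.split()[0]))
    return State.NEIGHBOURS


# Each arrow of the pencil-and-paper diagram lives in exactly one handler.
HANDLERS = {
    State.HEADING: on_heading,
    State.EMPTY_LINE: on_empty_line,
    State.NEIGHBOURS: on_neighbours,
}


def parse(lines):
    records = [{'neighbours': []}]
    state = State.HEADING
    for line in lines:
        state = HANDLERS[state](line, records)
    return records


sample = [
    'Latitude :23.1100348\n',
    'gsm cell id: 4993\n',
    'Neighboring List- Lac : Cid : RSSI\n',
    '\n',
    '15000     :    7072     :    25 dBm\n',
    '\n',
    'Latitude :23.1120549\n',
]
for record in parse(sample):
    print(record)
```

Adding a new state then means writing one more handler and one more entry in `HANDLERS`, without touching the handlers of the other states.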


