How can I build an SQLite table from this xml/txt file using Python?
I have an xml/txt file like this:
<text id="32a45" language="ENG" date="2017-01-01" time="11:00" timezone="Eastern">
<s id="1">
foo
bar
</s>
<d>
11235
</d>
<text id="32a47" language="ENG" date="2017-01-05" time="1:00" timezone="Central">
<s id="2">
foo
bar
</s>
<d>
11235
</d>
<text id="32a48" language="ENG" date="2017-01-07" time="3:00" timezone="Pacific">
<s id="3">
foo
bar
</s>
<d>
11235
</d>
I want to use Python to build an SQLite table like this:
id     language  date        timezone  s        d
32a45  ENG       2017-01-01  Eastern   foo bar  11235
32a47  ENG       2017-01-05  Central   baz qux  11235
32a48  ENG       2017-01-07  Pacific   foo bar  11235
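Any idea how I could do this? I can't use the xmltree module because the xml tags in the original file are messed up. I would really appreciate your help. Thanks.
Edit: I can easily get each text tag as an item in a list, like this: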
['<text id="32a45" language="ENG" date="2017-01-01" time="11:00" timezone="Eastern">', '<text id="32a47" language="ENG" date="2017-01-05" time="1:00" timezone="Central">', '<text id="32a48" language="ENG" date="2017-01-07" time="3:00" timezone="Pacific">']
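But I don't know how to get the id, language, etc. out of each item separately.
Picking up from that list, each opening tag can be closed and parsed on its own: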
import xml.etree.ElementTree as ET
import pandas as pd
# The opening <text ...> tags, one string per record
strings = ['<text id="32a45" language="ENG" date="2017-01-01" time="11:00" timezone="Eastern">',
'<text id="32a47" language="ENG" date="2017-01-05" time="1:00" timezone="Central">',
'<text id="32a48" language="ENG" date="2017-01-07" time="3:00" timezone="Pacific">']
# Attributes to read from each tag
cols = ["id","language","date","time","timezone"]
# Appending "</text>" turns each opening tag into a well-formed element that ElementTree can parse
data = [[ET.fromstring(string+"</text>").get(col) for col in cols] for string in strings]
# One row per tag, one column per attribute
df = pd.DataFrame(data,columns=cols)
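The snippet above stops at a DataFrame and does not cover the s and d values or the SQLite step. A minimal sketch of the remaining steps, using only the standard library, might look like the following. The names `RAW`, `parse_records`, `build_table`, and the table name `texts` are made up for illustration, and the sketch assumes every record starts with a `<text ...>` opening tag whose attributes are well-formed, as in the sample from the question.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Sample input copied from the question (hypothetical variable name).
RAW = '''<text id="32a45" language="ENG" date="2017-01-01" time="11:00" timezone="Eastern">
<s id="1">
foo
bar
</s>
<d>
11235
</d>
<text id="32a47" language="ENG" date="2017-01-05" time="1:00" timezone="Central">
<s id="2">
foo
bar
</s>
<d>
11235
</d>
<text id="32a48" language="ENG" date="2017-01-07" time="3:00" timezone="Pacific">
<s id="3">
foo
bar
</s>
<d>
11235
</d>
'''

# Attribute columns wanted in the table; s and d are appended per record.
COLS = ["id", "language", "date", "timezone"]


def parse_records(raw):
    """Return one (id, language, date, timezone, s, d) tuple per <text ...> tag."""
    records = []
    # A capturing group makes re.split keep the opening tags:
    # [prefix, tag1, body1, tag2, body2, ...]
    parts = re.split(r'(<text\b[^>]*>)', raw)
    for tag, body in zip(parts[1::2], parts[2::2]):
        # Close the tag so ElementTree sees a well-formed element.
        elem = ET.fromstring(tag + "</text>")
        s = re.search(r'<s\b[^>]*>(.*?)</s>', body, re.S)
        d = re.search(r'<d\b[^>]*>(.*?)</d>', body, re.S)
        row = [elem.get(col) for col in COLS]
        # Collapse the lines inside <s> into one space-separated string.
        row.append(" ".join(s.group(1).split()) if s else None)
        row.append(d.group(1).strip() if d else None)
        records.append(tuple(row))
    return records


def build_table(records, db_path=":memory:"):
    """Create the (hypothetical) texts table and insert the records."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS texts ("
        "id TEXT PRIMARY KEY, language TEXT, date TEXT, "
        "timezone TEXT, s TEXT, d TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO texts VALUES (?, ?, ?, ?, ?, ?)", records
    )
    conn.commit()
    return conn


conn = build_table(parse_records(RAW))
for row in conn.execute("SELECT * FROM texts ORDER BY id"):
    print(row)
```

Passing a file path instead of ":memory:" to `build_table` would write the table to disk. The regular expressions are used here only because the file is not well-formed XML; on a clean file a real XML parser would be the safer choice.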