Extracting data from one file to another in Python

Question

I want to extract data from a first file into a second file and wrap it in specific tags. The first file looks like this:

"city1" : [[1.1,1.2],[2.1,2.2],[3.1,3.2]],
"city2" : [[5.0,0.2],[4.1,3.2],[7.1,8.2]],
...

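So the format resembles a dictionary whose values are lists.

Unfortunately, I get an error while reading the file: too many values to unpack

I tried to read it like this: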

d = {}
with open("shape.txt", "r") as f:
    for line in f:
        (key, val) = line.split()
        d[key] = val

After that, I want to write these cities and coordinates to a second file with the following structure:

<state name = 'city1'>
 <point lat='first value from first list', lng='second value from first list'/>
 <point lat='first value from second list', lng='second value from second list'/>
</state>
<state name = 'city2'>
 the same as above

Is there another way to do this?

Answer 1

You can use the json module to load and save lists and dictionaries easily.

import json

data = {
  'city1': [[1.1,1.2],[2.1,2.2],[3.1,3.2]],
  'city2': [[5.0,0.2],[4.1,3.2],[7.1,8.2]],
}

with open('file.txt', 'w') as f:
   f.write(json.dumps(data))

with open('file.txt') as f:
   data = json.loads(f.read())

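But this only works if the file contains valid JSON. Your file is almost valid JSON; it is only missing the enclosing curly braces, and the last line ends with a comma.

So I think you could do this: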

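import json

lines = ['{']
with open('file') as f:
    for line in f:
        line = line.strip()
        if line:  # skip blank lines
            lines.append(line)
# Remove the trailing comma from the last entry; JSON objects do not allow it
lines[-1] = lines[-1].rstrip(',')
lines.append('}')
data = json.loads('\n'.join(lines))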

Then just do this:

result = ''
for city, points in data.items():
    result += f"<state name='{city}'>\n"
    for point in points:
        result += f"<point lat='{point[0]}', lng='{point[1]}'/>\n"
    result += '</state>\n'

with open('out.txt', 'w') as f:
    f.write(result)
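
If the second file is meant to be read as XML, note that well-formed XML separates attributes with whitespace rather than commas, and that city names containing characters such as & or quotes need escaping. Here is a minimal sketch using the standard library's xml.etree.ElementTree, assuming comma-free output is acceptable. The helper name build_states is hypothetical and not part of the question:

```python
import xml.etree.ElementTree as ET

def build_states(data):
    """Return one serialized <state> element per city (hypothetical helper)."""
    chunks = []
    for city, points in data.items():
        state = ET.Element("state", name=city)
        for lat, lng in points:
            # Attribute values must be strings; ElementTree escapes them on output.
            ET.SubElement(state, "point", lat=str(lat), lng=str(lng))
        chunks.append(ET.tostring(state, encoding="unicode"))
    return "\n".join(chunks)

data = {"city1": [[1.1, 1.2], [2.1, 2.2]]}
print(build_states(data))
```

Each state element is serialized separately because the requested layout has no single root element.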

Answer 2

If your text file contains only the structure you described, you should be able to parse the data with ast.literal_eval:

txt = '''

"city1" : [[1.1,1.2],[2.1,2.2],[3.1,3.2]],
"city2" : [[5.0,0.2],[4.1,3.2],[7.1,8.2]],

'''

template = '''<state name = '{city}'>
  <point lat='{vals[0][0]}', lng='{vals[0][1]}' />
  <point lat='{vals[1][0]}', lng='{vals[1][1]}' />
</state>'''

from ast import literal_eval

data = literal_eval('{' + txt + '}')

print(data)

for k, v in data.items():
    print(template.format(city=k, vals=v))

This prints the parsed dictionary, followed by:

<state name = 'city1'>
  <point lat='1.1', lng='1.2' />
  <point lat='2.1', lng='2.2' />
</state>
<state name = 'city2'>
  <point lat='5.0', lng='0.2' />
  <point lat='4.1', lng='3.2' />
</state>

Using file I/O:

template = '''<state name = '{city}'>
  <point lat='{vals[0][0]}', lng='{vals[0][1]}' />
  <point lat='{vals[1][0]}', lng='{vals[1][1]}' />
</state>'''

from ast import literal_eval

with open('sample.txt', 'r') as f_in, open('sample.out', 'w') as f_out:
    data = literal_eval('{' + f_in.read() + '}')

    for k, v in data.items():
        print(template.format(city=k, vals=v), file=f_out)

Edit: this example writes all points of each city to the file, not just the first two:

from ast import literal_eval

with open('sample.txt', 'r') as f_in, open('sample.out', 'w') as f_out:
    data = literal_eval('{' + f_in.read() + '}')

    for k, v in data.items():
        print("<state name = '{city}'>".format(city=k), file=f_out)
        for point in v:
            print("\t<point lat='{point[0]}', lng='{point[1]}' />".format(point=point), file=f_out)
        print('</state>', file=f_out)

The file sample.out will look like this:

<state name = 'city1'>
    <point lat='1.1', lng='1.2' />
    <point lat='2.1', lng='2.2' />
    <point lat='3.1', lng='3.2' />
</state>
<state name = 'city2'>
    <point lat='5.0', lng='0.2' />
    <point lat='4.1', lng='3.2' />
    <point lat='7.1', lng='8.2' />
</state>
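
One reason this approach needs no comma stripping is that a Python dict literal may end with a trailing comma, whereas JSON forbids it. Here is a small sketch of the difference, using an inline string in place of the file:

```python
import json
from ast import literal_eval

txt = '"city1" : [[1.1,1.2],[2.1,2.2]],\n"city2" : [[5.0,0.2]],\n'
wrapped = '{' + txt + '}'

# literal_eval accepts the trailing comma because it is valid Python syntax.
data = literal_eval(wrapped)

# json.loads rejects the same text because JSON does not allow trailing commas.
try:
    json.loads(wrapped)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

print(data["city2"], json_ok)
```

literal_eval only evaluates literal structures (strings, numbers, lists, dicts and so on), not arbitrary expressions, which makes it a safer choice than eval for this kind of input.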

Answer 3

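Here is another way that writes directly to the second file, so you don't need to store the data in a dict first. Because only one line is held in memory at a time, it should cope better with large files: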

import json

with open("shape.txt", "r") as f1:
    with open("shape_output.txt", "w") as f2:
        for line in f1:
            if not line.strip():
                continue  # skip blank lines
            # Split only on the first colon so the coordinate list stays whole
            (key, val) = line.split(":", 1)
            coord = json.loads(val.strip().rstrip(','))
            city = key.replace('"', '').strip()

            new_line = f"<state name = '{city}'>"
            for c in coord:
                new_line += f"<point lat='{c[0]}', lng='{c[1]}'/>"
            new_line += "</state>"

            f2.write(new_line)
            f2.write("\n")

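While reading lines from the first file, I added some cleanup: the trailing comma and surrounding whitespace are stripped from the value, and the quotes are removed from the city name. json.loads then converts the array from its string form into a Python list.

The rest is just formatting and writing to the second file.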
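
To see why the original line.split() failed and what the cleanup does, here is a sketch that walks through one sample line. The variable names are illustrative only:

```python
import json

line = '"city1" : [[1.1,1.2],[2.1,2.2],[3.1,3.2]],\n'

# Splitting on whitespace gives three pieces (name, colon, list),
# so unpacking into (key, val) raises "too many values to unpack".
pieces = line.split()
print(len(pieces))

# Splitting once on the colon gives exactly two pieces.
key, val = line.split(":", 1)
city = key.replace('"', '').strip()          # drop quotes and whitespace
coord = json.loads(val.strip().rstrip(','))  # drop newline and trailing comma
print(city, coord[0])
```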

3 solutions

Solution 1
2 2019-12-07 15:07:18

Solution 2
2 accepted 2019-12-07 15:11:11

Solution 3
2 2019-12-07 15:29:37
