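Convert a space-delimited file to a comma-separated values file in Python
I'm very new to Python. I know this has been asked before, and I apologize, but what's different in this case is that the spaces between the strings are not equal. I have a file named coord that contains the following space-delimited strings: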
1 C 6.00 0.000000000 1.342650315 0.000000000
2 C 6.00 0.000000000 -1.342650315 0.000000000
3 C 6.00 2.325538562 2.685300630 0.000000000
4 C 6.00 2.325538562 -2.685300630 0.000000000
5 C 6.00 4.651077125 1.342650315 0.000000000
6 C 6.00 4.651077125 -1.342650315 0.000000000
7 C 6.00 -2.325538562 2.685300630 0.000000000
8 C 6.00 -2.325538562 -2.685300630 0.000000000
9 C 6.00 -4.651077125 1.342650315 0.000000000
10 C 6.00 -4.651077125 -1.342650315 0.000000000
11 H 1.00 2.325538562 4.733763602 0.000000000
12 H 1.00 2.325538562 -4.733763602 0.000000000
13 H 1.00 -2.325538562 4.733763602 0.000000000
14 H 1.00 -2.325538562 -4.733763602 0.000000000
15 H 1.00 6.425098097 2.366881801 0.000000000
16 H 1.00 6.425098097 -2.366881801 0.000000000
17 H 1.00 -6.425098097 2.366881801 0.000000000
18 H 1.00 -6.425098097 -2.366881801 0.000000000
Note the spaces before the start of each string in the first column. So I tried the following sequence to convert it to csv:
import csv

with open('coord') as infile, open('coordv', 'w') as outfile:
    outfile.write(infile.read().replace(" ", ", "))

# Unneeded columns are deleted from the csv
input = open('coordv', 'rb')
output = open('coordcsvout', 'wb')
writer = csv.writer(output)
for row in csv.reader(input):
    if row:
        writer.writerow(row)
input.close()
output.close()

with open("coordcsvout","rb") as source:
    rdr= csv.reader( source )
    with open("coordbarray","wb") as result:
        wtr= csv.writer(result)
        for r in rdr:
            wtr.writerow( (r[5], r[6], r[7]) )
When I run the script, the first part of it gives me the following contents for coordv, which are of course very wrong:
, 1, C, , , 6.00, , 0.000000000, , 1.342650315, , 0.000000000
, 2, C, , , 6.00, , 0.000000000, -1.342650315, , 0.000000000
, 3, C, , , 6.00, , 2.325538562, , 2.685300630, , 0.000000000
, 4, C, , , 6.00, , 2.325538562, -2.685300630, , 0.000000000
, 5, C, , , 6.00, , 4.651077125, , 1.342650315, , 0.000000000
, 6, C, , , 6.00, , 4.651077125, -1.342650315, , 0.000000000
, 7, C, , , 6.00, -2.325538562, , 2.685300630, , 0.000000000
, 8, C, , , 6.00, -2.325538562, -2.685300630, , 0.000000000
, 9, C, , , 6.00, -4.651077125, , 1.342650315, , 0.000000000
, 10, C, , , 6.00, -4.651077125, -1.342650315, , 0.000000000
, 11, H, , , 1.00, , 2.325538562, , 4.733763602, , 0.000000000
, 12, H, , , 1.00, , 2.325538562, -4.733763602, , 0.000000000
, 13, H, , , 1.00, -2.325538562, , 4.733763602, , 0.000000000
, 14, H, , , 1.00, -2.325538562, -4.733763602, , 0.000000000
, 15, H, , , 1.00, , 6.425098097, , 2.366881801, , 0.000000000
, 16, H, , , 1.00, , 6.425098097, -2.366881801, , 0.000000000
, 17, H, , , 1.00, -6.425098097, , 2.366881801, , 0.000000000
, 18, H, , , 1.00, -6.425098097, -2.366881801, , 0.000000000
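I tried different possibilities in .replace without success, and so far I haven't found any source of information on how to do this. What is the best way to get comma-separated values from this coord file? What I'm interested in is using the csv module in Python to select columns 4:6, and finally importing them with numpy as follows: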
from numpy import genfromtxt
cocmatrix = genfromtxt('input', delimiter=',')
I would be glad if anyone could help me with this problem.
You can use csv:
import csv
with open(ur_infile) as fin, open(ur_outfile, 'w') as fout:
    o=csv.writer(fout)
    for line in fin:
        o.writerow(line.split())
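As a hedged illustration of the same split-and-write idea, the sketch below also keeps only the three coordinate columns the question asks about. It works on an in-memory sample (two rows copied from the question, with uneven spacing added back) rather than real files. The names `sample` and `rows_to_csv`, and the choice of field indices 3 to 5 for "columns 4:6", are assumptions made for this example.

```python
import csv
import io

# Two rows copied from the question; the uneven spacing is deliberate.
sample = (
    "   1  C   6.00   0.000000000   1.342650315   0.000000000\n"
    "   2  C   6.00   0.000000000  -1.342650315   0.000000000\n"
)

def rows_to_csv(text, columns=slice(3, 6)):
    """Return CSV text holding only the selected whitespace-separated fields."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        # With no argument, split() treats any run of whitespace as one
        # separator and ignores leading and trailing whitespace.
        fields = line.split()
        if fields:  # skip blank lines
            writer.writerow(fields[columns])
    return out.getvalue()

result = rows_to_csv(sample)
print(result)
```

Selecting the fields while writing avoids the intermediate files used in the question, although whether that suits the rest of the workflow is a judgment call.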
You can use Python pandas; I have written your data to data.csv:
>>> import pandas as pd
>>> df = pd.read_csv('data.csv', sep=r'\s+', header=None)
>>> df
   0  1  2         3         4  5
0  1  C  6  0.000000  1.342650  0
1  2  C  6  0.000000 -1.342650  0
2  3  C  6  2.325539  2.685301  0
3  4  C  6  2.325539 -2.685301  0
4  5  C  6  4.651077  1.342650  0
5  6  C  6  4.651077 -1.342650  0
...
The underlying numpy array can be accessed with df.values:
>>> type(df.values)
<type 'numpy.ndarray'>
To save the data frame using a comma separator:
>>> df.to_csv('data_out.csv', header=False, index=False)
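Pandas is a great library for managing large amounts of data and, as a bonus, it works well with numpy. It is also quite likely to be much faster than using the csv module.
Replace your first bit with this. It's not super pretty, but it will give you a csv format.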
with open('coord') as infile, open('coordv', 'w') as outfile:
    for line in infile:
        outfile.write(" ".join(line.split()).replace(' ', ','))
        outfile.write(",") # trailing comma shouldn't matter
If you want the outfile to have everything on separate lines, you can add outfile.write("\n") at the end of the for loop, but I don't think the later code will work with it like that.
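To make the difference from the question's attempt concrete, here is a small sketch that compares a plain `replace` with split-and-join on a single line. The sample `line` is modelled on the question's data with uneven spacing added back, and the variable names are assumptions made for this illustration.

```python
# One line modelled on the question's data, with runs of several spaces.
line = "   1  C   6.00   0.000000000  -1.342650315\n"

# Naive approach: every single space becomes its own comma,
# so a run of k spaces produces k commas and k - 1 empty fields.
naive = line.replace(" ", ",")

# Split-and-join: split() collapses each run of whitespace into one
# separator and drops the leading spaces and the trailing newline.
clean = ",".join(line.split())

naive_fields = naive.strip().split(",")
empty_count = sum(1 for field in naive_fields if field == "")

print(repr(naive))
print(repr(clean))
print("empty fields produced by replace():", empty_count)
```

The empty fields are the reason the column indices in the question's later `csv.reader` step stop lining up from row to row.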
>>> a = 'cah 1 C 6.00 0.000000000 1.342650315 0.000000000'
=> a = 'cah 1 C 6.00 0.000000000 1.342650315 0.000000000'
>>> a.split()
=> ['cah', '1', 'C', '6.00', '0.000000000', '1.342650315', '0.000000000']
>>> ','.join(a.split())
=> 'cah,1,C,6.00,0.000000000,1.342650315,0.000000000'
>>> ['"' + x + '"' for x in a.split()]
=> ['"cah"', '"1"', '"C"', '"6.00"', '"0.000000000"', '"1.342650315"', '"0.000000000"']
>>> ','.join(['"' + x + '"' for x in a.split()])
=> '"cah","1","C","6.00","0.000000000","1.342650315","0.000000000"'
Just fill in the file names you want:
with open('filename') as infile, open('output', 'w') as outfile:
    outfile.write(infile.read().replace(" ", ","))
with open('filename') as infile, open('output', 'w') as outfile:
    outfile.write(infile.read().replace(",", " "))
Why not read the file line by line? Split each line into a list, then rejoin the list with ','.
The csv module is good, or here's a way to do it without:
#!/usr/local/cpython-3.3/bin/python
with open('input-file.csv', 'r') as infile, open('output.csv', 'w') as outfile:
    for line in infile:
        fields = line.split()
        outfile.write('{}\n'.format(','.join(fields)))
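import csv
# n must be set beforehand to the number of input files
# (input0.txt, input1.txt, ..., up to n - 1)
for x in range(0,n): #n = max number of files
    # 'a' appends, so rows from every input file end up in the same output.csv
    with open('input{}.txt'.format(x)) as fin, open('output.csv', 'a') as fout:
        csv_output=csv.writer(fout)
        for line in fin:
            csv_output.writerow(line.split())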
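A possible variation on the multi-file loop, offered as a sketch rather than a drop-in replacement: it opens the output once in write mode instead of reopening it in append mode for each input, so rerunning it does not keep growing the file. The function name `merge_to_csv` and the throwaway demo files are assumptions made for this example.

```python
import csv
import tempfile
from pathlib import Path

def merge_to_csv(input_paths, output_path):
    """Write the whitespace-split rows of every input file to one CSV file."""
    # Open the output once; newline="" lets the csv module control line endings.
    with open(output_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        for path in input_paths:
            with open(path) as fin:
                for line in fin:
                    fields = line.split()
                    if fields:  # ignore blank lines
                        writer.writerow(fields)

# Demo with two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    inputs = []
    for i, text in enumerate(["  1  C  6.00\n", " 11  H  1.00\n\n"]):
        p = tmp_dir / "input{}.txt".format(i)
        p.write_text(text)
        inputs.append(p)
    out_path = tmp_dir / "output.csv"
    merge_to_csv(inputs, out_path)
    # Read the merged file back to check what was written.
    with open(out_path, newline="") as f:
        merged_rows = list(csv.reader(f))

print(merged_rows)
```

If appending across separate runs is actually the intent, the original `'a'` mode remains the right choice.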