Converting a space-delimited file to a comma-separated values (CSV) file in Python

Question

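I'm very new to Python. I know this has been asked before, and I apologize, but what's different in my case is that the spacing between the strings is not uniform. I have a file called coord that contains the following space-delimited strings: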

   1  C       6.00    0.000000000    1.342650315    0.000000000
   2  C       6.00    0.000000000   -1.342650315    0.000000000
   3  C       6.00    2.325538562    2.685300630    0.000000000
   4  C       6.00    2.325538562   -2.685300630    0.000000000
   5  C       6.00    4.651077125    1.342650315    0.000000000
   6  C       6.00    4.651077125   -1.342650315    0.000000000
   7  C       6.00   -2.325538562    2.685300630    0.000000000
   8  C       6.00   -2.325538562   -2.685300630    0.000000000
   9  C       6.00   -4.651077125    1.342650315    0.000000000
  10  C       6.00   -4.651077125   -1.342650315    0.000000000
  11  H       1.00    2.325538562    4.733763602    0.000000000
  12  H       1.00    2.325538562   -4.733763602    0.000000000
  13  H       1.00   -2.325538562    4.733763602    0.000000000
  14  H       1.00   -2.325538562   -4.733763602    0.000000000
  15  H       1.00    6.425098097    2.366881801    0.000000000
  16  H       1.00    6.425098097   -2.366881801    0.000000000
  17  H       1.00   -6.425098097    2.366881801    0.000000000
  18  H       1.00   -6.425098097   -2.366881801    0.000000000

Note the leading spaces before each entry in the first column. I tried the following sequence to convert it to CSV:

import csv

with open('coord') as infile, open('coordv', 'w') as outfile:
    outfile.write(infile.read().replace("  ", ", "))

# Unneeded columns are deleted from the csv

input = open('coordv', 'rb')
output = open('coordcsvout', 'wb')
writer = csv.writer(output)
for row in csv.reader(input):
    if row:
        writer.writerow(row)
input.close()
output.close()

with open("coordcsvout","rb") as source:
    rdr= csv.reader( source )
    with open("coordbarray","wb") as result:
        wtr= csv.writer(result)
        for r in rdr:
            wtr.writerow( (r[5], r[6], r[7]) )

When I run the script, the first part produces the following content in coordv, which is of course badly wrong:

,  1, C, , ,  6.00, , 0.000000000, , 1.342650315, , 0.000000000
,  2, C, , ,  6.00, , 0.000000000,  -1.342650315, , 0.000000000
,  3, C, , ,  6.00, , 2.325538562, , 2.685300630, , 0.000000000
,  4, C, , ,  6.00, , 2.325538562,  -2.685300630, , 0.000000000
,  5, C, , ,  6.00, , 4.651077125, , 1.342650315, , 0.000000000
,  6, C, , ,  6.00, , 4.651077125,  -1.342650315, , 0.000000000
,  7, C, , ,  6.00,  -2.325538562, , 2.685300630, , 0.000000000
,  8, C, , ,  6.00,  -2.325538562,  -2.685300630, , 0.000000000
,  9, C, , ,  6.00,  -4.651077125, , 1.342650315, , 0.000000000
, 10, C, , ,  6.00,  -4.651077125,  -1.342650315, , 0.000000000
, 11, H, , ,  1.00, , 2.325538562, , 4.733763602, , 0.000000000
, 12, H, , ,  1.00, , 2.325538562,  -4.733763602, , 0.000000000
, 13, H, , ,  1.00,  -2.325538562, , 4.733763602, , 0.000000000
, 14, H, , ,  1.00,  -2.325538562,  -4.733763602, , 0.000000000
, 15, H, , ,  1.00, , 6.425098097, , 2.366881801, , 0.000000000
, 16, H, , ,  1.00, , 6.425098097,  -2.366881801, , 0.000000000
, 17, H, , ,  1.00,  -6.425098097, , 2.366881801, , 0.000000000
, 18, H, , ,  1.00,  -6.425098097,  -2.366881801, , 0.000000000

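I've tried different arguments to .replace without success, and so far I haven't found any source explaining how to do this. What is the best way to get comma-separated values from this coordinate file? I'd like to use Python's csv module to select columns 4:6 and finally import them with numpy, like this: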

from numpy import genfromtxt
cocmatrix = genfromtxt('input', delimiter=',')

I'd be grateful if anyone could help me with this.

Answer 1

You can use csv:

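import csv

# ur_infile / ur_outfile are placeholders for your own file paths;
# newline='' stops the csv writer from producing blank lines on Windows (Python 3)
with open(ur_infile) as fin, open(ur_outfile, 'w', newline='') as fout:
    o = csv.writer(fout)
    for line in fin:
        # split() with no argument splits on any run of whitespace
        # and ignores leading/trailing whitespace
        o.writerow(line.split())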
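Since the question ultimately wants only columns 4 to 6 (the x, y, z coordinates, i.e. indices 3 to 5 after splitting), the same idea can select them while writing. This is a minimal Python 3 sketch; the helper name `convert_coords` and its `columns` parameter are hypothetical, and in-memory files stand in for `coord` and the output file.

```python
import csv
import io


def convert_coords(fin, fout, columns=slice(3, 6)):
    """Write the selected whitespace-separated columns of fin as CSV rows to fout.

    Hypothetical helper: by default keeps columns 4-6 (0-based indices 3, 4, 5).
    """
    writer = csv.writer(fout)
    for line in fin:
        fields = line.split()  # any run of spaces/tabs counts as one separator
        if fields:             # skip blank lines
            writer.writerow(fields[columns])


# Demo with in-memory files; with real files, open the output with newline=''.
sample = (
    "   1  C       6.00    0.000000000    1.342650315    0.000000000\n"
    "  11  H       1.00    2.325538562    4.733763602    0.000000000\n"
)
out = io.StringIO()
convert_coords(io.StringIO(sample), out)
print(out.getvalue())
```

With the question's file names this would be called as `convert_coords(open('coord'), open('coordbarray', 'w', newline=''))` (ideally inside a `with` block), and the result should then load with `genfromtxt(..., delimiter=',')` as shown in the question.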

Answer 2

You can use Python pandas. I've written your data to data.csv:

>>> import pandas as pd
>>> df = pd.read_csv('data.csv', sep=r'\s+', header=None)
>>> df
     0  1  2         3         4  5
0    1  C  6  0.000000  1.342650  0
1    2  C  6  0.000000 -1.342650  0
2    3  C  6  2.325539  2.685301  0
3    4  C  6  2.325539 -2.685301  0
4    5  C  6  4.651077  1.342650  0
5    6  C  6  4.651077 -1.342650  0
...

The nice thing about this is that you can access the underlying numpy array with df.values:

>>> type(df.values)
<type 'numpy.ndarray'>

To save the data frame with a comma separator:

>>> df.to_csv('data_out.csv', header=False, index=False)

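Pandas is a great library for managing large amounts of data, and as a bonus it works well with numpy. It is also likely to be considerably faster than the csv module on large files.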

Answer 3

Replace your first bit with this. It's not super pretty, but it will give you CSV format.

with open('coord') as infile, open('coordv', 'w') as outfile:
    for line in infile:
        outfile.write(" ".join(line.split()).replace(' ', ','))
        outfile.write(",") # trailing comma shouldn't matter

If you want each input line on its own line in outfile, you can add outfile.write("\n") at the end of the for loop, but I don't think the later code uses it that way.

Answer 4

>>> a = 'cah  1  C       6.00    0.000000000    1.342650315    0.000000000'
=>  a = 'cah  1  C       6.00    0.000000000    1.342650315    0.000000000'

>>> a.split()
=>  ['cah', '1', 'C', '6.00', '0.000000000', '1.342650315', '0.000000000']

>>> ','.join(a.split())
=>  'cah,1,C,6.00,0.000000000,1.342650315,0.000000000'

>>> ['"' + x + '"' for x in a.split()]
=>  ['"cah"', '"1"', '"C"', '"6.00"', '"0.000000000"', '"1.342650315"', '"0.000000000"']

>>> ','.join(['"' + x + '"' for x in a.split()])
=>  '"cah","1","C","6.00","0.000000000","1.342650315","0.000000000"'
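Wrapping each field in quotes by hand works for this data, but it produces malformed CSV if a field ever contains a quote character. A minimal sketch of letting the csv module do the quoting instead, via `csv.QUOTE_ALL`; the second row is a made-up example showing how an embedded quote gets escaped.

```python
import csv
import io

a = 'cah  1  C       6.00    0.000000000    1.342650315    0.000000000'

buf = io.StringIO()
# QUOTE_ALL quotes every field; embedded quotes are escaped by doubling them
writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator='\n')
writer.writerow(a.split())
print(buf.getvalue(), end='')
# "cah","1","C","6.00","0.000000000","1.342650315","0.000000000"

buf2 = io.StringIO()
csv.writer(buf2, quoting=csv.QUOTE_ALL, lineterminator='\n').writerow(['say "hi"', 'x'])
print(buf2.getvalue(), end='')
# "say ""hi""","x"
```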

Answer 5

To convert "spaces" to ",":

Just fill in the file names you want.

with open('filename') as infile, open('output', 'w') as outfile:
    outfile.write(infile.read().replace(" ", ","))

To convert "," to "spaces":

with open('filename') as infile, open('output', 'w') as outfile:
    outfile.write(infile.read().replace(",", " "))
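Replacing every single space only yields clean CSV when fields are separated by exactly one space. With column-aligned data like the coord file in the question, each run of spaces becomes a run of commas, i.e. empty fields, which is the same problem the question ran into. A small comparison on one shortened line of that data:

```python
line = "   1  C       6.00    0.000000000"

# One comma per space: runs of spaces turn into empty fields
naive = line.replace(" ", ",")
# split() collapses whitespace runs and drops the leading spaces first
clean = ",".join(line.split())

print(naive)
print(clean)  # 1,C,6.00,0.000000000
```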

Answer 6

Why not read the file line by line? Split each line into a list, then join the list back together with ','.
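A minimal sketch of that line-by-line approach, written as a generator so it accepts any iterable of lines (an open file, or a plain list in this demo). The name `to_csv_lines` is hypothetical; blank lines are skipped, and since no quoting is done, fields are assumed not to contain commas.

```python
def to_csv_lines(lines):
    """Yield each whitespace-separated line rejoined with commas."""
    for line in lines:
        fields = line.split()  # split on any whitespace run, ignore leading spaces
        if fields:             # skip empty lines
            yield ",".join(fields)


rows = ["   1  C       6.00    0.000000000\n", "\n", "  10  C       6.00   -4.651077125\n"]
for row in to_csv_lines(rows):
    print(row)
```

For a real file, the rows could be written with `outfile.writelines(r + '\n' for r in to_csv_lines(infile))`.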

Answer 7

The csv module is fine, or here is a way to do it without:

#!/usr/local/cpython-3.3/bin/python

with open('input-file.csv', 'r') as infile, open('output.csv', 'w') as outfile:
    for line in infile:
        fields = line.split()
        outfile.write('{}\n'.format(','.join(fields)))

Answer 8

For merging multiple text files into one CSV:

import csv

n = 3  # number of input files (input0.txt ... input2.txt); adjust to your case
for x in range(n):
    with open('input{}.txt'.format(x)) as fin, open('output.csv', 'a', newline='') as fout:
        csv_output = csv.writer(fout)
        for line in fin:
            csv_output.writerow(line.split())
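If the number of input files isn't known in advance, a variant could collect them with `glob` instead of a counter. This is a sketch under the assumption that the files follow the `input*.txt` naming used above; `merge_to_csv` is a hypothetical name. Unlike the loop above, it opens the output once in 'w' mode, so rerunning it overwrites output.csv instead of appending to it.

```python
import csv
import glob


def merge_to_csv(pattern, out_path):
    """Write the whitespace-separated rows of every file matching pattern into one CSV."""
    # sorted() gives a deterministic, lexicographic order (input10 sorts before input2)
    paths = sorted(glob.glob(pattern))
    with open(out_path, 'w', newline='') as fout:
        writer = csv.writer(fout)
        for path in paths:
            with open(path) as fin:
                for line in fin:
                    fields = line.split()
                    if fields:  # skip blank lines
                        writer.writerow(fields)
    return paths
```

Usage would be `merge_to_csv('input*.txt', 'output.csv')`.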

Answer summary (8 answers; score and posting time):

Answer 1: score 14, 2013-11-03 23:35:03

Answer 2: score 8, 2013-11-03 23:41:27

Answer 3: score 6, accepted, 2013-11-03 23:30:02

Answer 4: score 1, 2013-11-04 00:01:22

Answer 5: score 1, 2019-10-14 18:27:46 (converting spaces to "," and "," to spaces)

Answer 6: score 0, 2013-11-03 23:51:04

Answer 7: score 0, 2013-11-04 00:41:55

Answer 8: score 0, 2019-11-22 10:18:49 (merging multiple text files into one CSV)
