Is there a more "Pythonic" way to combine CSV elements?

Question

Basically, I have a Python cron job that reads data from the web and stores it in a CSV-like list, in this format:

.....
###1309482902.37
entry1,36,257.21,16.15,16.168
entry2,4,103.97,16.36,16.499
entry3,2,114.83,16.1,16.3
entry4,130.69,15.6737,16.7498
entry5,5.20,14.4,17
$$$
###1309482902.37
entry1,36,257.21,16.15,16.168
entry2,4,103.97,16.36,16.499
entry3,2,114.83,16.1,16.3
entry4,130.69,15.6737,16.7498
entry5,5.20,14.4,17
$$$

......

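My code basically does a regex search and iterates over all of the matches between ### and $$$, then goes through each match line by line, splitting each line on commas. As you can see, some entries split into 4 fields and some into 5. That's because I was dumb and didn't realize the web source puts thousands separators in numbers with four or more digits, i.e.

entry1,36,257.21,16.15,16.168

should really be

entry1,36257.21,16.15,16.168

I have already collected a lot of data and don't want to rewrite it, so I came up with a cumbersome workaround. Is there a more Pythonic way to do this?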

===

contents = ifp.read()

#Pull all entries from the market data
for entry in re.finditer("###(.*\n)*?\$\$\$",contents):

    dataSet = contents[entry.start():entry.end()]
    dataSet = dataSet.split('\n');

    timeStamp = dataSet[0][3:]
    print timeStamp

    for i in xrange(1,8):
        splits = dataSet[i].split(',')
        if(len(splits) == 5):
            remove = splits[1]
            splits[2] = splits[1] + splits[2]
            splits.remove(splits[1])
        print splits
        ## DO SOME USEFUL WORK WITH THE DATA ##

===

Answer 1

I'd use Python's csv module to read in the CSV file, fix the broken rows as I encountered them, then use csv.writer to write the CSV back out. Like so (assuming your original file, with the commas in the wrong place, is ugly.csv, and the new, cleaned-up output file will be pretty.csv):

import csv

inputCsv = csv.reader(open("ugly.csv", "rb"))
outputCsv = csv.writer(open("pretty.csv", "wb"))

for row in inputCsv:
  if len(row) >= 5:
    row[1] = row[1] + row[2] #note that csv entries are strings, so this is string concatenation, not addition
    del row[2]
  outputCsv.writerow(row)

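Clean and simple, and since you're using a proper CSV parser and writer, you shouldn't have to worry about introducing any new weird corner cases. (If you had used csv.writer in your first script, the one that parses the web results, the commas in your input data would have been escaped.)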
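
To illustrate the point about escaping, here is a minimal sketch. It assumes Python 3 and uses in-memory buffers in place of real files; the sample row is taken from the question. It shows that csv.writer quotes a field containing a comma, and that csv.reader reads it back as a single field:

```python
import csv
import io

# A row as it should have been stored: the price keeps its thousands separator.
row = ["entry1", "36,257.21", "16.15", "16.168"]

# csv.writer quotes any field that contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
written = buffer.getvalue()

# csv.reader undoes the quoting, so the row still has four fields.
parsed = next(csv.reader(io.StringIO(written)))
```

Here `written` holds the line with the second field wrapped in double quotes, and `parsed` is equal to the original `row`.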

Answer 2

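Normally the csv module is the tool for handling CSV files of all formats.

However, here you have this ugly situation with the commas, so an ugly hack is appropriate. I don't see a clean solution, so I think it's fine to go with whatever works.

By the way, this line seems to be redundant: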

remove = splits[1]
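
For reference, the hack from the question can be written without that line. This is a minimal sketch in Python 3; the function name `merge_thousands` is made up for illustration, and it assumes a five-field row only ever comes from a single thousands comma in the second column:

```python
def merge_thousands(line):
    """Split a data line and rejoin a number that was broken at its thousands comma."""
    splits = line.split(',')
    if len(splits) == 5:
        # "36" + "257.21" -> "36257.21", replacing both pieces with the joined value.
        splits[1:3] = [splits[1] + splits[2]]
    return splits
```

The slice assignment also avoids `list.remove`, which deletes the first element equal to the given value rather than the element at a given position.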

Answer 3

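Others have suggested that you use csv to parse the file, and that's good advice. But it doesn't directly address the other issue: you are dealing with a file made up of sections of data. By slurping the file into a single string and then using a regex to parse that big string, you throw away a key point of leverage over the file. A different strategy is to write a method that parses the file and yields one section at a time.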

import sys

def read_next_section(f):
    for line in f:
        line = line.strip()
        if line.startswith('#'):
            # Start of a new section.
            ts = line[3:]
            data = []
        elif line.startswith('$'):
            # End of a section.
            yield ts, data
        else:
            # Probably a good idea to use csv, as others recommend.
            # Also, write a method to deal with extra-comma problem.
            fields = line.split(',')
            data.append(fields)

with open(sys.argv[1]) as input_file:
    for time_stamp, section in read_next_section(input_file):
        # Do stuff with time_stamp and section.
        pass
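
The two comments in the generator can be fleshed out roughly as follows. This is only a sketch, assuming Python 3; `fix_row` and `read_sections` are hypothetical names, the sample text is a shortened copy of the data in the question, and a five-field row is assumed to come from a single thousands comma in the second column:

```python
import csv
import io

def fix_row(fields):
    # Hypothetical helper: rejoin a number split at its thousands comma.
    if len(fields) == 5:
        fields[1:3] = [fields[1] + fields[2]]
    return fields

def read_sections(lines):
    """Yield (timestamp, rows) for each ###...$$$ section."""
    ts, data = None, []
    for line in lines:
        line = line.strip()
        if line.startswith('###'):
            # Start of a new section.
            ts, data = line[3:], []
        elif line.startswith('$$$'):
            # End of a section.
            yield ts, data
        elif line:
            # csv.reader accepts any iterable of strings, so parse one line at a time.
            data.append(fix_row(next(csv.reader([line]))))

sample = io.StringIO(
    "###1309482902.37\n"
    "entry1,36,257.21,16.15,16.168\n"
    "entry4,130.69,15.6737,16.7498\n"
    "$$$\n"
)
sections = list(read_sections(sample))
```

Because the generator only needs an iterable of lines, the same function works on an open file object as well as on the in-memory sample.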

Answer 4

A more Pythonic way to write this block of code

for i in xrange(1,8):
    splits = dataSet[i].split(',')
    if(len(splits) == 5):
        remove = splits[1]
        splits[2] = splits[1] + splits[2]
        splits.remove(splits[1])
    print splits

would be

for row in dataSet[1:-1]:  # skip the ### timestamp line and the $$$ terminator
    name, data = row.split(',', 1)
    print [name] + data.rsplit(',', 2)
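
One caveat: the split/rsplit approach keeps the thousands comma inside the second field ('36,257.21') rather than removing it. Below is a sketch in Python 3 that also strips it, so the value can be passed to float(). The function name `split_row` is an assumption for illustration:

```python
def split_row(row):
    """Split 'name,value,x,y' where value may contain a thousands comma."""
    # The first comma ends the name; the last two commas separate the final two fields.
    name, rest = row.split(',', 1)
    value, third, fourth = rest.rsplit(',', 2)
    # Whatever is left in the middle is the number; drop its thousands separators.
    return [name, value.replace(',', ''), third, fourth]
```

Unlike the length check in the question, this does not depend on how many commas the number contains, so a value such as 1,234,567.89 would be handled the same way.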

4 answers

Answer 1
2 2011-07-01 02:10:44

Answer 2
0 2011-07-01 01:54:11

Answer 3
0 2011-07-01 02:24:11

Answer 4
0 2011-07-01 02:39:23
