Using a multi-character delimiter in Python Pandas read_csv

Question

It appears that the pandas read_csv function only allows single-character delimiters/separators. Is there a way to allow a string, such as "*|*" or "%%"?

Answer 1

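import pandas as pd
# csv_file is the path to your file; the raw string keeps the regex escapes intact,
# and engine="python" is required for multi-character (regex) separators.
pd.read_csv(csv_file, sep=r"\*\|\*", engine="python")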
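A self-contained sketch of the same idea, assuming a small made-up sample held in memory instead of a real file. Because pandas interprets a multi-character `sep` as a regular expression, `re.escape` can build the escaped pattern instead of typing the backslashes by hand, and passing `engine="python"` explicitly avoids the fallback warning.

```python
import io
import re

import pandas as pd

# Hypothetical sample data using "*|*" as the field separator.
raw = "a*|*b*|*c\n1*|*2*|*3\n4*|*5*|*6\n"

# re.escape("*|*") yields the pattern \*\|\*, so the regex
# metacharacters "*" and "|" are matched literally.
sep = re.escape("*|*")

# Regex separators are handled by the Python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=sep, engine="python")
print(df)
```

The same pattern works for any separator string, including ones such as "%%" that contain no regex metacharacters at all.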

Answer 2

The solution is to use read_table instead of read_csv. Given a file like this:

1*|*2*|*3*|*4*|*5
12*|*12*|*13*|*14*|*15
21*|*22*|*23*|*24*|*25

we can read it with:

# Raw string for the regex; engine='python' avoids the parser fallback warning.
pd.read_table('file.csv', header=None, sep=r'\*\|\*', engine='python')
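For reference, `read_table` is the same reader as `read_csv` except that its default separator is a tab, so once `sep` is given explicitly the two calls should behave identically. The sketch below feeds the sample rows above through both functions from an in-memory buffer (standing in for `file.csv` so it runs on its own) and checks that the results match.

```python
import io

import pandas as pd

# The sample rows from above, held in memory instead of in file.csv.
raw = "1*|*2*|*3*|*4*|*5\n12*|*12*|*13*|*14*|*15\n21*|*22*|*23*|*24*|*25\n"

# A raw string keeps the backslashes intact for the regex engine.
sep = r"\*\|\*"

via_table = pd.read_table(io.StringIO(raw), header=None, sep=sep, engine="python")
via_csv = pd.read_csv(io.StringIO(raw), header=None, sep=sep, engine="python")

# Both readers produce the same 3 x 5 frame.
print(via_table.equals(via_csv))  # True
print(via_table.shape)            # (3, 5)
```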

Answer 3

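As Padraic Cunningham wrote in the comments above, it is unclear why you would want to do this. The Wikipedia entry on CSV says this about delimiters:

...separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces),

Unsurprisingly, neither the csv module nor pandas supports what you are asking for.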

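However, if you really want to do this, you are pretty much down to using Python's string manipulation. The following example shows how to convert a DataFrame into a "csv" with $$ separating rows and %% separating columns.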

# df is an existing DataFrame; to_records() includes the index as each record's first field.
'$$'.join('%%'.join(str(r) for r in rec) for rec in df.to_records())

Of course, you don't have to turn it into a string like this before writing it to a file.
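A minimal sketch of that last point, assuming a small made-up DataFrame and an in-memory buffer standing in for a real file: each record is written as soon as it is formatted, and the text is then parsed back with plain string splitting, because read_csv's `lineterminator` option only accepts a single character and so cannot handle a "$$" row separator.

```python
import io

import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})

ROW_SEP, COL_SEP = "$$", "%%"

# Stream the records straight into a file-like object; with a real file,
# open(path, "w") would take the place of io.StringIO().
buf = io.StringIO()
for i, rec in enumerate(df.to_records()):
    if i:
        buf.write(ROW_SEP)  # separator goes between rows, not after the last one
    buf.write(COL_SEP.join(str(v) for v in rec))

text = buf.getvalue()
print(text)  # 0%%1%%a$$1%%2%%b

# Read it back: split into rows first, then into columns.
rows = [line.split(COL_SEP) for line in text.split(ROW_SEP)]
print(rows)  # [['0', '1', 'a'], ['1', '2', 'b']]
```

Note that every value comes back as a string; converting types is left to the caller.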

Answer 4

Not a pythonic way, but definitely a programmatic one; you can use something like this:

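import re

def row_reader(row, fd):
    """Split row on delimiter fd, rejoining quoted fields that contain fd."""
    arr = []
    # Split the row itself on the delimiter, after dropping the trailing newline.
    in_arr = row.rstrip('\r\n').split(fd)
    i = 0
    while i < len(in_arr):
        # A piece that opens a quote without closing it is the start of a
        # quoted field that the split broke apart.
        if re.match('^".*', in_arr[i]) and not re.match('.*"$', in_arr[i]):
            flag = True
            buf = ''
            while flag and i < len(in_arr):
                buf += in_arr[i]
                if re.match('.*"$', in_arr[i]):
                    flag = False  # closing quote found
                i += 1
                # Restore the delimiter removed by split while still inside quotes.
                buf += fd if flag else ''
            arr.append(buf)
        else:
            arr.append(in_arr[i])
            i += 1
    return arr

file_name = 'data.txt'  # hypothetical path; replace with your own file
with open(file_name, 'r') as infile:
    for row in infile:
        for field in row_reader(row, '%%'):
            print(field)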
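A more compact alternative, swapped in here rather than taken from the answer above: split on the delimiter only where it is followed by an even number of double quotes, which holds only outside a quoted field. This is a sketch assuming quotes are balanced and never escaped inside fields; the helper name `split_outside_quotes` is hypothetical.

```python
import re


def split_outside_quotes(line, sep="%%"):
    """Split line on sep, ignoring occurrences inside double-quoted fields."""
    # The lookahead requires an even number of '"' characters between the
    # separator and the end of the line, i.e. the separator is not quoted.
    pattern = re.escape(sep) + r'(?=(?:[^"]*"[^"]*")*[^"]*$)'
    return re.split(pattern, line.rstrip("\r\n"))


print(split_outside_quotes('1%%"a%%b"%%3\n'))  # ['1', '"a%%b"', '3']
```

The lookahead rescans the rest of the line at every candidate separator, so this trades some speed on very long lines for brevity.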

Answer 5

In pandas 1.1.4, when I try to use a multi-character delimiter, I get this message:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

So, to use a multi-character delimiter, the modern solution seems to be adding engine='python' to the read_csv arguments (in my case, I used it together with sep='[ ]?;').
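A short sketch of that setup, assuming a made-up semicolon-separated sample in which some fields carry a trailing space: the regex `[ ]?;` consumes an optional space before each semicolon, and because the engine is named explicitly, pandas should emit no ParserWarning.

```python
import io
import warnings

import pandas as pd

# Hypothetical sample: a stray space sometimes precedes the semicolon.
raw = "a ;b;c\n1 ;2;3\n"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Naming the Python engine up front avoids the fallback ParserWarning.
    df = pd.read_csv(io.StringIO(raw), sep="[ ]?;", engine="python")

parser_warnings = [w for w in caught if issubclass(w.category, pd.errors.ParserWarning)]

print(list(df.columns))     # ['a', 'b', 'c']
print(df.iloc[0].tolist())  # [1, 2, 3]
print(parser_warnings)      # []
```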

5 answers

Answer 1: 7 votes, 2017-08-08 15:20:54
Answer 2: 4 votes, accepted, 2017-05-10 18:36:18
Answer 3: 3 votes, 2015-07-02 21:27:46
Answer 4: 0 votes, 2020-02-05 11:20:17
Answer 5: 0 votes, 2021-02-21 14:31:10
