How to read a CSV file with Pandas when the rows have more delimiters than the header?

Question

I want to read a CSV file using the read_csv function from Pandas, but the rows contain more delimiters than the header. Pandas treats the first columns as a (multi-)index. The 'NAME' column can contain an arbitrary number of delimiters, and the affected column could be any one (we do not know which one is affected), or even more than one.

I have tried tuning the keyword arguments of read_csv without success. I am using Python 3.7.0 and Pandas 0.25.0. However, Excel can read the file correctly.

import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write('A,NAME,B\n')
    csv_file.write('a, Peter, Parker, b\n')

df = pandas.read_csv('test.csv', header=0, delimiter=',')
print(df)

Expected output:

   A            NAME   B
0  a   Peter, Parker   b

Actual output:

    A     NAME   B
a   Peter   Parker   b

Another example:

import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write('A,NAME,B,PLACE\n')
    csv_file.write('a, Peter, Parker, b, Queens, New York City\n')

df = pandas.read_csv('test.csv', header=0, delimiter=',')
print(df)

Expected output:

   A            NAME   B                 PLACE
0  a   Peter, Parker   b Queens, New York City

Actual output:

                A NAME        B           PLACE
a  Peter   Parker    b   Queens   New York City
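Why this happens: when a data row has more fields than the header has names, read_csv treats the surplus leading fields as the index. One extra field gives a plain (unnamed) index; two extra fields give a two-level MultiIndex, which is exactly the shifted layout in both "Actual output" listings. A small check, reading the same rows from an in-memory string (io.StringIO) instead of test.csv so it runs without a file:

```python
import io
import pandas as pd

# One extra field: the first value ('a') becomes a plain, unnamed index
# and every remaining value shifts one column to the left.
one_extra = 'A,NAME,B\na, Peter, Parker, b\n'
df1 = pd.read_csv(io.StringIO(one_extra))
print(df1.index.tolist())    # ['a']
print(df1.columns.tolist())  # ['A', 'NAME', 'B']

# Two extra fields: the first two values become a two-level MultiIndex.
two_extra = 'A,NAME,B,PLACE\na, Peter, Parker, b, Queens, New York City\n'
df2 = pd.read_csv(io.StringIO(two_extra))
print(df2.index.nlevels)     # 2
print(df2.index.tolist())    # [('a', ' Peter')]
```

Note the leading spaces in values such as ' Peter': the data rows use ", " while the header uses ",", and read_csv keeps the space unless skipinitialspace=True is passed.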

Answer 1

Isn't something like

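import pandas

df = pandas.read_csv('test.csv', header=0, delimiter=',')
# The extra field pushed 'a' into an unnamed index; turn it back into a column.
df = df.reset_index()
first = df.columns[0]  # the column created by reset_index()
df["NAME"] = df["A"] + ", " + df["NAME"]  # rejoin the split name
df["A"] = df[first]
df = df.drop(first, axis=1)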

possible? It doesn't fully answer the question (it assumes exactly one extra comma, inside NAME), but it could do the trick for your df.

EDIT: Another possibility: if the file is also available in .xls/.xlsx format, pd.read_excel("name.xls") should solve your problem.

Answer 2

A workaround:

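import pandas as pd

with open('test.csv', mode='w') as csv_file:
    csv_file.write('A,NAME,B\n')
    csv_file.write('a, Peter, Parker, b\n')
    csv_file.write('aa, John, Lee, Mary, bb\n')

df = pd.DataFrame(columns=["A", "NAMES", "B"])

with open("test.csv") as ff:
    for line in ff:
        # First field -> A, last field -> B, everything in between -> NAMES
        A, N = line.split(",", maxsplit=1)
        N, B = N.rsplit(",", maxsplit=1)
        df.loc[len(df.index)] = [A.strip(), N.strip(), B.strip()]

# Row 0 holds the header line itself, so drop it (drop returns a new frame)
df = df.drop(0, axis="index")
print(df)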

    A            NAMES   B
1   a    Peter, Parker   b
2  aa  John, Lee, Mary  bb
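The same first-field/last-field idea generalizes to any single column that may contain unquoted commas, provided you know which column that is and no other column ever contains one. A minimal sketch under those assumptions; read_with_free_text_column is a hypothetical helper name, and rejoining the pieces with ", " is an assumption about the original spacing:

```python
import pandas as pd

def read_with_free_text_column(text, free_col, sep=",", joiner=", "):
    """Hypothetical helper: parse CSV text in which only `free_col` may
    contain unquoted separators; every other column is assumed clean."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [name.strip() for name in lines[0].split(sep)]
    i = header.index(free_col)        # position of the free-text column
    n_after = len(header) - i - 1     # number of clean columns to its right
    rows = []
    for line in lines[1:]:
        fields = [f.strip() for f in line.split(sep)]
        stop = len(fields) - n_after  # where the free-text span ends
        # Fields before the span + the span rejoined into one value + fields after it
        rows.append(fields[:i] + [joiner.join(fields[i:stop])] + fields[stop:])
    return pd.DataFrame(rows, columns=header)

text = "A,NAME,B\na, Peter, Parker, b\naa, John, Lee, Mary, bb\n"
df = read_with_free_text_column(text, "NAME")
print(df)
```

Building the rows in a list and creating the DataFrame once avoids growing it row by row with df.loc. If two columns can contain commas (as with NAME and PLACE in the second example), the split is ambiguous and cannot be recovered from the file alone.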

Answer 3

import pandas

# Read the first line as a list, using ',' as the delimiter
with open('test.csv', 'r') as f:
    header = f.readline().replace('\n', '').split(',')
# Read the file, skipping the first line (header), using the two-character delimiter ', '
df = pandas.read_csv('test.csv', skiprows=1, header=None, delimiter=', ', engine='python')
header = header + ["missing-column"]  # In your example file the header has only 3 columns but the data has 4
# Assign the header list as the DataFrame's column names
df.columns = header
print(df)
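On the first example file this still leaves the name split across two columns (NAME gets 'Peter', B gets 'Parker', missing-column gets 'b'). A sketch of finishing the job by rejoining the shifted columns, in the style of Answer 1; it assumes exactly one extra field, located in NAME, and reads from an in-memory string rather than test.csv. With engine="python", a separator longer than one character is interpreted as a regular expression, which is what lets ", " work here.

```python
import io
import pandas as pd

text = "A,NAME,B\na, Peter, Parker, b\n"

# The header is comma-separated; the data rows are split on ", " as in Answer 3.
header = text.splitlines()[0].split(",")
df = pd.read_csv(io.StringIO(text), skiprows=1, header=None,
                 sep=", ", engine="python")
df.columns = header + ["missing-column"]

# Rejoin the name and shift B back into place (assumes one extra field, in NAME).
df["NAME"] = df["NAME"] + ", " + df["B"]
df["B"] = df["missing-column"]
df = df.drop(columns="missing-column")
print(df)
```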

3 answers

Answer 1
0 2019-09-09 14:53:23

Answer 2
0 2019-09-09 16:03:10

Answer 3
0 2019-09-09 16:07:41