How to read a CSV file with Pandas when the rows have more delimiters than the header?

I want to read a CSV file using the read_csv function from Pandas. The file has more delimiters in the data rows than in the header, so Pandas interprets the leading columns as a (multi-)index. The 'NAME' column can contain an arbitrary number of delimiters, and in general any column could be affected (we do not know which one), possibly more than one.

I have tried tuning the keyword arguments of read_csv without success. I am using Python 3.7.0 and Pandas 0.25.0. Excel, however, reads the file correctly.

import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write('A,NAME,B\n')
    csv_file.write('a, Peter, Parker, b\n')

df = pandas.read_csv('test.csv', header=0, delimiter=',')
print(df)

Expected output:

   A            NAME   B
0  a   Peter, Parker   b

Actual output:

    A     NAME   B
a   Peter   Parker   b

Another example:

import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write('A,NAME,B,PLACE\n')
    csv_file.write('a, Peter, Parker, b, Queens, New York City\n')

df = pandas.read_csv('test.csv', header=0, delimiter=',')
print(df)

Expected output:

   A            NAME   B                 PLACE
0  a   Peter, Parker   b Queens, New York City

Actual output:

                A NAME        B           PLACE
a  Peter   Parker    b   Queens   New York City

Isn't something like

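df = pandas.read_csv('test.csv', header=0, delimiter=',')
# The extra field makes pandas use the first column as the index;
# reset_index() moves it back into a column named "index".
df = df.reset_index()
df["NAME"] = df["A"].str.strip() + ", " + df["NAME"].str.strip()
df["A"] = df["index"]
df = df.drop("index", axis=1)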

possible? It does not completely answer the question, but it could do the trick for your df.

EDIT: Another possibility: if the file is also available in .xls/.xlsx format, pd.read_excel("name.xls") should solve your problem.

A workaround:

import pandas as pd

with open('test.csv', mode='w') as csv_file:
    csv_file.write('A,NAME,B\n')
    csv_file.write('a, Peter, Parker, b\n')
    csv_file.write('aa, John, Lee, Mary, bb\n')

df = pd.DataFrame(columns=["A", "NAMES", "B"])

with open("test.csv") as ff:
    for line in ff:
        # The first comma ends column A; the last comma starts column B.
        A, N = line.split(",", maxsplit=1)
        N, B = N.rsplit(",", maxsplit=1)
        df.loc[len(df.index)] = [A.strip(), N.strip(), B.strip()]

# Row 0 holds the header line of the file, so drop it.
df = df.drop(0, axis="index")
print(df)

    A            NAMES   B
1   a    Peter, Parker   b
2  aa  John, Lee, Mary  bb
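
The split-from-both-ends idea above can be generalised a little. The following is only a sketch, under the assumption that exactly one column, whose name you know, absorbs all the extra fields. The helper name `read_overflow_csv` and its `overflow_col` parameter are hypothetical, not part of pandas. If more than one column can contain unquoted delimiters (as in the second example of the question), the file is ambiguous and this approach cannot recover the intended split.

```python
import csv

import pandas as pd


def read_overflow_csv(path, overflow_col):
    """Read a CSV in which only `overflow_col` may contain unquoted commas.

    Hypothetical helper: any surplus fields in a row are assumed to
    belong to `overflow_col` and are joined back together with ", ".
    """
    with open(path, newline="") as f:
        reader = csv.reader(f, skipinitialspace=True)
        header = [h.strip() for h in next(reader)]
        idx = header.index(overflow_col)
        rows = []
        for row in reader:
            if not row:
                continue  # skip blank lines
            extra = len(row) - len(header)
            if extra < 0:
                raise ValueError(f"row has too few fields: {row}")
            fields = [c.strip() for c in row]
            # Fields idx .. idx+extra all belong to the overflow column.
            merged = ", ".join(fields[idx:idx + extra + 1])
            rows.append(fields[:idx] + [merged] + fields[idx + extra + 1:])
    return pd.DataFrame(rows, columns=header)


# Same sample file as in the workaround above.
with open("test.csv", mode="w") as csv_file:
    csv_file.write("A,NAME,B\n")
    csv_file.write("a, Peter, Parker, b\n")
    csv_file.write("aa, John, Lee, Mary, bb\n")

df = read_overflow_csv("test.csv", overflow_col="NAME")
print(df)
```

Unlike the line-splitting loop, this version works when the overflowing column is not the middle one of three, because the surplus is computed from the header length rather than from the first and last comma.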

import pandas

# Read the first line as a list, using ',' as the delimiter
with open('test.csv', 'r') as f:
    header = f.readline().rstrip('\n').split(',')

# Read the file, skipping the header line and using the two-character delimiter ', '
df = pandas.read_csv('test.csv', skiprows=1, header=None, delimiter=', ', engine='python')

# In the example file the header has only 3 columns but the data rows have 4
header = header + ["missing-column"]

# Assign the header list as the DataFrame column names
df.columns = header
print(df)
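
If you control how the file is produced, the ambiguity can be avoided at the source by quoting fields that contain the delimiter. The sketch below assumes that is possible in your setup; the file name `test_quoted.csv` is made up for illustration. Python's `csv.writer` quotes such fields by default, and `read_csv` honours double quotes by default.

```python
import csv

import pandas as pd

rows = [
    ["A", "NAME", "B", "PLACE"],
    ["a", "Peter, Parker", "b", "Queens, New York City"],
]

# csv.writer uses QUOTE_MINIMAL by default, so fields containing a
# comma are wrapped in double quotes when written.
with open("test_quoted.csv", mode="w", newline="") as f:
    csv.writer(f).writerows(rows)

# read_csv treats '"' as the quote character by default, so the quoted
# commas stay inside their fields.
df = pd.read_csv("test_quoted.csv")
print(df)
```

With quoting in place it no longer matters which columns contain commas, or how many of them do.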
