How to read a csv file with Pandas with more delimiters in the rows than in the header?
I want to read a csv file using the read_csv function from Pandas. The file has more delimiters in the rows than in the header, so Pandas interprets the first columns as a multi-index. The 'NAME' column can contain an arbitrary number of delimiters, and any column could be affected (we do not know which one), possibly more than one.
I have tried tuning the keyword arguments of read_csv without success. I am using Python 3.7.0 and Pandas 0.25.0. However, Excel can read the file correctly.
import pandas
with open('test.csv', mode='w') as csv_file:
    csv_file.write('A,NAME,B\n')
    csv_file.write('a, Peter, Parker, b\n')
df = pandas.read_csv('test.csv', header=0, delimiter=',')
print(df)
Expected output:
   A           NAME  B
0  a  Peter, Parker  b
Actual output:
        A     NAME   B
a   Peter   Parker   b
Another example:
import pandas
with open('test.csv', mode='w') as csv_file:
    csv_file.write('A,NAME,B,PLACE\n')
    csv_file.write('a, Peter, Parker, b, Queens, New York City\n')
df = pandas.read_csv('test.csv', header=0, delimiter=',')
print(df)
Expected output:
   A           NAME  B                  PLACE
0  a  Peter, Parker  b  Queens, New York City
Actual output:
               A  NAME        B           PLACE
a  Peter  Parker     b   Queens   New York City
Isn't something like
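df = pandas.read_csv('test.csv', header=0, delimiter=',')
# The surplus leading field ends up as the index; move it back into a column.
# reset_index() names that column "index" when the index has no name.
df = df.reset_index()
df["NAME"] = df["A"] + ", " + df["NAME"]
df["A"] = df["index"]
df = df.drop("index", axis=1)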
possible? It does not completely answer the question, but it could do the trick for your df.
EDIT: Another possibility, if the file is also available in .xls/.xlsx format:
pd.read_excel("name.xls")
should solve your problem.
A workaround:
import pandas as pd

with open('test.csv', mode='w') as csv_file:
    csv_file.write('A,NAME,B\n')
    csv_file.write('a, Peter, Parker, b\n')
    csv_file.write('aa, John, Lee, Mary, bb\n')

df = pd.DataFrame(columns=["A", "NAMES", "B"])
with open("test.csv") as ff:
    for line in ff:
        # Split off the first field, then the last one; what remains is the name
        A, N = line.split(",", maxsplit=1)
        N, B = N.rsplit(",", maxsplit=1)
        df.loc[len(df.index)] = [A.strip(), N.strip(), B.strip()]
# Row 0 holds the header line, so drop it
df.drop(0, axis="index")
    A            NAMES   B
1   a    Peter, Parker   b
2  aa  John, Lee, Mary  bb
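The workaround above can be generalized when it is known in advance which single column absorbs the extra delimiters. The following is only a sketch under that assumption: the helper name `parse_ragged_csv` and its `wide_column` parameter are made up for illustration and are not part of pandas or the csv module. It does not help when the affected column is unknown, because the file is then genuinely ambiguous.

```python
import csv
import io


def parse_ragged_csv(source, wide_column):
    """Parse CSV lines in which one known column may contain unquoted commas.

    `source` is an open text file or any iterable of lines; `wide_column`
    is the header name of the column that absorbs the surplus fields.
    Both names are hypothetical, chosen for this sketch.
    """
    # skipinitialspace drops the blank that follows each comma
    reader = csv.reader(source, skipinitialspace=True)
    header = next(reader)
    pos = header.index(wide_column)
    n_after = len(header) - pos - 1  # columns to the right of the wide one
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        end = len(fields) - n_after
        merged = ", ".join(fields[pos:end])  # glue the surplus fields back together
        rows.append(fields[:pos] + [merged] + fields[end:])
    return header, rows


sample = io.StringIO(
    "A,NAME,B\n"
    "a, Peter, Parker, b\n"
    "aa, John, Lee, Mary, bb\n"
)
header, rows = parse_ragged_csv(sample, wide_column="NAME")
print(header)
print(rows)
```

The result can then be turned into a dataframe with `pandas.DataFrame(rows, columns=header)`. Counting fields from both ends of the row is what lets the middle column grow freely, which is also why this approach works for one affected column only.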
import pandas

# Read the first line as a list, using ',' as the delimiter
with open('test.csv', 'r') as f:
    header = f.readline().replace('\n', '').split(',')
# Read the file, skipping the first line (header), using the two-character delimiter ', '
df = pandas.read_csv('test.csv', skiprows=1, header=None, delimiter=', ', engine='python')
header = header + ["missing-column"]  # In your example file the header has only 3 columns but the data has 4
# Assign the header list as the dataframe column names
df.columns = header
print(df)
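If the program that produces the file can be changed, the ambiguity disappears once fields containing the delimiter are quoted. Below is a minimal sketch using only the standard csv module, assuming you control the writer; the sample data mirrors the second example from the question.

```python
import csv
import io

rows = [
    ["A", "NAME", "B", "PLACE"],
    ["a", "Peter, Parker", "b", "Queens, New York City"],
]

# csv.writer quotes any field that contains the delimiter
# (QUOTE_MINIMAL is its default quoting mode).
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the quoted text back yields exactly four fields per row.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed)
```

pandas.read_csv uses `"` as its default quote character, so a file written this way should load with the expected four columns and no further arguments.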