Pandas adding empty row below the header of csv file?
I have the following code to create csv files out of the tables in a docx file:
from docx import Document
import pandas as pd

document = Document('my_docx.docx')
for index, table in enumerate(document.tables):
    df = [['' for i in range(len(table.columns))] for j in range(len(table.rows))]
    for i, row in enumerate(table.rows):
        for j, cell in enumerate(row.cells):
            df[i][j] = cell.text
    name = "tables/table_" + str(index) + ".csv"
    pd.DataFrame(df).to_csv(name, index=False, header=True)
However, Pandas is creating an undesired empty row just below the header of the table. This only happens with header=True; with header=False the problem does not occur. But I need the header.
I believe there are two ways I can solve this: create the csv file with header=True and then delete the row, or use header=False and then add a row to be the header. How can I do either one of these?
Use df.drop(0, inplace=True) or df = df.iloc[1:] to remove the first row of the DataFrame (here df stands for the DataFrame itself, not the nested list built in the question).
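As a quick illustration of what the two calls do, here is a minimal sketch on a toy frame. The data is made up and only assumes that the unwanted row is the first one; the names rows, frame, dropped and sliced are not from the original post.

```python
import pandas as pd

# Toy data standing in for a table whose first row is blank (hypothetical values).
rows = [["", ""], ["a", "1"], ["b", "2"]]
frame = pd.DataFrame(rows)

# Option 1: drop the row labelled 0 (the default RangeIndex starts at 0).
dropped = frame.drop(0)

# Option 2: slice by position, keeping everything from the second row onward.
sliced = frame.iloc[1:]

# Both approaches leave the same rows behind.
print(dropped.equals(sliced))
print(sliced.to_csv(index=False, header=True))
```

Note that drop(0) works by label while iloc[1:] works by position, so they only agree when the index is still the default 0, 1, 2, ... as it is here.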
So the overall code would be:
from docx import Document
import pandas as pd

document = Document('my_docx.docx')
for index, table in enumerate(document.tables):
    # Pre-fill a rows x columns grid, then copy each cell's text into it
    df = [['' for i in range(len(table.columns))] for j in range(len(table.rows))]
    for i, row in enumerate(table.rows):
        for j, cell in enumerate(row.cells):
            df[i][j] = cell.text
    name = "tables/table_" + str(index) + ".csv"
    dataFrame = pd.DataFrame(df)
    # Remove the first row before writing
    dataFrame.drop(0, inplace=True)  # Or use dataFrame = dataFrame.iloc[1:]
    dataFrame.to_csv(name, index=False, header=True)
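The other route mentioned in the question, supplying your own header row, can be sketched as follows. This assumes the first row of each Word table holds the column names, which the question does not confirm. rows_to_frame is a hypothetical helper, and the sample rows stand in for the text that cell.text would return.

```python
import pandas as pd


def rows_to_frame(rows):
    """Build a DataFrame from a list of rows, treating rows[0] as column names.

    Hypothetical helper for illustration; assumes rows[0] holds the header.
    """
    header, *body = rows
    return pd.DataFrame(body, columns=header)


# Made-up cell text standing in for the nested list built from a docx table.
rows = [["name", "qty"], ["apple", "3"], ["pear", "5"]]
frame = rows_to_frame(rows)

# The header written to the csv is now the table's own first row.
csv_text = frame.to_csv(index=False, header=True)
print(csv_text)
```

In the loop above, this would mean passing the nested list df to the helper and calling to_csv(name, index=False, header=True) on the result, so the header comes from the table rather than from the default 0, 1, 2, ... column labels.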