Reading csv file with extra line in pandas

I'm trying to use pandas to manipulate a .txt file, but the file has extra lines: rows of dashes and plus signs between the records.

When I read the file:

import pandas as pd

df=pd.read_csv('movies.txt',sep='|')
print(df)

I get this output:

                                                          +--------+--------------------------+------------+---------------+
NaN                                                id       MovieNAme                  Year         Author                                                         NaN       

+--------+--------------------------+----------... NaN      NaN                        NaN          NaN                                                            NaN       

NaN                                                1234     once upon deadpool                 2017 Alicia                                                         NaN       

+--------+--------------------------+----------... NaN      NaN                        NaN          NaN                                                            NaN       

NaN                                                1244     avengers: endgame                  2014 John                                                           NaN       

+--------+--------------------------+----------... NaN      NaN                        NaN          NaN                                                            NaN       

NaN                                                1245     The bird King                      2017 Mark                                                           NaN       

+--------+--------------------------+----------... NaN      NaN                        NaN          NaN                                                            NaN   

How can I fix this and remove the "---------------------+----------" lines?

Try:

df = pd.read_csv(
    "name_of_your_file.txt",
    sep=r"\s*\|\s*",
    comment="+",
    usecols=range(1, 5),
    engine="python",
)
print(df)
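
To see what each argument does, here is a self-contained sketch that runs the same call on an in-memory copy of the data. The sample text is an assumption: it is reconstructed from the output in the question, so the exact spacing of the real movies.txt may differ. The regex separator splits on each pipe and swallows the spaces around it, comment="+" blanks out the border lines so they get skipped, usecols=range(1, 5) discards the empty first and last fields produced by the leading and trailing pipes, and engine="python" is needed because the separator is a regular expression.

```python
import io

import pandas as pd

# Hypothetical reconstruction of the file, based on the output shown in the question.
SAMPLE = """\
+--------+--------------------------+------------+---------------+
| id     | MovieNAme                | Year       | Author        |
+--------+--------------------------+------------+---------------+
| 1234   | once upon deadpool       | 2017       | Alicia        |
+--------+--------------------------+------------+---------------+
| 1244   | avengers: endgame        | 2014       | John          |
+--------+--------------------------+------------+---------------+
| 1245   | The bird King            | 2017       | Mark          |
+--------+--------------------------+------------+---------------+
"""

df = pd.read_csv(
    io.StringIO(SAMPLE),      # stands in for the path to the real file
    sep=r"\s*\|\s*",          # split on "|" and drop the surrounding whitespace
    comment="+",              # border lines start with "+", so they become empty
    usecols=range(1, 5),      # skip the empty fields before the first and after the last "|"
    engine="python",          # regex separators require the python engine
)
print(df)
```

One caveat with this approach: comment="+" applies to every line, so a data value that contains a "+" would be cut off at that character.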

Usually, when working with CSV files, we leave blank space for readability, but pandas reads it as a row. So it can help to call DataFrameName.dropna(axis=0, inplace=False), which returns a copy without the rows that contain missing values (pass how="all" to drop only the rows that are entirely empty), or you can just open the CSV and remove those lines manually.
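
A minimal sketch of the dropna idea, using a small hand-made frame rather than the file from the question (the column names and values here are made up for illustration). It uses how="all" so that only rows and columns that are completely empty are removed; the default how="any" would also drop rows that are missing just one value.

```python
import pandas as pd

# Illustrative data only: one fully empty row and one fully empty column.
raw = pd.DataFrame(
    {
        "id": [1234, None, 1244],
        "name": ["once upon deadpool", None, "avengers: endgame"],
        "empty": [None, None, None],
    }
)

# Drop rows where every value is missing, then columns where every value is missing.
cleaned = raw.dropna(axis=0, how="all").dropna(axis=1, how="all")
print(cleaned)
```

Because inplace is left at its default of False, raw is unchanged and cleaned is a new DataFrame.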
