
How to delete unneeded rows while using read_csv

I have a CSV file, in.txt, that reads like this:

FullName: Ryan, Jack
DOB:12345
IDNUM: 1234455
Name,Age,City
jack,34,Sydeny,
Riti,31,Delhi,
Aadi,16,New York,
Suse,32,Lucknow,
Mark,33,Las vegas,
Suri,35,Patna,

Is there any way to remove the first three rows of the CSV file (FullName, DOB, and IDNUM) using read_csv()? I tried header=3 and skiprows=3, but the resulting table is not what I am looking for: it appears to be shifted one column to the left. Any help would be appreciated.

As suggested by @xyzjayne, skiprows is the way to go.

import pandas as pd
import io

myfile = """FullName: Ryan, Jack
DOB:12345
IDNUM: 1234455
Name,Age,City
jack,34,Sydeny,
Riti,31,Delhi,
Aadi,16,New York,
Suse,32,Lucknow,
Mark,33,Las vegas,
Suri,35,Patna,"""

fh = io.StringIO(myfile)

Let's read it using pandas' read_csv() function:

df = pd.read_csv(fh,
                 skiprows=3,
                 usecols=range(3)
                )

NOTE: because your lines end with a comma, which would generate an extra column of NaN values, you also need usecols to return only a subset of the columns.

>>> print(df)

   Name  Age       City
0  jack   34     Sydeny
1  Riti   31      Delhi
2  Aadi   16   New York
3  Suse   32    Lucknow
4  Mark   33  Las vegas
5  Suri   35      Patna
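A possible alternative, sketched here on a shortened copy of the sample data: when each data row has one more field than the header (as the trailing commas cause here), pandas treats the first column as the index, which would explain the "shifted to the left" table described in the question. Passing index_col=False tells read_csv() not to do that. The variable names raw, shifted, and fixed are only illustrative, and depending on the pandas version this call may emit a ParserWarning.

```python
import io

import pandas as pd

# Shortened copy of the sample file from the question (trailing commas kept).
raw = """FullName: Ryan, Jack
DOB:12345
IDNUM: 1234455
Name,Age,City
jack,34,Sydeny,
Riti,31,Delhi,
"""

# skiprows=3 alone: data rows have 4 fields but the header has 3 names,
# so pandas uses the first field of each row as the index.
shifted = pd.read_csv(io.StringIO(raw), skiprows=3)

# index_col=False: do not use the first column as the index;
# the empty trailing field of each row is dropped instead.
fixed = pd.read_csv(io.StringIO(raw), skiprows=3, index_col=False)

print(shifted)
print(fixed)
```

In shifted, the names end up in the index and the remaining values slide one column to the left; in fixed, the Name, Age, and City columns line up with their values.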
