
pandas read_csv and keep only certain rows (python)

I am aware of the skiprows parameter, which allows you to pass a list of the indices of the rows to skip. However, what I have are the indices of the rows I want to keep.

Say that my csv file looks like this, for millions of rows:

  A B
0 1 2
1 3 4
2 5 6
3 7 8
4 9 0

The list of indices I would like to load contains only 2 and 3, so

index_list = [2,3]

The input for skiprows would then need to be [0, 1, 4]; however, I only have [2, 3] available.

I am trying something like:

pd.read_csv(path, skiprows = ~index_list)

but no luck. Any suggestions?

Thanks, and I appreciate all the help.

You can pass a lambda function to the skiprows argument. For example:

rows_to_keep = [2,3]
pd.read_csv(path, skiprows = lambda x: x not in rows_to_keep)

You can read more about this in the pandas read_csv documentation.
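
A runnable sketch of that approach, using an in-memory CSV via io.StringIO as a stand-in for the real file (my assumption, not part of the original answer). Note that the callable receives 0-based file line numbers, so line 0 is the header and data row i sits on file line i + 1:

import io
import pandas as pd

# Stand-in for the question's file: a header line followed by five data rows.
csv_data = "A,B\n1,2\n3,4\n5,6\n7,8\n9,0\n"

rows_to_keep = [2, 3]

# Keep the header (file line 0) and shift the wanted indices by one,
# because the callable sees file line numbers, not DataFrame index labels.
df = pd.read_csv(
    io.StringIO(csv_data),
    skiprows = lambda x: x != 0 and (x - 1) not in rows_to_keep,
)
print(df)
#    A  B
# 0  5  6
# 1  7  8

Because the callable is evaluated line by line, this avoids building an explicit list of all the line numbers to skip, which matters when the file has millions of rows.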

I think you would first need to find the number of lines in the file, like this:

num_lines = sum(1 for line in open('myfile.txt'))

Then you would need to exclude the indices in index_list:

to_exclude = [i for i in range(num_lines) if i not in index_list]

and then load your data:

pd.read_csv(path, skiprows = to_exclude)
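
Putting that together, a sketch under the assumptions that the file is named 'myfile.txt' (a placeholder), has a header on line 0, and that index_list holds DataFrame row indices as in the question; the with block and the set are small additions of mine for file handling and faster membership tests:

import pandas as pd

index_list = [2, 3]

# Count every line in the file, header included.
with open('myfile.txt') as f:
    num_lines = sum(1 for _ in f)

# Data row i lives on file line i + 1 because line 0 is the header.
wanted_lines = {i + 1 for i in index_list}

# Skip every data line that is not wanted; never skip the header.
to_exclude = [i for i in range(1, num_lines) if i not in wanted_lines]

df = pd.read_csv('myfile.txt', skiprows = to_exclude)

For millions of rows this builds a skip list almost as long as the file itself, so the lambda approach above is usually lighter on memory.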

Another simple solution could be to call .loc right after read_csv, something like this:

index_to_keep = [2, 3]
pd.read_csv(path).loc[index_to_keep]

Note: this is a slower approach, since the entire file is first loaded into memory and only then are the selected rows kept.
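
For comparison, the same in-memory sample as above (again my assumption) with the .loc approach; here [2, 3] are DataFrame index labels, so no header offset is needed:

import io
import pandas as pd

csv_data = "A,B\n1,2\n3,4\n5,6\n7,8\n9,0\n"

index_to_keep = [2, 3]

# Read the whole file, then select rows by index label.
df = pd.read_csv(io.StringIO(csv_data)).loc[index_to_keep]
print(df)
#    A  B
# 2  5  6
# 3  7  8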
