
Select rows from a DataFrame based on column value: result limited to 16384 rows

I have a huge *.csv file containing data like the example below, and I have loaded the *.csv file into a DataFrame named "data".

[image: sample rows of the data]

I want to select the rows whose "CHR" column equals "1". My code is below:

selected_row = data.loc[data['CHR'] == '1']

The rows in selected_row are correct (rows 0/3/6/7/10/13 are selected in the example), but it does not contain all the rows where the column equals "1". I eventually found that selected_row only contains rows with CHR == '1' up to row 16384 of data; row 16385 (and many later rows) with CHR == '1' are not selected. Please advise, thanks.

Try

selected_row = data.loc[data['CHR'].isin([1, '1'])]
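Why this can help: on a large CSV, pandas' chunked (low_memory) parsing can leave a column with mixed types, some values parsed as int and others as str, so a pure string comparison misses the int rows. A minimal sketch with made-up data (the original file isn't available):

```python
import pandas as pd

# Made-up frame simulating a column with mixed int/str values,
# the kind of thing chunked CSV parsing can produce on large files
data = pd.DataFrame({"CHR": [1, "1", 2, "2", 1]})

only_str = data.loc[data["CHR"] == "1"]       # matches only the str rows
both = data.loc[data["CHR"].isin([1, "1"])]   # matches both representations

print(len(only_str), len(both))  # 1 3
```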

I think you have your filters mixed up. To make it easier, build the filter first, then apply it to your DataFrame. Try this:

filter_row = data['CHR'] == '1'  # returns a boolean Series you can then use

data.loc[filter_row]
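The two steps above, put together as a runnable sketch (with a small made-up frame standing in for the questioner's data):

```python
import pandas as pd

# Small made-up frame standing in for the questioner's DataFrame
data = pd.DataFrame({"CHR": ["1", "2", "1", "3"], "POS": [10, 20, 30, 40]})

filter_row = data["CHR"] == "1"  # boolean Series: True where CHR is "1"
selected = data.loc[filter_row]  # keep only the flagged rows

print(selected["POS"].tolist())  # [10, 30]
```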

Thanks, everyone. By the way, it is strange that if I specify the data type when reading the *.csv file, the problem also disappears. I don't really know the reason behind it; just for anyone's reference:

data = pandas.read_csv("mydata.csv",dtype={"CHR":"string"})
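Forcing a single dtype at parse time sidesteps any mixed-type values, which would explain why the problem disappears. A sketch using an in-memory CSV via io.StringIO in place of "mydata.csv" (the column names here are assumptions):

```python
import io
import pandas as pd

# In-memory stand-in for mydata.csv; the POS column is made up
csv_text = "CHR,POS\n1,100\n2,200\n1,300\n"

# dtype forces every CHR value to the pandas string type, so the
# string comparison below matches all intended rows
data = pd.read_csv(io.StringIO(csv_text), dtype={"CHR": "string"})
selected_row = data.loc[data["CHR"] == "1"]

print(len(selected_row))  # 2
```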

