
Select rows from a DataFrame based on column value: result limited to 16384 rows

I have a huge *.csv file containing data like the example below, and I have loaded the *.csv file into a DataFrame named "data".

[image: sample rows of the data]

I want to select the rows whose "CHR" column equals "1". My code is below:

selected_row = data.loc[data['CHR'] == '1']

The rows in selected_row are correct (rows 0/3/6/7/10/13 are selected in the example), but it does not contain all the rows where the column equals "1". I eventually found that selected_row only contains rows with CHR == '1' up to row 16384 of data; row 16385 (and many later rows) with CHR == '1' are not selected. Please advise, thanks.

Try

selected_row = data.loc[data['CHR'].isin([1, '1'])]
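Why this can help: on a large CSV, pandas' chunked (low_memory) parsing can leave a column with mixed types, some values parsed as int and others as str, so a pure string comparison misses the int rows. A minimal sketch with made-up data (the original file isn't available):

```python
import pandas as pd

# Made-up frame simulating a column with mixed int/str values,
# the kind of thing chunked CSV parsing can produce on large files
data = pd.DataFrame({"CHR": [1, "1", 2, "2", 1]})

only_str = data.loc[data["CHR"] == "1"]       # matches only the str rows
both = data.loc[data["CHR"].isin([1, "1"])]   # matches both representations

print(len(only_str), len(both))  # 1 3
```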

I think you have your filters mixed up. To make it easier, build the filter first, then apply it to your DataFrame. Try this:

filter_row = data['CHR'] == '1'  # returns a boolean Series you can then use

data.loc[filter_row]
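The two steps above, put together as a runnable sketch (with a small made-up frame standing in for the questioner's data):

```python
import pandas as pd

# Small made-up frame standing in for the questioner's DataFrame
data = pd.DataFrame({"CHR": ["1", "2", "1", "3"], "POS": [10, 20, 30, 40]})

filter_row = data["CHR"] == "1"  # boolean Series: True where CHR is "1"
selected = data.loc[filter_row]  # keep only the flagged rows

print(selected["POS"].tolist())  # [10, 30]
```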

Thanks, everyone. By the way, it is strange that if I specify the data type when reading the *.csv file, the problem also disappears. I don't really know the reason behind it; just for anyone's reference:

data = pandas.read_csv("mydata.csv",dtype={"CHR":"string"})
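Forcing a single dtype at parse time sidesteps any mixed-type values, which would explain why the problem disappears. A sketch using an in-memory CSV via io.StringIO in place of "mydata.csv" (the column names here are assumptions):

```python
import io
import pandas as pd

# In-memory stand-in for mydata.csv; the POS column is made up
csv_text = "CHR,POS\n1,100\n2,200\n1,300\n"

# dtype forces every CHR value to the pandas string type, so the
# string comparison below matches all intended rows
data = pd.read_csv(io.StringIO(csv_text), dtype={"CHR": "string"})
selected_row = data.loc[data["CHR"] == "1"]

print(len(selected_row))  # 2
```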

