简体   繁体   中英

how can i filter a dataframe by some criteria and then save .csv?

Hi guys i started at jupyter notebook few days ago.

I need help, i have a dataframe by panda. something like this

Date    Stock   Company   Volume

01/02    APPL3   Apple     1.000.000

01/02    YUSS    Yusduqs     200.000

01/02    APPL4    Apple      200.000

01/02    DISN    Disney      1.500.000

02/02    APPL3    Apple       100.000

02/02    YUSS    Yusduqs     1.250.000

02/02    DISN     Disney     2.000.000

02/02    APPL4    Apple     1.250.000

 ...            ...           ....

I need to select the stock that was traded in more than 80% of the days with volume greater than $ 500.000,00 per day.

And i need to select **only one stock per firm, the criterio is which has more volume in all days combined. Like for 'Apple' in [Company] i have two diferents [Stock] Appl3 and Appl4, in this specific case i only need APPL4.

(Because Volume of the days combined in Appl4 > Volume of the days combined in Appl3)

I started like this:

unique_dates=len(df['Date'].value_counts()) share_freq=df[df['Volume']>=500000]]['Stock'].value_counts() stocks=share_freq/unique_dates for stock,value in stocks.items(): if(value>0.8): print(stock)

So after that i can see which one has>0.8 but i still need to select only one stock per firm. I dont know how to respect all the criterios and by the end filter all the dataframe by the criterios and save in.csv

We can use nunique

n=df.loc[df['Volume']>=500000,'Date'].groupby(df['Stock']).nunique()
uniquedate=df.Date.nunique()
n=n[n/uniquedate >0.8]
print(n.index)

Update

df.loc[df.Stock.isin(n.index)].to_excel('output.xlsx')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM