
Count the number of rows in a DataFrame that match conditions

I am having trouble with code that fills one dataframe based on another. One dataframe lists component replacements, each classified with a code that identifies where the component sits. I want to count how many replacements match given conditions and store that number in a second dataframe. This part of my code looks like this:

import plotly.express as px

import pandas as pd

import numpy as np

#import excel from database

d=pd.read_excel("replacements.xlsx")
df=pd.DataFrame(d)

# We create 3 dataframes to hold, respectively, the number of replacements, the percentages and the failure rates. Here we focus on the number of replacements; the others are filled by a separate process.

tab_nb_replacements=pd.DataFrame(columns=['electrical auxiliary power supply','process monitoring','wind turbine system','generator system','transmission of electrical energy','structures connected to production','auxiliary systems'], index=['falaise_nb_replacements',...,'quittebeuf_nb_replacements'])
As you can see, only some of the rows are shown. Below, I fill the 'falaise_nb_replacements' row with 0 (I did the same for every other index).
tab_nb_replacements['electrical auxiliary power supply']['falaise_nb_replacements']=np.where(((df['RDSPP code'].str[:1]=='B') & (df['WTName']=='Falaise')),tab_nb_replacements['electrical auxiliary power supply']['falaise_nb_replacements']+1,tab_nb_replacements['electrical auxiliary power supply']['falaise_nb_replacements'])

########### I tried different ways to obtain the number of replacements ######

##NOTE: for the Falaise site, we want to select a row when the value in the column 'RDSPP code' starts with 'B' and the value in the column 'WTName' is 'Falaise'.

##first method

tab_nb_replacements['electrical auxiliary power supply']['falaise_nb_replacements']=(df[df['RDSPP code'].str[:1]=='B' and df['WTName']=='Falaise']).count()

#second method

"ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

#third method

tab_nb_replacements['electrical auxiliary power supply']['falaise_nb_replacements']=(df[df['RDSPP code'].str[:1]=='B' and df['WTName']=='Falaise']).count()

None of these methods gave me a result. With each of them, I get:

 "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

If anybody has a solution or some advice, it would be really helpful!

Best,

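In the third method, the error does not come from comparing df['WTName'] with a string: df['WTName']=='Falaise' is valid and returns a boolean Series. The error comes from the keyword and, which asks Python for a single truth value for each Series. To combine the conditions row by row, use & instead and wrap each comparison in parentheses, because & binds more tightly than ==:

(df['RDSPP code'].str[:1]=='B') & (df['WTName']=='Falaise')

The cast df['WTName'].astype(str) == 'Falaise' is not required for this comparison.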
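As a minimal sketch of the whole count, assuming a small made-up dataframe in place of replacements.xlsx (the RDSPP codes and the reduced row and column lists below are hypothetical), the conditions can be combined element-wise with & and the matching rows counted by summing the boolean mask:

```python
import pandas as pd

# Hypothetical sample data standing in for replacements.xlsx.
df = pd.DataFrame({
    "RDSPP code": ["BFA10", "MDK20", "BFB11", "BAA01", None],
    "WTName": ["Falaise", "Falaise", "Quittebeuf", "Falaise", "Falaise"],
})

# Element-wise AND uses &; each comparison needs its own parentheses.
# na=False makes rows with a missing code count as non-matching.
mask = df["RDSPP code"].str.startswith("B", na=False) & (df["WTName"] == "Falaise")

# Summing a boolean Series counts its True values.
n_falaise_b = int(mask.sum())

# Result table pre-filled with zeros (shortened lists, for illustration only).
tab_nb_replacements = pd.DataFrame(
    0,
    index=["falaise_nb_replacements", "quittebeuf_nb_replacements"],
    columns=["electrical auxiliary power supply", "process monitoring"],
)

# .loc[row, column] writes a single cell without chained indexing.
tab_nb_replacements.loc[
    "falaise_nb_replacements", "electrical auxiliary power supply"
] = n_falaise_b

print(tab_nb_replacements)
```

Note that calling .count() on the filtered dataframe returns one count per column rather than a single number, which is why the sketch sums the mask instead. Writing through .loc also avoids the chained assignment tab_nb_replacements[col][row] = ..., which pandas may apply to a temporary copy.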


 