Just this line:
data = iris[iris['species'] == 'setosa']['sepal_length']
You are interested in the blue line, that is, the 'setosa' species. To filter the iris dataframe, I create this filter:
iris['species'] == 'setosa'
which is a boolean Series whose values are True where the corresponding row of the 'species' column of the iris dataframe is 'setosa', and False otherwise. With this line of code:
iris[iris['species'] == 'setosa']
I apply the filter to the dataframe in order to extract only the rows associated with the 'setosa' species. Finally, I extract the 'sepal_length' column:
iris[iris['species'] == 'setosa']['sepal_length']
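To make the three steps concrete, here is a minimal sketch on a tiny hand-made dataframe. The five rows are made-up stand-ins for the real iris data, and the names mask, setosa_rows and data_loc are only illustrative:

```python
import pandas as pd

# Hypothetical stand-in for the iris dataframe (made-up values, 5 rows only)
iris = pd.DataFrame({
    "species": ["setosa", "versicolor", "setosa", "virginica", "setosa"],
    "sepal_length": [5.1, 7.0, 4.9, 6.3, 4.7],
})

mask = iris["species"] == "setosa"   # boolean Series, one True/False per row
setosa_rows = iris[mask]             # keep only the rows where mask is True
data = setosa_rows["sepal_length"]   # select a single column from those rows

# The same selection in one step with .loc (row filter, column label)
data_loc = iris.loc[mask, "sepal_length"]

print(mask.tolist())
print(data.tolist())
```

The .loc form avoids chaining two indexing operations, which is usually the safer habit if you later want to assign to the selection.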
If I plot a KDE for this data array with this code:
data = iris[iris['species'] == 'setosa']['sepal_length']
sns.kdeplot(data)
I get:
which is the curve you are interested in from the plot above.
The y-values differ from those in the plot above because of the way the KDE is calculated.
I quote this reference:
The y-axis in a density plot is the probability density function for the kernel density estimation. However, we need to be careful to specify this is a probability density and not a probability. The difference is the probability density is the probability per unit on the x-axis. To convert to an actual probability, we need to find the area under the curve for a specific interval on the x-axis. Somewhat confusingly, because this is a probability density and not a probability, the y-axis can take values greater than one. The only requirement of the density plot is that the total area under the curve integrates to one. I generally tend to think of the y-axis on a density plot as a value only for relative comparisons between different categories.
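The point of the quote can be checked numerically. The sketch below uses scipy.stats.gaussian_kde as a stand-in for the estimator (an assumption: it is not necessarily configured the same way as sns.kdeplot) on made-up, tightly clustered values, and shows that the density can exceed one while the area under the curve is still one:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Made-up, tightly clustered measurements (not real iris values)
values = np.array([5.0, 5.05, 5.1, 4.95, 5.0, 5.02, 4.98, 5.08])

kde = gaussian_kde(values)  # Gaussian KDE with the default (Scott) bandwidth

# Evaluate the density on a grid: this is what the y-axis of a KDE plot shows
grid = np.linspace(4.5, 5.5, 1001)
density = kde(grid)
peak = density.max()

# Area under the whole curve (wide finite bounds) and over one interval
total_area = kde.integrate_box_1d(4.0, 6.0)
prob_interval = kde.integrate_box_1d(5.0, 5.05)

print("peak density:", peak)            # greater than 1 for this narrow data
print("total area:", total_area)        # approximately 1
print("P(5.0 <= x <= 5.05):", prob_interval)
```

Only the area over an interval is a probability; the height of the curve on its own is not.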