I need to find the range (max - min) of every column in a dataset that has several numeric columns and one column of string values. Here are some sample records from my dataset:
import seaborn as sns
iris = sns.load_dataset('iris')
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
The maximum and minimum of these columns are given by
sepal_length 7.9
sepal_width 4.4
petal_length 6.9
petal_width 2.5
species virginica
dtype: object
and
sepal_length 4.3
sepal_width 2
petal_length 1
petal_width 0.1
species setosa
dtype: object
respectively. To find the range of every column, I tried the following code:
iris.max() - iris.min()
But because the 'species' column holds strings, that code raises the following error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
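To reproduce the error without downloading the dataset, here is a minimal sketch that rebuilds only the five sample rows shown above. It assumes the full dataset fails the same way, since 'species' is a string column there too; the name sample is just for illustration.

```python
import pandas as pd

# The five sample rows from the question, rebuilt by hand so no download is needed
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

# max()/min() mix floats and strings, so both results are object-dtype Series
hi, lo = sample.max(), sample.min()

error_message = None
try:
    # Subtraction is element-wise, so it also tries 'setosa' - 'setosa'
    hi - lo
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```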
When that error occurs for a column, I want to show the value as
"{max string value} - {min string value}"
instead. In other words, my expected output is something like:
sepal_length 3.6
sepal_width 2.4
petal_length 5.9
petal_width 2.4
species virginica - setosa
How do I resolve this issue?
Handle the numeric and string columns separately: select each group with df.select_dtypes, compute the result for each group, and then combine the two results with pd.concat.
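import numpy as np
import pandas as pd

# Numeric columns: the range is max - min
u = iris.select_dtypes(include=[np.number])
U = u.max() - u.min()  # or: u.apply(np.ptp, axis=0)
# String (object) columns: build a "max - min" label instead
v = iris.select_dtypes(include=[object])
V = v.max() + ' - ' + v.min()
# Series.append was removed in pandas 2.0; pd.concat works in old and new versions
pd.concat([U, V])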
sepal_length 3.6
sepal_width 2.4
petal_length 5.9
petal_width 2.4
species virginica - setosa
dtype: object
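The question asks to fall back to the "max - min" label only when the subtraction actually fails. A minimal sketch of that per-column approach is below. column_range is a hypothetical helper, and it uses the five sample rows instead of the full dataset, so 'species' comes out as "setosa - setosa" here. Unlike the select_dtypes split, it keeps the original column order without any extra step.

```python
import pandas as pd


def column_range(col: pd.Series):
    """Hypothetical helper: max - min, or a 'max - min' label if subtraction fails."""
    hi, lo = col.max(), col.min()
    try:
        return hi - lo
    except TypeError:
        # Strings cannot be subtracted, so format them as the question requests
        return f"{hi} - {lo}"


# The five sample rows from the question
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

# apply() calls column_range once per column; mixed results give an object Series
ranges = sample.apply(column_range)
print(ranges)
```

On the full dataset, iris.apply(column_range) should give the expected output from the question, although the numeric ranges are ordinary floats and may show tiny rounding differences.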