I have read a file from a URL as follows:
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']
data = pd.read_csv(url, names=names)
print(data.shape)
print(data)
Now I want to read one column and do some processing on it (for example min, max, standard deviation, or r score), then read another column and process it in the same way.
Is there any way to do this in scikit-learn, pandas, or plain Python?
You can use describe:
data.describe()
Output:
       sepal length  sepal width  petal length  petal width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000
Or a single column:
data['petal length'].describe()
Output:
count    150.000000
mean       3.758667
std        1.764420
min        1.000000
25%        1.600000
50%        4.350000
75%        5.100000
max        6.900000
Name: petal length, dtype: float64
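If only a few specific statistics are needed, they can also be computed directly on one column at a time with the Series methods. The following is a minimal sketch; the small hand-made frame stands in for the downloaded data, and its values are illustrative only, not the full iris set. It assumes that "r score" in the question means the Pearson correlation coefficient.

```python
import pandas as pd

# Small stand-in for the iris frame; values are illustrative only.
data = pd.DataFrame({
    'petal length': [1.4, 4.7, 6.0, 5.1],
    'petal width': [0.2, 1.4, 2.5, 1.9],
})

col = data['petal length']   # select one column as a Series
col_min = col.min()
col_max = col.max()
col_std = col.std()          # sample standard deviation (ddof=1), as in describe()

# Pearson correlation coefficient (r) between two columns.
# Assumption: this is what "r score" refers to.
r = data['petal length'].corr(data['petal width'])

print(col_min, col_max, col_std, r)
```

The same calls can then be repeated on any other column, for example data['petal width'].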
Or you can use apply with a lambda to do some custom processing column by column.
data.apply(lambda x: x.describe())
Output:
        sepal length  sepal width  petal length  petal width        class
25%         5.100000     2.800000      1.600000     0.300000          NaN
50%         5.800000     3.000000      4.350000     1.300000          NaN
75%         6.400000     3.300000      5.100000     1.800000          NaN
count     150.000000   150.000000    150.000000   150.000000          150
freq             NaN          NaN           NaN          NaN           50
max         7.900000     4.400000      6.900000     2.500000          NaN
mean        5.843333     3.054000      3.758667     1.198667          NaN
min         4.300000     2.000000      1.000000     0.100000          NaN
std         0.828066     0.433594      1.764420     0.763161          NaN
top              NaN          NaN           NaN          NaN  Iris-setosa
unique           NaN          NaN           NaN          NaN            3
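The lambda passed to apply receives each column as a Series, so any per-column calculation can go there. Below is a minimal sketch of that idea, again on a small illustrative frame rather than the real data. Restricting to numeric columns with select_dtypes is an assumption about what is wanted here, so that the string column 'class' is skipped; agg is shown as an alternative when several built-in statistics are needed at once.

```python
import pandas as pd

# Small stand-in for the iris frame; values are illustrative only.
data = pd.DataFrame({
    'sepal length': [5.1, 7.0, 6.3],
    'sepal width': [3.5, 3.2, 3.3],
    'class': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'],
})

# Keep only numeric columns so the string 'class' column is skipped.
numeric = data.select_dtypes(include='number')

# apply calls the lambda once per column; x is that column as a Series.
value_range = numeric.apply(lambda x: x.max() - x.min())

# agg computes several named statistics per column in one call.
summary = numeric.agg(['min', 'max', 'std'])

print(value_range)
print(summary)
```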
Some dummy data:
import numpy as np
import pandas as pd

data = pd.DataFrame({'sepal length' : np.random.randn(3), 'sepal width' : np.random.randn(3)})
If you want to run custom calculations on all the columns one by one, you can loop over the column names:
>>> for col in data.columns:
...     print(col)
...     print(np.mean(data[col]))
[out]: sepal length
-1.06206436799
sepal width
-0.586939385059
This is the output you would get when the data is in a pandas DataFrame (the exact numbers vary, since the dummy data is random). You can also include your own custom operations on each column inside the loop.
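As a rough sketch of such a loop with custom operations, the results for each column can be collected in a dictionary and turned into a frame at the end. The frame and the choice of statistics below are illustrative assumptions; the numeric check is there so that a string column such as 'class' does not raise an error.

```python
import pandas as pd

# Small stand-in for the iris frame; values are illustrative only.
data = pd.DataFrame({
    'sepal length': [5.1, 7.0, 6.3],
    'sepal width': [3.5, 3.2, 3.3],
    'class': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'],
})

results = {}
for col in data.columns:
    # Skip non-numeric columns such as 'class'.
    if not pd.api.types.is_numeric_dtype(data[col]):
        continue
    values = data[col]
    # Any custom per-column processing can go here.
    results[col] = {
        'mean': values.mean(),
        'std': values.std(),
        'max': values.max(),
    }

# Outer keys become columns, inner keys become the row index.
stats = pd.DataFrame(results)
print(stats)
```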