How to read columns one by one in Python pandas?

I have read a file from a URL as follows:

    import pandas as pd

    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
    names = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']

    # The file has no header row, so the column names are supplied explicitly
    data = pd.read_csv(url, names=names)

    print(data.shape)
    print(data)

Now I want to read one column and do some processing on it (for example min, max, standard deviation, r score, etc.), then read the next column and process it the same way.

Is there a way to do this in scikit-learn, pandas, or plain Python?

You can use describe, which summarizes every numeric column at once:

data.describe()

Output:

       sepal length  sepal width  petal length  petal width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

Or a single column:

data['petal length'].describe()

Output:

count    150.000000
mean       3.758667
std        1.764420
min        1.000000
25%        1.600000
50%        4.350000
75%        5.100000
max        6.900000
Name: petal length, dtype: float64
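
If only one or two statistics are needed, the individual Series methods can be called on the selected column instead of describe. A minimal sketch, assuming a small hand-made DataFrame in place of the downloaded iris data (the values below are illustrative, not the real dataset):

```python
import pandas as pd

# Hypothetical stand-in for a few iris rows, so the example runs without a download
data = pd.DataFrame({
    'sepal length': [5.1, 4.9, 6.3, 5.8],
    'petal length': [1.4, 1.4, 6.0, 5.1],
    'class': ['Iris-setosa', 'Iris-setosa', 'Iris-virginica', 'Iris-virginica'],
})

col = data['petal length']   # selecting one column gives a Series
col_min = col.min()
col_max = col.max()
col_std = col.std()          # sample standard deviation (ddof=1 by default)

print(col_min, col_max, col_std)
```

The same three calls work on any numeric column; only the column name in the selection changes.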

Or you can use apply with a lambda to run custom processing on each column:

data.apply(lambda x: x.describe())

Output:

        sepal length  sepal width  petal length  petal width        class
25%         5.100000     2.800000      1.600000     0.300000          NaN
50%         5.800000     3.000000      4.350000     1.300000          NaN
75%         6.400000     3.300000      5.100000     1.800000          NaN
count     150.000000   150.000000    150.000000   150.000000          150
freq             NaN          NaN           NaN          NaN           50
max         7.900000     4.400000      6.900000     2.500000          NaN
mean        5.843333     3.054000      3.758667     1.198667          NaN
min         4.300000     2.000000      1.000000     0.100000          NaN
std         0.828066     0.433594      1.764420     0.763161          NaN
top              NaN          NaN           NaN          NaN  Iris-setosa
unique           NaN          NaN           NaN          NaN            3
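
The lambda can be swapped for any function that takes a column and returns a value. A minimal sketch, assuming a small illustrative DataFrame and a made-up helper named value_range; the non-numeric class column is filtered out first because subtraction is not defined for strings:

```python
import pandas as pd

# Hypothetical stand-in for a few iris rows (illustrative values only)
data = pd.DataFrame({
    'sepal length': [5.1, 4.9, 6.3, 5.8],
    'petal length': [1.4, 1.4, 6.0, 5.1],
    'class': ['Iris-setosa', 'Iris-setosa', 'Iris-virginica', 'Iris-virginica'],
})

def value_range(column):
    """Hypothetical custom statistic: largest value minus smallest value."""
    return column.max() - column.min()

# Keep only numeric columns, then call value_range once per column
numeric = data.select_dtypes(include='number')
ranges = numeric.apply(value_range)   # Series indexed by column name

print(ranges)
```

Because apply works column by column by default (axis=0), the result has one entry per numeric column.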

Some dummy data (assuming numpy is imported as np and pandas as pd):

data = pd.DataFrame({'sepal length' : np.random.randn(3), 'sepal width' : np.random.randn(3)})

If you want to run custom calculations on the columns one by one, you can loop over the column names:

>>> for col in data.columns:
...     print(col)
...     print(np.mean(data[col]))
...
sepal length
-1.06206436799
sepal width
-0.586939385059

When the data is loaded into a pandas DataFrame, this prints each column name followed by its mean (the values shown come from random data, so yours will differ). You can put any custom per-column operation inside the loop.
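
To keep the per-column results rather than only printing them, the loop can collect them in a dictionary. A minimal sketch, assuming fixed illustrative values instead of random ones; the names results and summary are made up for this example:

```python
import pandas as pd

# Hypothetical data with fixed values so the results are reproducible
data = pd.DataFrame({
    'sepal length': [5.1, 4.9, 6.3, 5.8],
    'sepal width': [3.5, 3.0, 3.3, 2.7],
})

results = {}
for name, column in data.items():   # yields (column name, Series) pairs
    results[name] = {
        'min': column.min(),
        'max': column.max(),
        'std': column.std(),
    }

# Outer keys become columns, inner keys become the row index
summary = pd.DataFrame(results)
print(summary)
```

Iterating with data.items() gives the column name and its values together, which avoids the separate data[col] lookup inside the loop.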
