
Create a new feature matrix consisting of all polynomial combinations in Julia

I am looking for an efficient function in Julia that generates a new feature matrix consisting of all polynomial combinations of the columns of an original matrix. For example, given the dataframe/matrix below:

x y
1 3
2 4

Then the dataframe/matrix that I want to generate is:

x y x^2 y^2 x*y
1 3 1 9 3
2 4 4 16 8

The function that gives the same output in Python would be:

from sklearn.preprocessing import PolynomialFeatures

mpoly = PolynomialFeatures(degree=degree, include_bias=False)
x = mpoly.fit_transform(df[varnm].values)

Currently, in Julia, I use the function below, which I wrote myself:

function poly(data, vars, deg)
    df = data
    mat = Matrix(df[:,vars])
    if deg == 1
        p = mat
        varnm = vars
    end
    if deg == 2
        (n, s) = size(mat)
        p = mat
        varnm = vars
        for i in 1:s
            for j in i:s
                p = hcat(p,mat[:,i].*mat[:,j])
                if vars[i] == vars[j]
                    varnm = vcat(varnm,vars[i]*"_2")
                else
                    varnm = vcat(varnm,vars[i]*"_"*vars[j])
                end
            end
        end
    end
    return p, varnm
end

It gives me what I intended, but it is extremely slow. Does anyone know an efficient function for this, or how to make the current function more efficient? Thanks!

Since DataFrames.jl is already well optimized, I would suggest building on its functions rather than writing the loop by hand. One option is the transform! function:

julia> using DataFrames

julia> df = DataFrame(x=[1, 2], y=[3, 4])
2×2 DataFrame
 Row │ x      y     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      3
   2 │     2      4

julia> function op(vecs...)
         power2 = broadcast(x->x.^2, vecs)
         return hcat(power2..., .*(vecs...))
       end

julia> transform!(df, All() => op => ["x^2", "y^2", "x*y"])
2×5 DataFrame
 Row │ x      y      x^2    y^2    x*y   
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     1      3      1      9      3
   2 │     2      4      4     16      8

Or, using the hcat function:

julia> df = DataFrame(x=[1, 2], y=[3, 4]);

julia> df = hcat(df, df.^2, .*(eachcol(df)...), makeunique=true)
2×5 DataFrame
 Row │ x      y      x_1    y_1    x1    
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     1      3      1      9      3
   2 │     2      4      4     16      8

julia> rename!(df, "x_1" => "x^2", "y_1" => "y^2", "x1" => "x*y")
2×5 DataFrame
 Row │ x      y      x^2    y^2    x*y   
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     1      3      1      9      3
   2 │     2      4      4     16      8

If you would rather work with plain arrays and get an array back:

julia> mat = [1;2;;3;4]
2×2 Matrix{Int64}:
 1  3
 2  4

julia> hcat(mat, mat.^2, .*(eachcol(mat)...))
2×5 Matrix{Int64}:
 1  3  1   9  3
 2  4  4  16  8

The results are the same.

Update:

But I have one problem: I tried this with data of more than two columns (e.g. three columns: x1, x2, x3), and .*(eachcol(mat)...) gives me (x1.*x2.*x3), not (x1.*x2), (x1.*x3), (x2.*x3). The latter is what I intended.

In that case a few adjustments are needed, so that the products are taken over every pair of columns rather than over all columns at once:

julia> using Combinatorics  # provides combinations

julia> function op(vecs...)
         power2 = broadcast(x->x.^2, vecs)
         prods = map(
           (idx)->vecs[idx[1]] .* vecs[idx[2]],
           combinations(1:length(vecs), 2)
         )

         return hcat(power2..., prods...)
       end;

julia> df = DataFrame(x=[1, 2], y=[3, 4], z=[2, 1])
2×3 DataFrame
 Row │ x      y      z     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      3      2
   2 │     2      4      1

julia> transform!(df, All() => op => ["x^2", "y^2", "z^2", "x*y", "x*z", "y*z"])
2×9 DataFrame
 Row │ x      y      z      x^2    y^2    z^2    x*y    x*z    y*z   
     │ Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────────────────────────────────
   1 │     1      3      2      1      9      4      3      2      6
   2 │     2      4      1      4     16      1      8      2      4
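
For plain matrices, the function in the question is probably slow mainly because hcat is called inside the loop, so the growing matrix is copied on every iteration. A minimal sketch of a preallocated degree-2 version follows, using only Base Julia. The name poly2, its signature, and the generated column names are assumptions made for illustration, not part of DataFrames.jl or any other package. The column order matches the output above: originals, then squares, then pairwise products.

```julia
# Hypothetical helper (illustrative only): degree-2 polynomial features.
# Column order: originals, squares, then products of column pairs with i < j.
function poly2(mat::AbstractMatrix, names::AbstractVector{<:AbstractString})
    n, s = size(mat)
    length(names) == s || throw(ArgumentError("need one name per column"))

    ncols = 2s + s * (s - 1) ÷ 2            # s originals + s squares + pairs
    out = similar(mat, n, ncols)            # allocate the result once
    outnames = Vector{String}(undef, ncols)

    # Copy the original columns and their names.
    out[:, 1:s] .= mat
    outnames[1:s] .= names

    # Squares of each column, written in place.
    for j in 1:s
        out[:, s + j] .= view(mat, :, j) .^ 2
        outnames[s + j] = names[j] * "^2"
    end

    # Pairwise products of distinct columns.
    k = 2s
    for i in 1:s-1, j in i+1:s
        k += 1
        out[:, k] .= view(mat, :, i) .* view(mat, :, j)
        outnames[k] = names[i] * "*" * names[j]
    end

    return out, outnames
end

mat = [1 3 2; 2 4 1]
out, outnames = poly2(mat, ["x", "y", "z"])
```

If a data frame is needed afterwards, the result can be wrapped with DataFrame(out, outnames). Because the column names are generated, nothing has to be hard-coded when the number of columns changes.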
