I am looking for an efficient function in Julia that generates a new feature matrix consisting of all polynomial combinations of the columns of an original matrix. For example, given the dataframe/matrix below:
| x | y |
|---|---|
| 1 | 3 |
| 2 | 4 |
Then the dataframe/matrix that I want to generate is:
| x | y | x^2 | y^2 | x*y |
|---|---|-----|-----|-----|
| 1 | 3 | 1   | 9   | 3   |
| 2 | 4 | 4   | 16  | 8   |
The function that gives the same output in Python would be:
from sklearn.preprocessing import PolynomialFeatures
mpoly = PolynomialFeatures(degree=degree, include_bias=False)
x = mpoly.fit_transform(df[varnm].values)
Currently, in Julia, I use the function below:
function poly(data, vars, deg)
    df = data
    mat = Matrix(df[:,vars])
    if deg == 1
        p = mat
        varnm = vars
    end
    if deg == 2
        (n, s) = size(mat)
        p = mat
        varnm = vars
        for i in 1:s
            for j in i:s
                p = hcat(p,mat[:,i].*mat[:,j])
                if vars[i] == vars[j]
                    varnm = vcat(varnm,vars[i]*"_2")
                else
                    varnm = vcat(varnm,vars[i]*"_"*vars[j])
                end
            end
        end
    end
    return p, varnm
end
It gives me what I intended, but it is extremely slow. Does anyone know an efficient function for this, or how to make the current function more efficient? Thanks!
Since DataFrames.jl is already written with efficiency in mind, I would suggest a workaround that builds on its functions. How about using the transform! function for that:
julia> df = DataFrame(x=[1, 2], y=[3, 4])
2×2 DataFrame
Row │ x y
│ Int64 Int64
─────┼──────────────
1 │ 1 3
2 │ 2 4
julia> function op(vecs...)
           power2 = broadcast(x->x.^2, vecs)
           return hcat(power2..., .*(vecs...))
       end
julia> transform!(df, All() => op => ["x^2", "y^2", "x*y"])
2×5 DataFrame
Row │ x y x^2 y^2 x*y
│ Int64 Int64 Int64 Int64 Int64
─────┼───────────────────────────────────
1 │ 1 3 1 9 3
2 │ 2 4 4 16 8
Or, using the hcat function:
julia> df = DataFrame(x=[1, 2], y=[3, 4]);
julia> df = hcat(df, df.^2, .*(eachcol(df)...), makeunique=true)
2×5 DataFrame
Row │ x y x_1 y_1 x1
│ Int64 Int64 Int64 Int64 Int64
─────┼───────────────────────────────────
1 │ 1 3 1 9 3
2 │ 2 4 4 16 8
julia> rename!(df, "x_1" => "x^2", "y_1" => "y^2", "x1" => "x*y")
2×5 DataFrame
Row │ x y x^2 y^2 x*y
│ Int64 Int64 Int64 Int64 Int64
─────┼───────────────────────────────────
1 │ 1 3 1 9 3
2 │ 2 4 4 16 8
But if you prefer to work with plain arrays and get an array back:
julia> mat = [1;2;;3;4]
2×2 Matrix{Int64}:
1 3
2 4
julia> hcat(mat, mat.^2, .*(eachcol(mat)...))
2×5 Matrix{Int64}:
1 3 1 9 3
2 4 4 16 8
The results are the same.
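As for the speed of the function in the question: calling hcat inside the loop allocates a new, larger matrix on every iteration, which is a likely cause of the slowdown. Below is a minimal sketch that allocates the output once and fills it column by column. The name poly2 is made up for this illustration, and the column order follows the examples above (original columns, then squares, then pairwise products), which is not the order scikit-learn uses.

```julia
# Degree-2 polynomial features for a numeric matrix, without a bias column.
# Hypothetical helper (not part of any package).
# Output column order: original columns, squares, pairwise products.
function poly2(mat::AbstractMatrix{T}) where {T<:Number}
    n, s = size(mat)
    ncross = s * (s - 1) ÷ 2                  # number of distinct pairs (i < j)
    out = Matrix{T}(undef, n, 2s + ncross)    # allocate the result once
    out[:, 1:s] .= mat                        # original columns
    @views out[:, s+1:2s] .= mat .^ 2         # squared columns
    k = 2s
    for i in 1:s-1, j in i+1:s
        k += 1
        @views out[:, k] .= mat[:, i] .* mat[:, j]   # product of columns i and j
    end
    return out
end

poly2([1 3; 2 4])
# 2×5 Matrix{Int64}:
#  1  3  1   9  3
#  2  4  4  16  8
```

For a DataFrame, something like poly2(Matrix(df[:, vars])) should work, as in the question's own function.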
Update:
But I have one problem: I tried this with data that has more than two columns (e.g. three columns: x1, x2, x3), but .*(eachcol(mat)...) gives me (x1.*x2.*x3), not (x1.*x2), (x1.*x3), (x2.*x3). The latter is what I intended.
Then a few adjustments are needed. Note that the combinations function comes from the Combinatorics.jl package:
julia> using Combinatorics
julia> function op(vecs...)
           power2 = broadcast(x->x.^2, vecs)
           prods = map(
               (idx)->vecs[idx[1]] .* vecs[idx[2]],
               combinations(1:length(vecs), 2)
           )
           return hcat(power2..., prods...)
       end;
julia> df = DataFrame(x=[1, 2], y=[3, 4], z=[2, 1])
2×3 DataFrame
Row │ x y z
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 1 3 2
2 │ 2 4 1
julia> transform!(df, All() => op => ["x^2", "y^2", "z^2", "x*y", "x*z", "y*z"])
2×9 DataFrame
Row │ x y z x^2 y^2 z^2 x*y x*z y*z
│ Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64
─────┼───────────────────────────────────────────────────────────────
1 │ 1 3 2 1 9 4 3 2 6
2 │ 2 4 1 4 16 1 8 2 4
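Typing the new column names by hand gets tedious as the number of columns grows. Here is a small sketch that builds them in the same order that op produces its columns (squares first, then pairwise products). The name poly2_names is invented for this example, and it uses only Base Julia.

```julia
# Build column names for degree-2 features: squares first, then pairwise products.
# Hypothetical helper; the order matches the op function above.
function poly2_names(vars)
    labels = [string(v) for v in vars]
    s = length(labels)
    squares = [l * "^2" for l in labels]
    # Pairs (i, j) with i < j, in the same order as combinations(1:s, 2)
    crosses = [labels[i] * "*" * labels[j] for i in 1:s-1 for j in i+1:s]
    return vcat(squares, crosses)
end

poly2_names(["x", "y", "z"])
# ["x^2", "y^2", "z^2", "x*y", "x*z", "y*z"]
```

If you capture the names before transforming, e.g. cols = names(df), then transform!(df, All() => op => poly2_names(cols)) should give the same result as the hard-coded list above.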