
Python: create a polynomial of degree n

I have a feature set

[x1, x2, ..., xm]

Now I want to create a polynomial feature set. That means that if the degree is two, then I have the feature set

[x1, ..., xm, x1^2, x2^2, ..., xm^2, x1x2, x1x3, ..., x1xm, ..., xm-1xm]

So it contains terms only up to order 2. The same goes for order three: then you will have cubic terms as well.

How to do this?

Edit 1: I am working on a machine learning project where I have close to 7 features, and a non-linear regression on these linear features is giving OK results. Hence I thought that, to get more features, I could map these features to a higher dimension. One way is to consider polynomial terms of the feature vector. Generating x1*x1 is easy :) but getting the rest of the combinations is a bit tricky.

Can combinations give me the x1x2x3 term if the order is 3?

Use

itertools.combinations(list, r)

where list is the feature set and r is the order of the desired polynomial features. Then multiply the elements of each tuple it yields. That should give you {x1*x2, x1*x3, ...}. You'll need to construct the other terms, then union all the parts.

[Edit] Better: itertools.combinations_with_replacement(list, r) will nicely give sorted length-r tuples with repeated elements allowed.
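A minimal sketch of that approach, assuming the features are plain numbers in a list. The helper name `polynomial_features` and its `degree` parameter are hypothetical, introduced only for illustration. It multiplies each tuple produced by itertools.combinations_with_replacement and collects the terms of every order from 1 up to the requested degree (math.prod needs Python 3.8 or later).

```python
import itertools
import math


def polynomial_features(features, degree):
    """Return all monomials of order 1..degree built from `features`.

    Illustrative helper, not a library function.
    """
    terms = []
    for order in range(1, degree + 1):
        # Each tuple is one selection of `order` features with repeats allowed,
        # e.g. (x1, x1), (x1, x2), (x2, x2) for order 2 and two features.
        for combo in itertools.combinations_with_replacement(features, order):
            terms.append(math.prod(combo))
    return terms


print(polynomial_features([2, 3], 2))  # [2, 3, 4, 6, 9]
```

With 7 features and degree 2 this yields 7 linear terms plus 28 quadratic terms, 35 in total. With degree 3 the output also contains the x1x2x3 product.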

You could use itertools.product to create all the possible sets of n values that are chosen from the original set; but keep in mind that this will generate (x2, x1) as well as (x1, x2) .

Similarly, itertools.combinations will produce sets without repetition or re-ordering, but that means you won't get (x1, x1) for example.
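To make the difference concrete, here is a small comparison on three symbolic feature names (the names and variables are placeholders, not part of the question's data). It also shows that plain combinations does produce the single (x1, x2, x3) term for order 3.

```python
import itertools

features = ["x1", "x2", "x3"]

# Ordered pairs, repeats allowed: includes both (x1, x2) and (x2, x1).
ordered = list(itertools.product(features, repeat=2))

# Unordered pairs without repeats: no (x1, x1).
distinct = list(itertools.combinations(features, 2))

# Unordered pairs with repeats: every degree-2 monomial exactly once.
monomials = list(itertools.combinations_with_replacement(features, 2))

# Order 3 without repeats: only the fully mixed term.
triples = list(itertools.combinations(features, 3))

print(len(ordered), len(distinct), len(monomials))  # 9 3 6
print(triples)  # [('x1', 'x2', 'x3')]
```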

What exactly are you trying to do? What do you need these result values for? Are you sure you do want those x1^2 type terms (what does it mean to have the same feature more than once)? What exactly is a "feature" in this context anyway?

Using Karl's answer as inspiration, try using product and then taking advantage of the set object. Something like,

set(frozenset(comb) for comb in itertools.product(range(5), range(5)))

This will get rid of recurring pairs. Then you can turn the set back into a list and sort it or iterate over it as you please.

EDIT: this will actually kill the x_m^2 terms, so build sorted tuples instead of sets. This keeps the terms hashable and non-repeating.

set([tuple(sorted(comb)) for comb in itertools.product(range(5),range(5))])
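As a quick check, and assuming the inputs are the sortable indices 0..4 used above, the deduplicated sorted tuples are the same pairs that itertools.combinations_with_replacement yields directly. The variable names here are illustrative.

```python
import itertools

indices = range(5)

# Deduplicate ordered pairs by sorting each one into a canonical tuple.
via_product = {tuple(sorted(pair)) for pair in itertools.product(indices, repeat=2)}

# Generate the unordered pairs (repeats allowed) directly.
via_cwr = set(itertools.combinations_with_replacement(indices, 2))

print(via_product == via_cwr)  # True
print(len(via_product))        # 15
```

The product route builds all 25 ordered pairs before discarding duplicates, while combinations_with_replacement produces the 15 unique pairs without the extra work.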
