# Increasing Dimensionality of data


Here is my question; I am not sure whether it can be done at all.

I want to test the relation between a property X and the dimensionality of the matrix. In doing so, I would like to keep the original properties of the data as close as possible. So I thought of the following two ways.

1. If I take the IRIS data, it has four attributes. What I would like to do is increase the number of attributes to maybe 6 or 12 and so forth, but still keep the characteristics of the original data. I am not sure how to do it.

2. Another thing that might work would be to generate data, for example from 3 Gaussian (normal) distributions, but with different dimensions. Would the data sets still be comparable to one another, given that they simply have different dimensions?

My question is not how to add extra data in MATLAB, but how to add data while still preserving the properties (if that makes sense).

I would appreciate any help.

Thank you for looking.

### Accepted Answer

Walter Roberson
on 24 Feb 2011

Features are independent of the dimensionality of the data. The width of the petal of an Iris is not dependent upon how many other size measurements you took or which of them you included.

There may be correlations between features. For example, you are not going to find a very short Iris that has very long petals. These correlations do not, however, depend upon how many other measurements you included.
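A minimal sketch of this point, using synthetic stand-in data rather than the real Iris measurements (the variable names `height`, `petal`, and `extra` are made up for illustration): the correlation between two features comes out the same whether or not other columns are included in the matrix.

```matlab
rng(0);                                 % fixed seed so the run is repeatable
n = 150;
height = randn(n,1);                    % hypothetical "plant height" feature
petal  = 0.8*height + 0.3*randn(n,1);   % hypothetical feature correlated with height
extra  = randn(n,8);                    % eight unrelated extra measurements

R2  = corrcoef([height petal]);         % correlation matrix of the two features only
R10 = corrcoef([height petal extra]);   % same two features plus eight more columns

% Entry (1,2) is the height/petal correlation in both matrices
fprintf('2 columns:  %.6f\n', R2(1,2));
fprintf('10 columns: %.6f\n', R10(1,2));
```

The two printed values should agree up to floating-point rounding, since `corrcoef` computes each pairwise correlation from those two columns alone.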

Be careful also to note that the scale of each feature is independent. For example it might be most natural to measure the size of the pollen in microns but the height of the plant in centimeters. Thus, a large value in one feature might have less significance than a very modest value in another feature. Therefore the scale of values for any newly introduced feature is not relevant: it is the distribution of values that matters.
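One way to see that the scale is not what matters is to standardize each column and then change the units of one feature. The sketch below uses made-up numbers (`pollen_um` and `height_cm` are hypothetical names, not real measurements) and `bsxfun` so that it also runs on older MATLAB releases.

```matlab
rng(1);                                   % fixed seed so the run is repeatable
pollen_um = 30 + 5*randn(150,1);          % hypothetical pollen size in microns
height_cm = 60 + 10*randn(150,1);         % hypothetical plant height in centimeters

X1 = [pollen_um height_cm];               % height in centimeters
X2 = [pollen_um height_cm/100];           % same data, height in meters

% Standardize every column: subtract its mean, divide by its standard deviation
Z1 = bsxfun(@rdivide, bsxfun(@minus, X1, mean(X1,1)), std(X1,0,1));
Z2 = bsxfun(@rdivide, bsxfun(@minus, X2, mean(X2,1)), std(X2,0,1));

% Largest difference between the two standardized matrices
maxDiff = max(abs(Z1(:) - Z2(:)));
fprintf('Largest difference after standardizing: %g\n', maxDiff);
```

The difference should be at rounding-error level: the choice of units drops out once each column is standardized, and only the shape of the distribution remains.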

Introducing new artificial features that are independent of the existing features is not going to help data classification. Done wrong, you can end up making your classification decisions based upon the new artificial feature entirely. Done right, your classification procedure will notice that your new feature contributes no information, and effectively classifies as if it was not there.
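As a rough illustration of the "done wrong" case, assume a method that works on raw, unscaled values (for example anything based on Euclidean distance). The sketch below uses placeholder random data, not Iris, and shows how one artificial column with a large scale can account for nearly all of the variance.

```matlab
rng(2);                            % fixed seed so the run is repeatable
X     = randn(150,4);              % placeholder for four informative features
noise = 1000*randn(150,1);         % artificial feature, unrelated to X, large scale
Xn    = [X noise];                 % 150-by-5 matrix with the extra column

v     = var(Xn,0,1);               % variance of each column
share = v(end)/sum(v);             % fraction of total variance from the noise column
fprintf('Noise column share of total variance: %.4f\n', share);
```

With these assumed scales the share comes out very close to 1, so any decision based on unscaled distances would effectively be a decision on the artificial column alone.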

Therefore, if you introduce new features, they must be dependent upon the existing features in some way (or upon information from features for which you have the data but did not include).
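One possible construction of dependent features, offered only as a sketch and not as something prescribed above, is to make every new column a linear combination of the existing ones. Here `X` is placeholder random data standing in for a 150-by-4 data set, and `W` is an arbitrary random mixing matrix chosen for illustration.

```matlab
rng(3);                        % fixed seed so the run is repeatable
X = randn(150,4);              % placeholder for the original 4 features
W = randn(4,12);               % arbitrary 4-by-12 mixing matrix (an assumption)
Y = X*W;                       % 150-by-12: each new column depends on the old ones

fprintf('rank(X) = %d, rank(Y) = %d\n', rank(X), rank(Y));

% Because W has full row rank, the original features can be recovered from Y
Xrec = Y*pinv(W);
recErr = norm(Xrec - X, 'fro');
fprintf('Reconstruction error: %g\n', recErr);
```

The data now has 12 columns but still only 4 independent directions, so no information has been added or lost. Note that a general `W` does change distances and correlations between samples; a mixing matrix with orthonormal rows would be needed to keep distances unchanged.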

When we introduce new features in our classifications, it is always for dimension reduction. For example, in a Magnetic Resonance Spectrum (MRS), we might replace hundreds of spectrum data points (each of which would otherwise be a feature) that are mostly overwhelmed by the water signal, substituting something like the mean and standard deviation of the points.
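A minimal sketch of that kind of summary, assuming each row of a matrix holds one spectrum; the numbers are random placeholders, not real MRS data, and `spectra` and `features` are illustrative names.

```matlab
rng(4);                                  % fixed seed so the run is repeatable
nSamples = 20;                           % number of spectra (assumed)
nPoints  = 500;                          % data points per spectrum (assumed)
spectra  = randn(nSamples, nPoints);     % placeholder spectra, one per row

% Replace the 500 points of each spectrum with two summary features
features = [mean(spectra,2) std(spectra,0,2)];

disp(size(features));                    % displays 20 2
```

Each sample goes from 500 features down to 2, which is the dimension reduction described above.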

Anyhow, if by "property X" you are referring to individual features, then the thesis that it is related to dimensionality is not true. If, though, you are referring to something like confidence intervals, then you can work that out from the formulae involved, or you can do it experimentally by adding columns that convey no information at all because they are constant for all samples.
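The experimental route could be sketched as follows, with placeholder random data in place of a real data set: append constant columns, then check that a quantity of interest (here the Euclidean distance between two samples, chosen only as an example) does not change.

```matlab
rng(5);                                % fixed seed so the run is repeatable
X  = randn(150,4);                     % placeholder for the original 150-by-4 data
Xc = [X ones(150,2)];                  % add two columns that are constant for all samples

d_before = norm(X(1,:)  - X(2,:));     % distance between samples 1 and 2, 4 columns
d_after  = norm(Xc(1,:) - Xc(2,:));    % same distance with 6 columns

v = var(Xc,0,1);                       % the two new columns have zero variance
fprintf('Distance before: %.6f, after: %.6f\n', d_before, d_after);
fprintf('Variance of added columns: %g %g\n', v(5), v(6));
```

Whatever "property X" is, it can be computed on `X` and on `Xc` in the same way to see whether it responds to the column count alone.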


### More Answers (1)

Paulo Silva
on 24 Feb 2011

Here's one example; you can adapt it to your needs.

a=[1 2 3 4; 5 6 7 8]' %a is a 4 by 2 array

b=[a [9 10 11 12]'] %b is a with one more column

c=[a;[9 10]] %c is a with one more row

In your case size(a)=[150 4] and you want to add 2 more columns, for example:

a=randn(150,4); %Create an array 150 by 4 with random values

b=(1:150)'; %Create a vector with numbers from 1 to 150

c=2*b; %Create another vector with the even numbers from 2 to 300

d=[a b c]; %add two more columns to a: column 5 is b and column 6 is c

