May 26, 2008 / jphoward

Implementing Directed Random Projection in J

The J language is a great system for prototyping algorithms, especially multi-dimensional algorithms, since it is compact, implements many important mathematical operations, and implicitly scales to multiple dimensions. For this reason, I used this to prototype Directed Random Projection.

Some useful verbs

We start by defining some useful verbs (what other languages call ‘methods’ or ‘functions’, J calls ‘verbs’):

r=: ?@$ 0:
ri =: {~ [:?#
mean =: +/ % #
ssd =: +/ @: *: @: -"1
var=: 3 : '%: +/ (ssd mean) y'
mp =: +/ . *

Let’s look at these lines a few at a time:

r=: ?@$ 0:
ri =: {~ [:?#
mean =: +/ % #

r is the standard J idiom to return y random numbers in the interval (0,1). (‘y’ is the name of the single argument to a monadic J verb.) ri finds a random integer from 0 up to (but not including) the count of items of y (?#), and then uses this to select an item from y ({~, used as a hook). mean is the standard definition of the mean (defined as a fork).
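A quick check of these in the J session (the random results will of course differ from run to run):

   mean 1 2 3 4
2.5
   r 3              NB. three uniform samples from (0,1); output varies
   ri 10 20 30 40   NB. one of 10 20 30 40, chosen uniformly at random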

ssd =: +/ @: *: @: -"1
var=: 3 : '%: +/ (ssd mean) y'

ssd is a dyad (i.e. a function of 2 arguments) which finds the sum of the square of the differences between y and x. It uses the rank conjunction "1 to allow it to apply row-wise. var finds the variance of a dataset. In it, the fork (ssd mean) finds the sum of the square of the differences between each row and the mean of all rows, then we take the root of the sum (%: +/). (In hindsight, actually this is a standard deviation, not a variance! It’s not actually used in the projection algorithm – it’s just a useful measure to decide how many components to keep.)
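For two vectors ssd gives a single squared distance, and thanks to the "1 it extends row-wise when the left argument is a matrix:

   2 3 ssd 5 7                 NB. (5-2)^2 + (7-3)^2
25
   (2 2 $ 1 0 0 1) ssd 0 0     NB. one result per row
1 1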

mp =: +/ . *

Finally, mp is the standard definition of a matrix product.
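For two vectors this is just the dot product; with matrix arguments it is the usual matrix (or matrix-vector) product:

   1 2 3 mp 4 5 6              NB. 4 + 10 + 18
32
   (2 2 $ 0 1 1 0) mp 10 20    NB. a permutation matrix swaps the elements
20 10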

Next, let’s create a small dataset:

x=:4*r 25
n =: #x
y=:x + 1+ r n
z=: x+y+r n
d=:x,.y,.z

x is just some random numbers in (0,4), y is (x+1) plus some (0,1) noise, and z is x+y plus noise. d puts these together into a dataset.
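So d is a 25-by-3 array whose columns are strongly correlated – exactly the sort of data a projection method should be able to compress:

   $d
25 3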

The main algorithm

Now we’re ready to look at the actual directed random projection algorithm:

pca =: 3 : 0
   'r y'=.y
   sn=. >. (<. (1.2 & ^.)) n
   s=. y {~ sn ? n
   maxidx =. i. >./
   furthest =. [{~ [: maxidx ssd
   nrm =: [: %: [: +/ *:
   v=. (% nrm) -/ s furthest^:(1 2) ri s
   pj =. y mp v
   res =: (r&,. ` ] @. (r-:'')) v
   res ; y + (pj*" 0 1 -v)
)

We will be calling the verb like so:

pca '';d

We are calling it with a list of two boxed items: the first is the list of all projection vectors found so far (initially the empty list, ''), and the second is the dataset to find the next projection of (initially d).

As before, let’s look at these lines a few at a time:

   'r y'=.y
   sn=. >. (<. (1.2 & ^.)) n
   s=. y {~ sn ? n

The algorithm first unpacks these arguments into ‘r’ and ‘y’ ('r y'=.y). Next we calculate the value sn, which is either log base 1.2 of n (rounded up to the nearest integer), or n itself, whichever is smaller. We use this later as the number of sample points when calculating the extreme elements (this was based on some trial and error, not on any rigorous analysis). We also create the sample s, which contains sn points randomly selected from y.
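For our n of 25, for example, log base 1.2 of 25 is about 17.65, which is smaller than 25, so sn comes out as 18:

   >. 25 <. 1.2 ^. 25
18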

   maxidx =. i. >./
   furthest =. [{~ [: maxidx ssd

maxidx is a verb which returns the index (i.) of the maximum of a list (>./). It is a good example of the utility of the ‘hook’ construction in J. Next up is furthest, a dyad which returns the item of x with the maxidx of the sum of squares of differences (ssd) – that is, the element of x furthest from a particular point y.
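maxidx and furthest are local to pca, but if we define them at the top level we can try them out directly:

   maxidx 3 1 4 1 5 9 2        NB. position of the largest item
5
   s =. 3 2 $ 0 0 10 10 3 3
   s furthest 1 1              NB. the row of s with the largest ssd from 1 1
10 10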

   nrm =: [: %: [: +/ *:

The only remaining utility function we need before we can calculate the projection is nrm, which is simply root of the sum of the squares (i.e. the norm of a vector).
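For instance:

   nrm 3 4        NB. %: 9 + 16
5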

   v=. (% nrm) -/ s furthest^:(1 2) ri s

We can now go ahead and find the projection vector, v, which (reading right to left) is the furthest point from some random element of the sample s (s furthest ri s), and the furthest point from that point (s furthest^:2 ri s). We use the power conjunction with a list argument to construct these two points at once. Finally, we take the difference of the points to calculate the vector (-/), and divide it by its norm using the hook (% nrm).
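The trick of giving ^: a list of powers is easiest to see with a simple monadic verb such as double (+:), which with powers 1 2 returns both the once- and twice-applied results at once:

   +:^:(1 2) 3
6 12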

   pj =. y mp v
   res =: (r&,. ` ] @. (r-:'')) v
   res ; y + (pj*" 0 1 -v)

We now have the projection vector v. We project the data using it (y mp v), and add it to our list of projection vectors (using the rather clumsy (r&,. ` ] @. (r-:'')) v). We return the new projection vectors, along with the data after subtracting out the component along v (res ; y + (pj*" 0 1 -v)).
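To see the deflation step in isolation, here is a toy example (reusing the nrm definition from above) with a single data point and a unit vector along the first two axes; the result has no component left along v:

   v =. (% nrm) 1 1 0
   y =. 1 3 $ 3 1 0
   y + (y mp v) *"0 1 -v
1 _1 0
   (1 _1 0) mp v               NB. orthogonal to v, as expected
0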

Output

We can find all the projection vectors, and show how the variance of the data changes with each, by running:

res =: pca ^: 3 '';d
]pc =: >0{res
]cumvar =: ([: var d mp |:)\ |:pc

pc contains the principal components, extracted and unboxed from the first item of res. We then show the variance of projecting d using pc (var d mp pc) applied to each prefix (\) of the transpose of pc (|:pc). That is, cumvar contains the variance of just the first component, then the first two, then all three.
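The prefix adverb is the same mechanism as the familiar running sum, which applies its verb to each successively longer prefix of the argument:

   +/\ 1 2 3
1 3 6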
