machine learning - PCA in matlab selecting top n components -


I want to select the top n = 10,000 principal components of a matrix. After PCA has completed, MATLAB should return a p x p matrix, but it doesn't!

    >> size(train_data)
    ans =
       400      153600

    >> [coefs,scores,variances] = pca(train_data);
    >> size(coefs)
    ans =
          153600         399

    >> size(scores)
    ans =
       400   399

    >> size(variances)
    ans =
       399     1

Shouldn't it be coefs: 153600 x 153600 and scores: 400 x 153600?

When I use the code below, it gives me an out-of-memory error:

    >> [v d] = eig(cov(train_data));
    Out of memory. Type HELP MEMORY for your options.

    Error in cov (line 96)
        xy = (xc' * xc) / (m-1);

I don't understand why MATLAB returns a lower-dimensional matrix. I would expect pca to fail with an error as well, since the full coefficient matrix would need 153600*153600*8 bytes, which is about 188 GB.

The error with eigs:

    >> eigs(cov(train_data));
    Out of memory. Type HELP MEMORY for your options.

    Error in cov (line 96)
        xy = (xc' * xc) / (m-1);

Foreword

I think you are falling prey to the XY problem: trying to find 153,600 dimensions in your data is non-physical. Please ask about the problem (X), not your proposed solution (Y), in order to get a meaningful answer. I will use this post only to tell you why PCA is not a fit in this case. I cannot tell you what will solve your problem, since you have not told us what it is.

This is a mathematically unsound problem, as I'll try to explain here.

PCA

PCA is, as user3149915 said, a way to reduce dimensions. This means that somewhere in your problem you have one-hundred-fifty-three-thousand-six-hundred dimensions floating around. That's a lot. A heck of a lot. Explaining a physical reason for the existence of all of them might be a bigger problem than trying to solve the mathematical problem.

Trying to fit that many dimensions to only 400 observations will not work. Even if all observations are linearly independent vectors in your feature space, you can still extract only 399 dimensions, since the rest simply cannot be found: there are no observations for them. You can at most fit N-1 unique dimensions through N points; the other dimensions have an infinite number of possible locations. It is like trying to fit a plane through two points: there is a line you can fit through those, and the third dimension will be perpendicular to that line, but undefined in its rotational direction. Hence, you are left with an infinite number of possible planes that fit through those two points.

I do not think you are trying to fit "noise" after the first 400 components; I think you are fitting a void after that. You have used all your data to get the dimensions and cannot create more dimensions. Impossible. All you can do is get more observations, say 1.5M, and do the PCA again.
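The N-1 limit can be checked on a toy example. The following is a minimal sketch, assuming small made-up sizes (5 observations, 50 features) and random data rather than the asker's 400 x 153600 matrix; it also assumes the Statistics and Machine Learning Toolbox for pca and R2016b or later for the implicit expansion used in the centering step.

```matlab
% Toy sizes and random data are assumptions for illustration only.
rng(0);                          % fixed seed so the run is repeatable
n = 5;                           % number of observations (rows)
p = 50;                          % number of features (columns), p >> n
X = randn(n, p);                 % hypothetical data matrix

% pca centers the columns first; centering removes one degree of freedom,
% so the centered matrix has rank at most n-1.
Xc = X - mean(X, 1);             % implicit expansion, needs R2016b or later
r  = rank(Xc);

[coefs, scores, variances] = pca(X);

fprintf('rank of centered data: %d\n', r);
disp(size(coefs));               % p rows, n-1 columns
disp(size(scores));              % n rows, n-1 columns
```

This mirrors the sizes in the question: with n = 400 and p = 153600, pca returns 399 columns because that is all the centered data can support.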

More observations than dimensions

Why do you need more observations than dimensions, you might ask? Easy: you cannot fit a unique line through a single point, nor a unique plane through two points, nor a unique 153,600-dimensional hyperplane through 400 points.

So, if I get 153,600 observations, I'm set?

Sadly, no. If you have two points and fit a line through them, you get a 100% fit. No error, yay! Done for the day, let's go home and watch TV! Sadly, your boss will call you in the next morning, since your fit is rubbish. Why? Well, if you had, for instance, 20 points scattered around, the fit would not be without errors, but it would at least be closer to representing your actual data, since the first two could be outliers. See this illustrative figure, where the red points are your first two observations:

[Figure: scattered points with the first two observations marked in red; image not available]
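The two-point versus twenty-point argument can be reproduced numerically. This is a minimal sketch, assuming a made-up underlying line y = 2x + 1 with Gaussian noise; the data, the seed, and the variable names are all hypothetical and are not taken from the original figure.

```matlab
% Hypothetical data: 20 noisy samples of an assumed line y = 2x + 1.
rng(1);
x = linspace(0, 10, 20)';
y = 2*x + 1 + randn(20, 1);

p2  = polyfit(x(1:2), y(1:2), 1);   % line through the first two points only
p20 = polyfit(x, y, 1);             % least-squares line through all 20 points

res2_on_2    = norm(polyval(p2, x(1:2)) - y(1:2));  % "100% fit" on its own two points
res2_on_all  = norm(polyval(p2, x) - y);            % the same line judged on all data
res20_on_all = norm(polyval(p20, x) - y);           % 20-point fit judged on all data

fprintf('2-point fit, residual on its 2 points: %g\n', res2_on_2);
fprintf('2-point fit, residual on all points:   %g\n', res2_on_all);
fprintf('20-point fit, residual on all points:  %g\n', res20_on_all);
```

The two-point line reproduces its own two points up to round-off, but the least-squares line through all 20 points can never have a larger residual over the full data set, which is the sense in which it represents the data better.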

If you extract the first 10,000 components, that would be 399 exact fits and 9,601 zero dimensions. You might as well not even attempt to calculate beyond the 399th dimension, and just stick the result into a zero array with 10,000 entries.
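If a fixed-length result is still wanted, the zero padding described above could look roughly like this. It is only a sketch under assumed toy sizes (5 observations, 50 features, k = 10 requested components); the names k, variances_k, and scores_k are hypothetical, and pca again requires the Statistics and Machine Learning Toolbox.

```matlab
% Toy sizes are assumptions; the question's case would be n = 400, k = 10000.
rng(2);
X = randn(5, 50);                      % hypothetical data: 5 observations, 50 features
k = 10;                                % requested number of components, k > n-1

[coefs, scores, variances] = pca(X);   % only the n-1 meaningful components come back
d = size(coefs, 2);                    % number of components actually found

variances_k = zeros(k, 1);             % zero array with k entries
variances_k(1:d) = variances;          % the first d entries carry all the variance

% Scores along any further direction are identically zero for this data.
scores_k = [scores, zeros(size(scores, 1), k - d)];

fprintf('components found: %d, padded to: %d\n', d, k);
```

Note that only the variances and scores are padded here. The coefficient vectors beyond the first d are not zero; they are simply not determined by the data, which is the "infinite number of possible planes" situation from earlier in this answer.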

TL;DR: You cannot use PCA here, and we cannot help you solve your problem as long as you do not tell us what your problem is.

