Principal Component Analysis (PCA)

Many times we face the need to analyze data collections that have a large number of variables. This leads to problems when we try to extract information from the data. When using multivariate regression to describe or predict, we may not know which variables to use; even approaches such as backward or forward selection can be costly. Not only that, they can bring another problem: multicollinearity. Those approaches do not care about the correlation among the selected x's, which makes multiple regression difficult. This is where PCA comes in handy.
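
For instance, a quick way to spot multicollinearity before fitting a regression is to look at the correlation matrix of the candidate predictors. A minimal sketch in R, using the built-in mtcars data and an arbitrary pick of columns purely for illustration:

    predictors <- mtcars[, c("disp", "hp", "wt", "drat")]  # candidate x's (illustrative choice)
    round(cor(predictors), 2)                              # values near +/-1 signal multicollinearity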

Principal Component Analysis helps us identify commonalities among the variables and group them into components that we then interpret, hoping to find common sense in them. By doing this we may drop or surrogate variables, not just for the statistical benefit of our regression model: we can actually decide which variables to use and what to do with those left aside.

PCA is a mathematical approach rather than a statistical one. By using the directions (eigenvectors) and the spread along each direction (eigenvalues) we can rearrange the variables so as to gather as much variance as possible in just a few components, or at least that is what we hope for. So we know that even when we transform the data, its meaning and its relations remain. PCA is an interesting and powerful tool that can be used at different steps of data mining: for dimension reduction, as it helps us perform feature extraction, and for pattern discovery, as we may use it to describe a phenomenon.
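
As a minimal sketch of what this looks like in practice, R's built-in prcomp() runs PCA directly; the data set and the number of components shown here are only assumptions for illustration:

    pca <- prcomp(mtcars, scale. = TRUE)  # scale variables so no single unit dominates the variance
    summary(pca)                          # proportion of variance captured by each component
    pca$rotation[, 1:2]                   # loadings: how the original variables combine into PC1 and PC2
    screeplot(pca, type = "lines")        # eigenvalues per component, to decide how many to keep

The summary and the scree plot answer the "how few components" question, while the loadings are what we interpret when trying to give each component a meaning.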

Most important & difficult task – Explaining it

We recently used PCA to describe the interrelation between a client and its customers, and we found that the most challenging part of performing this analysis is making your client understand it. So try to tell a coherent story. That is not an easy task when you are explaining how merging variables helps identify patterns in a relationship. Surrogating a single variable may be the best approach: you might lose some predictive power, but explaining the phenomenon gets easier. If you trigger a eureka moment in your customer, the adoption process gets closer.
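
One simple, hedged way to pick such surrogates (not necessarily how it was done for this client) is to take, for each retained component, the original variable with the largest absolute loading; continuing the sketch above:

    loadings <- pca$rotation[, 1:3]                                      # loadings of the first three components
    surrogates <- rownames(loadings)[apply(abs(loadings), 2, which.max)] # strongest variable per component
    surrogates                                                           # one representative variable each

Each surrogate is a single, concrete variable the client already knows, which makes the story much easier to tell than a weighted mix of everything.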

By Julio | PCA, R
