
Saturday, August 5, 2017

Data Visualization using LDA and t-SNE


LDA (linear discriminant analysis)

Linear Discriminant Analysis (LDA) is most commonly used as a dimensionality reduction technique in the pre-processing step for pattern-classification and machine learning applications. The goal is to project a dataset onto a lower-dimensional space with good class separability, in order to avoid overfitting (the “curse of dimensionality”) and also to reduce computational costs.
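
As a rough illustration (not taken from the original post), such a supervised projection can be obtained with scikit-learn's LinearDiscriminantAnalysis; the iris dataset is only a stand-in example here.

# Minimal sketch: project a labeled dataset onto two LDA components.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)             # 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)
X_2d = lda.fit_transform(X, y)                # supervised projection to 2 dimensions
print(X_2d.shape)                             # (150, 2)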

The SNE

The Stochastic Neighbor Embedding (SNE) method starts by converting the high-dimensional Euclidean distances between data points into conditional probabilities that represent similarities: the similarity of data point $x_j$ to data point $x_i$ is the conditional probability

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}.$$

For the low-dimensional counterparts $y_i$ and $y_j$ of the data points $x_i$ and $x_j$, it is possible to compute a similar conditional probability, denoted $q_{j|i}$. The bandwidth of the Gaussian used in the low-dimensional map is fixed so that $2\sigma^2 = 1$ (i.e. $\sigma = 1/\sqrt{2}$), giving

$$q_{j|i} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2\right)}.$$
Since we are only interested in modeling pairwise similarities, we set $q_{i|i} = 0$. If the map points $y_i$ and $y_j$ correctly model the similarity between the data points $x_i$ and $x_j$, the conditional probabilities $p_{j|i}$ and $q_{j|i}$ are equal. Based on this observation, SNE aims to find a low-dimensional representation that minimizes the mismatch between $p_{j|i}$ and $q_{j|i}$, measured as a sum of Kullback-Leibler divergences over all data points. Because this divergence is not symmetric, different types of error in the pairwise distances are not weighted equally; in particular, there is a large cost for using widely separated map points to represent nearby data points.
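
To make the objective concrete, here is a small NumPy sketch (my own illustration, not code from the post) of the conditional probabilities and the Kullback-Leibler cost that SNE minimizes; the toy data and all names are assumptions.

import numpy as np

def conditional_probs(points, sigma):
    # Gaussian-based conditional probabilities p_{j|i} (or q_{j|i}) for a given bandwidth sigma.
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    logits = -sq_dists / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)           # a point's similarity to itself is set to 0
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                   # toy high-dimensional data points
Y = rng.normal(size=(50, 2))                    # toy low-dimensional map points

P = conditional_probs(X, sigma=1.0)             # in SNE, sigma_i is tuned per point via the perplexity
Q = conditional_probs(Y, sigma=1 / np.sqrt(2))  # bandwidth fixed at 1/sqrt(2), as in the text above

# SNE cost: sum over all points i of KL(P_i || Q_i).
cost = np.sum(P * np.log(np.maximum(P, 1e-12) / np.maximum(Q, 1e-12)))
print(cost)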


The t-SNE


SNE, as presented by Hinton and Roweis, produces reasonably good visualizations but is hampered by a cost function that is difficult to optimize.
t-SNE is a more recent technique that aims to alleviate this problem by using a symmetrized version of the SNE cost function with simpler gradients, and by using a Student t-distribution rather than a Gaussian to compute the similarity between two points in the low-dimensional space.
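
In practice this is available off the shelf; as a quick illustration (a stand-in example, not the post's notebook), scikit-learn's TSNE can be used as follows.

# Minimal sketch: embed a dataset in 2 dimensions with t-SNE.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)            # 64-dimensional digit images as a stand-in dataset
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)                   # low-dimensional map built with the Student-t kernel
print(X_2d.shape)                              # (1797, 2)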



Today we will work with the well-known 20_newsgroups dataset, for which we obtain the following LDA and t-SNE representations.
LDA Visualization 
t-SNE Visualization
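
For reference, below is a rough sketch of one possible pipeline behind such figures; it is a reconstruction under my own assumptions (the chosen categories, the TF-IDF settings and the plotting code are not from the post), not the actual notebook.

# Rough sketch: vectorize a few 20_newsgroups categories, then visualize the
# documents once with an LDA projection and once with a t-SNE embedding.
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_20newsgroups
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

categories = ['sci.space', 'rec.sport.hockey', 'talk.politics.mideast', 'comp.graphics']
news = fetch_20newsgroups(subset='train', categories=categories,
                          remove=('headers', 'footers', 'quotes'))

# TF-IDF features, kept small so that the dense conversion below stays cheap.
X = TfidfVectorizer(max_features=2000, stop_words='english').fit_transform(news.data).toarray()
y = news.target

# Supervised LDA projection to 2 components (at most n_classes - 1 are available).
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# Unsupervised t-SNE embedding of the same TF-IDF vectors.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.scatter(X_lda[:, 0], X_lda[:, 1], c=y, cmap='tab10', s=5)
ax1.set_title('LDA Visualization')
ax2.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, cmap='tab10', s=5)
ax2.set_title('t-SNE Visualization')
plt.show()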

Here you can find the full notebook.