I have plotted biplots in MATLAB and, in the past, created them using Fortran. Last month, while playing with PCA, I needed to plot biplots in Python. Unlike MATLAB, Python has no straightforward biplot implementation, so I wrote a simple Python function that plots one from the scores and coefficients of a principal component analysis.
Here’s the function.
import matplotlib.pyplot as plt

def biplot(score, coeff, pcax, pcay, labels=None):
    # pcax and pcay are 1-based component numbers; convert to 0-based indices
    pca1 = pcax - 1
    pca2 = pcay - 1
    xs = score[:, pca1]
    ys = score[:, pca2]
    n = score.shape[1]
    # rescale the scores by their range so the points fit the [-1, 1] axes
    scalex = 1.0 / (xs.max() - xs.min())
    scaley = 1.0 / (ys.max() - ys.min())
    plt.scatter(xs * scalex, ys * scaley)
    # one arrow and one label per row of coeff (read as coeff[variable, component])
    for i in range(n):
        plt.arrow(0, 0, coeff[i, pca1], coeff[i, pca2], color='r', alpha=0.5)
        if labels is None:
            plt.text(coeff[i, pca1] * 1.15, coeff[i, pca2] * 1.15, "Var" + str(i + 1), color='g', ha='center', va='center')
        else:
            plt.text(coeff[i, pca1] * 1.15, coeff[i, pca2] * 1.15, labels[i], color='g', ha='center', va='center')
    plt.xlim(-1, 1)
    plt.ylim(-1, 1)
    plt.xlabel("PC{}".format(pcax))
    plt.ylabel("PC{}".format(pcay))
    plt.grid()
Plotted using
biplot(score,pca.components_,1,2,labels=categories)
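The call above assumes that score and pca already exist. With scikit-learn, score would presumably be the output of pca.fit_transform(X). As a rough illustration of what the two inputs contain, here is a minimal sketch that computes them with a plain NumPy SVD instead of scikit-learn. The helper name pca_scores_and_components and the random data are assumptions made for the example, not part of the original post.

```python
import numpy as np

def pca_scores_and_components(X):
    """Hypothetical helper: PCA of X (n_samples, n_features) via SVD."""
    # centre each variable (column) on its mean
    Xc = X - X.mean(axis=0)
    # Xc = U * diag(S) * Vt, singular values in descending order
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # scores: coordinates of each sample on the principal components,
    # shape (n_samples, n_components)
    score = U * S
    # components: one principal axis per row, shape (n_components, n_features),
    # the same layout scikit-learn documents for PCA.components_
    components = Vt
    return score, components

# made-up data: 50 samples, 4 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
score, components = pca_scores_and_components(X)
print(score.shape, components.shape)
```

In this layout each row of score is a sample (a point in the biplot), and each column of components is a variable (an arrow in the biplot).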
What is a biplot?
The biplot is one of the most useful and versatile methods of multivariate data visualisation. It extends the idea of a simple scatter plot of two variables to the case of many variables, with the objective of visualising the maximum possible information in the data.
From Wikipedia:
A biplot allows information on both samples and variables of a data matrix to be displayed graphically. Samples are displayed as points while variables are displayed either as vectors, linear axes or nonlinear trajectories.
If you would like to dig deeper, here’s a link to a comprehensive introduction to biplots [PDF].
What is `score`?
If you answer it here I’ll mark it correct. http://stackoverflow.com/questions/39216897/how-to-plot-pca-loadings-and-loading-label-like-rs-autoplot-w-matplotli
I think there is a mistake in the component indexing (for example coeff[i, pca1]): you want to iterate over the variables, but the code iterates over the components. It should be coeff[pca1, i].
Please refer to the sklearn documentation ( http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html ), which gives components_ : array, shape (n_components, n_features)
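To make the indexing in this comment concrete, here is a minimal sketch, assuming the coefficients are laid out as (n_components, n_features) as the documentation quoted above states. The helper name loading_vectors and the small example matrix are hypothetical and only serve to show which index runs over the variables.

```python
import numpy as np

def loading_vectors(components, pcax, pcay):
    """Hypothetical helper: arrow end points for a biplot.

    components has shape (n_components, n_features); pcax and pcay are
    1-based component numbers. Returns an array of shape (n_features, 2)
    with one (dx, dy) pair per variable.
    """
    pca1, pca2 = pcax - 1, pcay - 1
    # row = component, column = variable, so variable i is components[pc, i]
    return np.column_stack((components[pca1, :], components[pca2, :]))

# made-up example: 2 components, 3 variables
components = np.array([[0.8, 0.6, 0.0],
                       [-0.6, 0.8, 0.0]])
arrows = loading_vectors(components, 1, 2)
for i, (dx, dy) in enumerate(arrows):
    # one arrow per variable, not per component
    print("Var{}: ({:.1f}, {:.1f})".format(i + 1, dx, dy))
```

With this layout the loop runs over components.shape[1] variables. The same effect could be had by passing the transpose, pca.components_.T, to a function that indexes coeff[i, pca1].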
Thanks Martin. I will look into it. Kind regards