Biplot in Python

I have plotted biplots in MATLAB and written one in Fortran in the past. Last month, while playing with PCA, I needed to plot biplots in Python. Unlike MATLAB, Python has no straightforward biplot implementation, so I wrote a simple Python function that plots one given the scores and coefficients from a principal component analysis.

Here’s the function.


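import matplotlib.pyplot as plt


def biplot(score, coeff, pcax, pcay, labels=None):
    # score: projected samples, shape (n_samples, n_components)
    # coeff: loadings laid out like sklearn's pca.components_,
    #        shape (n_components, n_features)
    # pcax, pcay: 1-based numbers of the components to plot
    pca1 = pcax - 1
    pca2 = pcay - 1
    xs = score[:, pca1]
    ys = score[:, pca2]
    n = coeff.shape[1]  # one arrow per original variable
    # Rescale the scores so the points fit the [-1, 1] axes used below
    scalex = 1.0 / (xs.max() - xs.min())
    scaley = 1.0 / (ys.max() - ys.min())
    plt.scatter(xs * scalex, ys * scaley)
    for i in range(n):
        # Arrow from the origin to variable i's loading on the two components
        x, y = coeff[pca1, i], coeff[pca2, i]
        plt.arrow(0, 0, x, y, color='r', alpha=0.5)
        label = "Var" + str(i + 1) if labels is None else labels[i]
        plt.text(x * 1.15, y * 1.15, label, color='g', ha='center', va='center')
    plt.xlim(-1, 1)
    plt.ylim(-1, 1)
    plt.xlabel("PC{}".format(pcax))
    plt.ylabel("PC{}".format(pcay))
    plt.grid()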

[Figure: Biplot PCA in Python]

The plot was produced with:

biplot(score,pca.components_,1,2,labels=categories)
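
For readers wondering what `score` and the coefficient matrix hold, here is a minimal sketch. It assumes `score` means the data projected onto the principal components, as in the output of the same name from MATLAB's `pca`. The helper `pca_scores` is hypothetical and uses NumPy's SVD rather than scikit-learn; it is only meant to show the shapes involved.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Hypothetical helper: PCA via SVD, returning (score, components)."""
    # Centre each variable (column) on its mean
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]   # shape (n_components, n_features)
    score = Xc @ components.T        # shape (n_samples, n_components)
    return score, components


# Made-up data purely for illustration: 50 samples, 4 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
score, components = pca_scores(X, n_components=2)
print(score.shape, components.shape)  # (50, 2) (2, 4)
```

With scikit-learn, the corresponding arrays would come from `pca.fit_transform(X)` and `pca.components_`, possibly with flipped signs on some components. Either pair can be passed to `biplot` in the same way as the call above.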

What is a Biplot?

A biplot is one of the most useful and versatile methods of multivariate data visualisation. It extends the idea of a simple scatter plot of two variables to the case of many variables, with the objective of visualising the maximum possible information in the data.

From Wikipedia:

A biplot allows information on both samples and variables of a data matrix to be displayed graphically. Samples are displayed as points while variables are displayed either as vectors, linear axes or nonlinear trajectories.

If you would like to dig deeper, here’s a link to a comprehensive introduction to biplots [PDF].

7 responses to “Biplot with Python”

  1. […] figure out a way use customerized functions to plot, like solutions 1, you can click link here: LINK, after tweak a little bit, it works for me, but as the plot doesnt show the color of each […]


  2. […] sent a recent email complaining  the previous biplot code not working. Though I was not able to replicate his errors,but from the error message figured […]


  3. What is `score`?


  4. […] over the scores, and the result is shown in Figure 9. This solution is based on the one proposed at https://sukhbinder.wordpress.com/2015/08/05/biplot-with-python; it probably is not the best way, but it […]


  5. I think there is a mistake in components references ( like coeff [ i , pca1 ] ), since you want to iterate over the variables but you do it over the components. It should be coeff [ pca1, i ]
    Please refer to sklearn documentation ( http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html ) -> components_ : array, shape (n_components, n_features)


    1. Thanks Martin. I will look into it. Kind regards

