Gotchas in Python For Matlab Users

“What an odd place to put these comments!” This was my first thought when I saw this. With the goal of going from Apprentice to Guru in mind, I was browsing through some Python code and came upon this Python file by Roland Memisevic.

There is a lot of useful information here for anyone moving to Python from MATLAB.


No switch statement in python!
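
A common workaround is a dictionary that maps each case to a function. The sketch below is only an illustration: the names `area_circle`, `area_square`, `AREA_FUNCS` and `area` are hypothetical and do not come from the original file. (Python 3.10 and later also provide a `match` statement.)

```python
import math

def area_circle(r):
    return math.pi * r ** 2

def area_square(s):
    return s ** 2

# Hypothetical dispatch table playing the role of MATLAB's switch/case.
AREA_FUNCS = {"circle": area_circle, "square": area_square}

def area(shape, size):
    # The membership test plays the role of the 'otherwise' branch.
    if shape not in AREA_FUNCS:
        raise ValueError("unknown shape: " + shape)
    return AREA_FUNCS[shape](size)

result = area("square", 3)
print(result)
```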

Annoying things in Python: the difference between matrix and array behavior. Try

import numpy as np

a = np.asmatrix(np.random.randn(100, 1))
b = a.T*a
b.shape   # (1, 1)

a = np.asarray(np.random.randn(100, 1))
b = a.T*a
b.shape   # (100, 100)

# Explanation: * changes behavior between the two. For a matrix it is matrix multiplication; for an array it is elementwise.

We can be more explicit. np.dot() gives matrix multiplication for arrays.

a = np.asmatrix(np.random.randn(100, 1))
b = np.dot(a.T, a)
b.shape   # (1, 1)

a = np.asarray(np.random.randn(100, 1))
b = np.dot(a.T, a)
b.shape   # (1, 1)

What happens with a.T*a in the array case? We can force array behavior with np.multiply.

a = np.asmatrix(np.random.randn(100, 1))
b = np.multiply(a.T, a)
b.shape   # (100, 100)

a = np.asarray(np.random.randn(100, 1))
b = np.multiply(a.T, a)
b.shape   # (100, 100)

This corresponds to .* in MATLAB, but it has the added useful/confusing behavior that it automatically tiles its arguments (NumPy calls this broadcasting) to form the multiplication.
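
The three kinds of product can be compared side by side on a plain array. This is a minimal sketch; the `@` operator is not used in the original file and assumes Python 3.5 or later with a reasonably recent NumPy.

```python
import numpy as np

a = np.random.randn(100, 1)      # plain ndarray, shape (100, 1)

elementwise = a.T * a            # broadcasts (1, 100) against (100, 1)
matrix_prod = a.T @ a            # matrix product, written as an operator
same_as_dot = np.dot(a.T, a)     # the same product, written as a function

print(elementwise.shape)         # (100, 100)
print(matrix_prod.shape)         # (1, 1)
```

Sticking to plain arrays and writing the matrix product explicitly avoids having `*` change meaning depending on the type of the input.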

Consider this MATLAB construct.

a = exp(randn(10, 400))
suma = sum(a, 1)
b = a./repmat(suma, 10, 1)
size(b)
Note the repmat in there. In Python this can instead be done with:

a = np.exp(np.random.randn(10, 400))
b = a/a.sum(0)
b.shape

Here, sum is summing over the first dimension (indexes start from 0 in Python) and automatically doing the repmat (tiling) for us! Neat, eh? This also works with matrices:

a = np.asmatrix(np.exp(np.random.randn(10, 400)))
b = a/a.sum(0)
b.shape

Of course we should use things in design matrix format, so we have

a = np.asmatrix(np.exp(np.random.randn(400, 10)))
b = a/a.sum(1)
b.shape

And finally, let's just check that it works with arrays ...

a = np.asarray(np.exp(np.random.randn(400, 10)))
b = a/a.sum(1)
b.shape

It doesn't work ... the problem is that the sum over an array returns a one-dimensional array of shape (400,), which cannot be tiled automatically against the (400, 10) array!

These behaviours are nasty because your code will work or fail depending simply on whether someone has fed you an array or a matrix.

The automatic repmat tiling can also be a pain, as it happens automatically and can hide dimension errors.
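
One way to make the row normalisation work for plain arrays is to ask sum to keep the reduced axis. This is a sketch using the `keepdims` argument of `ndarray.sum`, which the original file does not mention.

```python
import numpy as np

a = np.exp(np.random.randn(400, 10))

row_sums = a.sum(axis=1, keepdims=True)   # shape (400, 1) instead of (400,)
b = a / row_sums                          # broadcasts over the 10 columns

print(row_sums.shape)                     # (400, 1)
print(b.shape)                            # (400, 10)
```

Because the sum stays two-dimensional, the division behaves the same way here as it does for a matrix.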




Other Gotchas
=============

 a = [1 2 3 4; 5 6 7 8];
 reshape(a, 4, 2)


 a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
 print(a.reshape(4, 2))   # reshape returns a new array; a itself is unchanged

The two give different results because Python follows C: row-major order is used in C and Python; column-major order is used in Fortran and MATLAB.

The fix is to use

a.reshape(4, 2, order='F')
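
A small sketch comparing the two orders on the example array. Note that `order` is passed by keyword here.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

c_order = a.reshape(4, 2)             # default: fill row by row (C order)
f_order = a.reshape(4, 2, order='F')  # fill column by column, as MATLAB does

print(c_order.tolist())  # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(f_order.tolist())  # [[1, 3], [5, 7], [2, 4], [6, 8]]
```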

Same issues apply to "flatten".

a = [1 2; 3 4]
a(:)'


np.array([[1, 2], [3, 4]]).flatten()

Instead you have to use

np.array([[1, 2], [3, 4]]).flatten('F')   # the result is 1-D, so a trailing .T would have no effect

Again the issue of arrays changing dimension rears its head here.

a = ones(1, 10)
b = ones(10, 10)
c = [b(:)' a]

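a = np.ones((1, 10))
b = np.ones((10, 10))
# b.flatten('F') is 1-D with shape (100,), while a is 2-D with shape (1, 10),
# so np.r_[b.flatten('F'), a] raises an error; flatten a as well.
c = np.r_[b.flatten('F'), a.flatten()]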


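Zeros and randn different behavior
----------------------------------

# Python
np.zeros(1, 10)     # TypeError: the second argument is the dtype
np.zeros((1, 10))   # 1 x 10 array
np.zeros(10)        # one-dimensional array of length 10

np.random.randn((1, 10))   # TypeError: dimensions must be separate integers
np.random.randn(1, 10)     # 1 x 10 array
np.random.randn(10)        # one-dimensional array of length 10

% MATLAB
randn(10)           % 10 x 10 matrix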


Indexing
--------

Similar to MATLAB, but indexes start at 0 and ranges in Python stop before the upper limit:

a = [1 2 3 4];
a(1:3)

a = np.array([1, 2, 3, 4])
print(a[0:3])

Also beware that the step parameter comes at the end in numpy.

a = 1:10:200

a = np.r_[1:200:10]

The end value in MATLAB is replaced with -1. Any negative number is taken as indexing from the end, i.e. -2 is end-1, -3 is end-2, etc. Note that a slice still stops before its stop value, so a[0:-1] leaves out the last element; use a[0:] to go all the way to the end.
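
A short sketch of negative indexes and open-ended slices on a small array:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

print(a[-1])      # 4, like a(end) in MATLAB
print(a[-2])      # 3, like a(end-1)
print(a[0:-1])    # [1 2 3], the slice stops before the last element
print(a[0:])      # [1 2 3 4], omit the stop value to reach the end
```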

To reverse the indexing of an array 

a(end:-1:1)

becomes

a[::-1]

Beware the difference between


a = [1 2; 3 4]
a(1) = 0.0
a

and

a = np.array([[1, 2], [3, 4]])
a[0] = 0.0   # sets the whole first row, not a single element
print(a)
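
In MATLAB, a(1) is a linear index that picks out a single element, whereas a[0] on a two-dimensional NumPy array selects the whole first row. A minimal sketch of the difference, with a full index used to touch just one element:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
a[0] = 0            # overwrites the whole first row
print(a.tolist())   # [[0, 0], [3, 4]]

b = np.array([[1, 2], [3, 4]])
b[0, 0] = 0         # overwrites a single element, like a(1) = 0 in MATLAB
print(b.tolist())   # [[0, 2], [3, 4]]
```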

This can catch you out if the array was created with

a = np.random.randn(1, 40)
print(a[0])   # the whole row of 40 values, not the first element

It is particularly confusing because for one-dimensional arrays it works fine; the problems start when you mix a = np.zeros(18) with a = np.zeros((1, 18)).

np.asarray(np.random.randn(100, 1)).sum(0)   # array of shape (1,)

np.asarray(np.random.randn(100, 1)).sum()    # scalar


Editing
=======

After editing a module you need to reload it before the changes take effect in a running session.
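
In Python 3 the reload function lives in importlib (in Python 2 it was the built-in reload). A minimal sketch, with the standard-library json module standing in for a module of your own that you have just edited:

```python
import importlib
import json  # stands in for a module you have just edited

# Re-executes the module's source and returns the refreshed module object.
json = importlib.reload(json)
print(json.__name__)
```

Names pulled in with `from module import name` are not updated by a reload, so they need to be imported again afterwards.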

Plotting
========


plot(plotvals, y, 'k-', 'linewidth', 2) becomes
pp.plot(plotvals.T, y.T, 'k-', linewidth=2)

Matplotlib seems to accept only arrays not matrices!

cov
===

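The cov command assumes things are the wrong way around: np.cov treats each row as a variable by default, whereas MATLAB treats each column as a variable.

cov(randn(100, 2))

np.cov(np.random.randn(100, 2))

use np.cov(np.random.randn(100, 2), rowvar=False)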
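
The effect of rowvar is easiest to see from the shape of the result. A small sketch with 100 observations of 2 variables:

```python
import numpy as np

x = np.random.randn(100, 2)            # 100 observations of 2 variables

by_rows = np.cov(x)                    # default: each row is a variable
by_cols = np.cov(x, rowvar=False)      # each column is a variable, as in MATLAB

print(by_rows.shape)                   # (100, 100)
print(by_cols.shape)                   # (2, 2)
```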


Rank in MATLAB and Python
=========================
In MATLAB rank estimates the rank of a matrix through svd

rank([0, 1, 2, 3; 0, 2, 4, 6; 3, 8, 2, 3; 4, 2, 1, 5])

equivalent to 

A = [0, 1, 2, 3; 0, 2, 4, 6; 3, 8, 2, 3; 4, 2, 1, 5]
s = svd(A)
tol = max(size(A))*eps(max(s))
r = sum(s > tol)


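in Python, np.rank gave the number of array dimensions instead. It has been removed from recent NumPy releases; np.ndim returns the same thing:

np.ndim([[0, 1, 2, 3],[0, 2, 4, 6],[3, 8, 2, 3],[ 4, 2, 1, 5]])   # 2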
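
For the MATLAB meaning of rank, NumPy provides np.linalg.matrix_rank, which also works from the singular values. A short sketch on the same matrix (np.ndim is used for the dimension count, as np.rank is no longer available in recent NumPy releases):

```python
import numpy as np

A = np.array([[0, 1, 2, 3], [0, 2, 4, 6], [3, 8, 2, 3], [4, 2, 1, 5]])

n_dims = np.ndim(A)                 # number of array dimensions
rank = np.linalg.matrix_rank(A)     # linear-algebra rank, computed via SVD

print(n_dims)   # 2
print(rank)     # 3, because the second row is twice the first
```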

LAMBDA
======

lambda is a keyword in Python, so it cannot be used as a variable name.


Tile and Repmat
===============
a = [1; 2; 3; 4]
size(a)
repmat(a, [2, 3])
size(a)

a = np.array([[1], [2], [3], [4]])
a.shape
np.tile(a, (2, 3))

Mgrid
=====

mgrid returns its outputs in a different order from meshgrid, and its ranges exclude the stop value.
[X, Y] = meshgrid(0:3, 0:4)
Y, X = np.mgrid[0:5, 0:4]
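
The ordering can be checked against np.meshgrid, which follows the MATLAB convention by default. This is a sketch; np.meshgrid is not mentioned in the original file.

```python
import numpy as np

# MATLAB-style call: x values first, then y values.
X, Y = np.meshgrid(np.arange(0, 4), np.arange(0, 5))

# mgrid returns the row grid first, and each range excludes its stop value.
Y2, X2 = np.mgrid[0:5, 0:4]

print(X.shape)                  # (5, 4)
print(np.array_equal(X, X2))    # True
print(np.array_equal(Y, Y2))    # True
```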

Bizarre behaviour (bug?)
========================

a = [1; 2; 3; 4]
size(a)
b = repmat(a, [1, 1, 2])
size(b)

a = np.array([[1], [2], [3], [4]])
a.shape
b = np.tile(a, (1, 1, 2))
b.shape

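This is not a bug: when the repetition tuple is longer than a.ndim, np.tile prepends new axes, so a is treated as having shape (1, 4, 1) and the result has shape (1, 4, 2). To match MATLAB's 4 x 1 x 2 result, add the trailing axis explicitly before tiling:

b = np.tile(a[:, :, np.newaxis], (1, 1, 2))

for this case.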


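Sum
===

sum(np.random.randn(100, 3), 1).shape      # built-in sum: adds the rows to a start value of 1, giving (3,)
np.sum(np.random.randn(100, 3), 1).shape   # NumPy sum along axis 1, giving (100,)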
