A Short Primer on “extras_require” in setup.py

To include optional installation capabilities in your Python module’s setup.py file, you can use the extras_require parameter. The extras_require parameter allows you to define groups of optional dependencies that users can install by specifying an extra name when running pip install.

Here’s an example setup.py file that includes an optional dependency group for running tests:

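from setuptools import setup, find_packages

setup(
    name='mymodule',
    description='My awesome module',
    packages=find_packages(),
    install_requires=[],  # Required dependencies go here
    extras_require={
        'test': ['pytest', 'coverage'],  # Optional dependencies for testing
    },
)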

In this example, the install_requires parameter lists the required dependencies for your module, which are required for installation regardless of which optional dependency groups are installed.

The extras_require parameter defines an optional dependency group called test, which includes the pytest and coverage packages. Users can install these packages by running pip install mymodule[test].

One can define multiple optional dependency groups by adding additional keys to the extras_require dictionary.
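As a minimal sketch, several groups might look like the dictionary below. Only the test group comes from the example above; the docs group, the sphinx package and the combined all group are assumptions added for illustration.

```python
# Hypothetical extras for illustration; only the "test" group comes from the post.
extras_require = {
    "test": ["pytest", "coverage"],
    "docs": ["sphinx"],  # assumed documentation tooling
}

# A common convention (not something setuptools requires) is an "all" group
# that combines every other group into one installable extra.
extras_require["all"] = sorted(
    {pkg for group in extras_require.values() for pkg in group}
)

print(extras_require["all"])
```

The dictionary would be passed as setup(extras_require=extras_require), and users could then combine groups, for example pip install mymodule[test,docs].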

Using optional dependencies with the extras_require parameter in your Python module’s setup.py file has several advantages:

  • It allows users to install only the dependencies they need: By defining optional dependency groups, users can choose which additional dependencies to install based on their needs. This can help to reduce the amount of disk space used and minimize potential conflicts between packages.
  • It makes your module more flexible: By offering optional dependency groups, your module becomes more flexible and can be used in a wider range of contexts. Users can customize their installation to fit their needs, which can improve the overall user experience.
  • It simplifies dependency management: By clearly defining which dependencies are required and which are optional, you can simplify dependency management for your module. This can make it easier for users to understand what they need to install and help to prevent dependency-related issues.
  • It can keep your module lean: By making heavy dependencies optional, you avoid forcing them on every user. For example, packages for visualization or data processing that are only needed in certain scenarios can live in their own group. Users who don’t need these features skip installing and importing them, which can reduce install size and, in some cases, import time and memory usage.



They Risked What They Did Have and Did Need for What They Didn’t Have and Didn’t Need

Warren Buffett, a renowned investor and business magnate, highlighted the fascinating story of the collapse of Long Term Capital Management (LTCM), a firm managed by a group of highly intelligent individuals with vast experience in the field.

Despite their exceptional intelligence and expertise, the firm’s excessive leverage backfired, leading to its downfall. Buffett raises the question of why smart people sometimes make foolish decisions when it comes to financial risk-taking.

He argues that risking something that is important to you for something unimportant is just plain foolish, no matter how favorable the odds may seem.

You should read about the story of Long Term Capital Management. The firm was run by a bunch of ethical, super-smart guys with very high IQs. The company went belly up because their excess leverage misfired in the opposite direction. If you take John Meriwether, Eric Rosenfeld, Larry Hilibrand, Greg Hawkins, Victor Haghani and the Nobel prize winner Myron Scholes.

If you take the 16 of them, they probably have the highest average IQ of any 16 people working together in one business in the country, including Microsoft or whoever you want to name, so incredible is the amount of intellect in that room. Now if you combine that with the fact that those 16 have had extensive experience in the field in which they operate.

I mean, this is not a bunch of guys who made their money selling men’s clothing and all of a sudden went to the security business or anything. They had, in aggregate, probably 350 or 400 years of experience doing exactly what they were doing. And then you throw in the third factor: that most of them had virtually all of their very substantial net worth in the business. They have their own money tied up, hundreds and hundreds of millions of dollars of their own money tied up, a super high intellect, they were working in a field they knew, and they went broke. And that to me is absolutely fascinating. If I write a book, it’s going to be called “Why do smart people do dumb things?”

To make the money they didn’t have and they didn’t need, they risked what they did have and did need. That’s foolish, that’s just plain foolish. If you risk something that is important to you for something that is unimportant to you, it just does not make any sense. I don’t care whether the odds are 100 to 1 that you succeed, or 1,000 to 1 that you succeed.

If you hand me a gun with a thousand chambers or a million chambers, and there is a bullet in one chamber and you said ‘put it to your temple and pull it’, I’m not going to pull it. You can name any sum you want. It doesn’t do anything for me on the upside, and I think the downside is fairly clear.

I’m not interested in that kind of a game, and yet people do it financially without thinking about it very much. It’s like Henry Kaufman said the other day: the people going broke in these situations are just two types, the ones who know nothing, and the ones who know everything.

via https://www.youtube.com/watch?v=Oc4WMUB8ljQ


Reboot Raspberry Pi Every Time on Schedule

Are you tired of encountering memory corruption errors on your Raspberry Pi due to the system running continuously for extended periods? Don’t worry; the solution is here! By using the Linux cron system, you can schedule your Raspberry Pi to reboot at set intervals, such as once a week or every day.

What is the cron system, you may ask? Cron is a time-based job scheduler in Unix-like operating systems. It enables users to schedule jobs, commands, or scripts to run automatically at specified times or intervals. Cron uses a crontab file to schedule tasks. A crontab file is a simple text file that contains a list of commands meant to be run at specified times.

So how do you use cron to schedule a reboot of your Raspberry Pi? It’s pretty straightforward. First, open a terminal on your Raspberry Pi and type the following command:

crontab -e

This command will open the crontab file in the default editor. If you’ve never edited your crontab file before, you’ll be prompted to choose your preferred editor.

Once the crontab file is open, you can add a new line that specifies when you want your Raspberry Pi to reboot. For example, to reboot your Raspberry Pi every day at 2:03 pm, you can add the following line:

03 14 * * * /sbin/shutdown -r now

The line above starts with five time fields separated by spaces, followed by the command. The first two fields are the minute and hour at which the command should run. The third field is the day of the month, the fourth the month, and the fifth the day of the week. In the example above, the “*” symbol in the last three fields means the command runs on every day of the month, in every month, and on every day of the week.
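To make the field layout concrete, here is a small sketch that splits a crontab entry into its parts. The helper name parse_cron_line is made up for this illustration, and it only handles simple entries like the one above, not special strings such as @reboot or environment lines.

```python
# Split a crontab entry into its five time fields and the command.
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")

def parse_cron_line(line):
    """Return a dict of the five time fields plus the command to run."""
    parts = line.split(None, 5)  # split on whitespace, at most 5 times
    if len(parts) < 6:
        raise ValueError("expected five time fields followed by a command")
    entry = dict(zip(FIELD_NAMES, parts[:5]))
    entry["command"] = parts[5]
    return entry

entry = parse_cron_line("03 14 * * * /sbin/shutdown -r now")
print(entry["hour"], entry["minute"], entry["command"])
```

Limiting the split to five keeps the command, spaces included, in one piece.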

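The final part of the line specifies the command to run, which is “/sbin/shutdown -r now” in this case. This command reboots your Raspberry Pi immediately. Because shutdown needs root privileges, the entry belongs in root’s crontab, which you can edit with sudo crontab -e.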

Once you’ve added the line to your crontab file, save and exit the file. Cron will automatically run the command at the specified intervals, and your Raspberry Pi will reboot accordingly.

If you’re experiencing memory corruption or other errors on your Raspberry Pi due to running continuously for extended periods, you can use the Linux cron system to schedule regular reboots.

By adding this simple line to my crontab file, I have scheduled my Raspberry Pi to reboot automatically, ensuring it runs smoothly and efficiently.

Monitor Your Raspberry Pi with Flask: Free Disk Space and Latest File

Are you tired of manually checking your Raspberry Pi’s disk space and latest files? With a few lines of Python code and the Flask web framework, you can create a simple application that monitors your Raspberry Pi for you.

In this post, we will walk through the code that monitors the free disk space on your Raspberry Pi and returns the latest modified file in a specified folder.

To get started, we need to install the Flask and psutil libraries, for example with pip install flask psutil.

Once you have the dependencies installed, create a new Python file and copy the following code:

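from flask import Flask
import os
import psutil

app = Flask(__name__)

@app.route("/disk-space")
def disk_space():
    # Free space on the root file system, converted from bytes to megabytes
    disk = psutil.disk_usage("/")
    free = disk.free // (1024 * 1024)
    return str(free) + " MB"

@app.route("/file")
def get_recent_file():
    # Name of the most recently modified entry in the folder
    folder = "/home/pi/Documents"
    files = os.listdir(folder)
    files.sort(key=lambda x: os.path.getmtime(os.path.join(folder, x)))
    recent_file = files[-1]
    return recent_file

if __name__ == '__main__':
    # Port 8989 as described below; host="0.0.0.0" is an assumption that
    # makes the app reachable from other machines on the network
    app.run(host="0.0.0.0", port=8989)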

Let’s break down this code.

First, we import the Flask, os, and psutil libraries. Flask is the web framework that we will use to create the application. The os library provides a way to interact with the Raspberry Pi’s file system. Psutil is a cross-platform library for retrieving system information.

Next, we create a new Flask application instance and define two routes: /disk-space and /file.

The /disk-space route uses the psutil library to obtain the amount of free disk space on the Raspberry Pi’s root file system (“/”). The value is converted to megabytes and returned as a string.

The /file route lists all files in the specified folder (in this case, the “Documents” folder in the Raspberry Pi user’s home directory) and returns the name of the most recently modified file. The files are sorted based on their modification time using os.path.getmtime.
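The lookup described above fails on an empty folder and can return a sub-folder instead of a file. A slightly more defensive variant, offered as a sketch rather than the post's original code, might look like this (the folder is passed in as a parameter here, which is a change from the hard-coded path):

```python
import os

def get_recent_file(folder):
    """Return the name of the most recently modified regular file, or None."""
    # Keep only regular files so sub-folders are never reported
    candidates = [entry for entry in os.scandir(folder) if entry.is_file()]
    if not candidates:
        return None  # empty folder: nothing to report
    newest = max(candidates, key=lambda entry: entry.stat().st_mtime)
    return newest.name

print(get_recent_file("."))
```

Using max() avoids sorting the whole listing when only the newest entry is needed.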

Finally, we start the Flask application on port 8989.

To run this application on your Raspberry Pi, save the code to a file (e.g. app.py) and run the following command:

python app.py

This will start the Flask application, and you can access the routes by visiting http://<raspberry-pi-ip>:8989/disk-space and http://<raspberry-pi-ip>:8989/file in your web browser, where <raspberry-pi-ip> is the address of your Raspberry Pi.

That’s it! With just a few lines of code, you can now monitor your Raspberry Pi’s free disk space and latest files.

You can easily modify this code to add more routes and functionality to suit your needs. Happy coding!


Don’t Let Missing Packages Derail Your Python Testing: How to Keep Moving Forward

Have you ever encountered a scenario where you were testing a Python app in a new environment, only to find that there was a missing package that you couldn’t install without help from IT? Recently, I ran into this issue myself and wanted to share my experience.

When faced with a missing package that can only be installed by IT, you may still want to migrate and run pytest on ported code. However, you may find that pytest fails during the collection stage, preventing you from proceeding with your testing. This is certainly not ideal, but fortunately, there’s an easy solution to this problem.

To avoid aborting test collection on import errors (or any other errors), simply use the “--continue-on-collection-errors” flag. This flag tells pytest to continue running even if it encounters errors during the collection phase.

Additionally, you can skip tests on a missing import by using “pytest.importorskip” at the module level, within a test, or test setup function. For example, you can skip a test that relies on a missing “docutils” package with the following line of code:

docutils = pytest.importorskip("docutils")

This allows you to skip tests that depend on a package that cannot be installed or imported, without affecting the rest of your test suite.

In conclusion, encountering a missing package during testing can be frustrating, but it doesn’t have to derail your progress. By using the right flags and skipping tests as necessary, you can continue testing your Python app in a new environment even if there are missing dependencies.


Travel the World from Your Desktop: How to Use Python to Switch Up Your Wallpaper

Are you tired of staring at the same old desktop background on your Windows laptop every day? Do you have a collection of beautiful travel pictures that you’d love to see on your screen instead? If you answered yes to both of these questions, then you’re in luck! In this post, I’ll show you how to create a Python script that changes your desktop background every few minutes using your favorite travel photos. I have done the same for my office laptop.

First, create a new folder on your laptop called “pics” and add your favorite travel pictures to it. You can use images from your own travels or download high-quality images from a website of your choice.

Next, let’s create the Python script that will change your desktop background. Open up your favorite text editor and create a new file called “change_background.py”. Then, copy and paste the following code:

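import ctypes
import os
from random import choice
import sched
import time

event_schedule = sched.scheduler(time.time, time.sleep)

# Placeholder: replace with the full path to your own "pics" folder
FOLDER = r"C:\path\to\pics"
FILES = [os.path.join(FOLDER, f) for f in os.listdir(FOLDER)]

SPI_SETDESKWALLPAPER = 20  # Windows API action code for setting the wallpaper

def change_wallpaper():
    # Set a random image from FILES as the desktop wallpaper (Windows only)
    ctypes.windll.user32.SystemParametersInfoW(SPI_SETDESKWALLPAPER, 0, choice(FILES), 0)
    # Schedule the next change after a random number of minutes
    event_schedule.enter(choice([13,23,7,11,5])*60, 1, change_wallpaper, ())

if __name__ == "__main__":
    event_schedule.enter(10, 1, change_wallpaper, ())
    event_schedule.run()  # start the scheduler loop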

Make sure to set “FOLDER” to the actual path of your images.

This code uses the “ctypes” module to call the “SystemParametersInfoW” function from the Windows API, which changes the desktop background to a random image from the “pics” folder. The script then waits a random number of minutes (between 5 and 23) before changing the background again.

Save the file and make sure it is in a directory that you can easily access. Now, let’s schedule the script to run automatically every time you start your laptop.

Open up the Windows Task Scheduler by searching for it in the Start menu. Click on “Create Basic Task” and follow the prompts to set up a new task.

When prompted to choose a program/script, browse to the location of your “change_background.py” file and select it. Set the trigger to “At Startup” and click “Finish” to complete the setup.

Now, every time you start your laptop, your Python script will automatically run in the background and change your desktop background every few minutes.

In conclusion, with just a few lines of Python code and the Windows Task Scheduler, you can turn your boring desktop background into a slideshow of your favorite travel photos. Give it a try and let me know how it goes!


Getting Battery Percentage in Windows with Python

Battery percentage is an important aspect of mobile devices, laptops, and other battery-powered electronic devices. It tells us how much energy the battery has left, which is crucial in determining how long the device will last before needing to be recharged.

In this blog post, we will see how to get battery percent information in Windows using Python.

Using the psutil Library

The psutil library is a comprehensive library for retrieving information about the system and processes running on it. It provides a simple and straightforward way to access the battery percent information in Windows.

Here is an example that demonstrates how to use psutil to get the battery percent information in Windows:

import psutil

battery = psutil.sensors_battery()
print("Battery Capacity:", battery.percent)

In the code above, we first import the psutil library. Then, we use the sensors_battery() function to get the battery information.

This function returns a named tuple with the fields percent, secsleft and power_plugged, or None if no battery is installed. In the above example, we print the percent of the battery.
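Because sensors_battery() can return None on machines without a battery, a slightly more defensive version may be useful. The sketch below uses two helper names, describe_battery and read_battery, that are invented for this example. It also falls back gracefully when psutil is not installed.

```python
def describe_battery(battery):
    """Format the value returned by psutil.sensors_battery() as text."""
    if battery is None:  # no battery installed, or no reading available
        return "No battery detected"
    if battery.power_plugged is None:
        state = "power state unknown"
    elif battery.power_plugged:
        state = "plugged in"
    else:
        state = "on battery"
    return f"Battery: {battery.percent:.0f}% ({state})"

def read_battery():
    """Return psutil.sensors_battery(), or None if it is unavailable."""
    try:
        import psutil
    except ImportError:
        return None
    reader = getattr(psutil, "sensors_battery", None)
    return reader() if reader else None

print(describe_battery(read_battery()))
```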


How to Enable CORS in Django

My Django learning app for the kids, deployed on a Raspberry Pi, worked smoothly when accessed from our home network. However, when we had to travel to Kolkata, the app was moved to a PythonAnywhere server. This move brought a new challenge: the app started to face issues with Cross-Origin Resource Sharing (CORS).

I soon realized that this was a common issue and could be easily resolved by enabling CORS in the Django app.

I followed a simple process and soon had the app up and running smoothly again, with CORS enabled.

Here’s how to enable CORS in Django:

  • Install the django-cors-headers package by running python -m pip install django-cors-headers
  • Add it to the INSTALLED_APPS list in your Django settings:
INSTALLED_APPS = [
    ...,
    'corsheaders',
    ...,
]
  • Add the CorsMiddleware class to the MIDDLEWARE list, placing it above CommonMiddleware:
MIDDLEWARE = [
    ...,
    'corsheaders.middleware.CorsMiddleware',
    'django.middleware.common.CommonMiddleware',
    ...,
]
  • Configure the CORS headers by setting one of the following variables in your Django settings: CORS_ALLOW_ALL_ORIGINS or CORS_ALLOWED_ORIGINS.

Note: CORS_ALLOW_ALL_ORIGINS set to True allows all origins, while CORS_ALLOWED_ORIGINS only allows specific origins listed in the list.
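A minimal sketch of what those settings might look like in settings.py is shown below. The origin URL is a placeholder, not the address of the real app. Older releases of django-cors-headers used different names for these settings, so check the version you have installed.

```python
# settings.py (fragment): choose ONE of the two approaches below.

# Option 1: allow only the origins you trust (the URL is a placeholder).
CORS_ALLOWED_ORIGINS = [
    "https://example.com",
]

# Option 2: allow every origin; convenient for testing, risky in production.
# CORS_ALLOW_ALL_ORIGINS = True
```

Listing specific origins is usually the safer default, since it keeps unknown sites from calling the app from a browser.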

And so, the journey with the Django app continued without any further hiccups, and the kids are still using it for their spaced revision and review.


As an HR initiative, every Monday we receive an email about an inspiring story. While some of the stories are great, others are just okay, and some have become so repetitive that they have lost their inspirational luster.

Today, I decided to try an experiment, and this is the result of that experiment.

In ancient India, there lived a young couple named Rohit and Radhika. They were deeply in love and were considered a perfect match by all who knew them. However, one day, Radhika began to doubt Rohit’s love and accused him of being unfaithful. Rohit was heartbroken and tried his best to prove his love, but Radhika remained suspicious.

One day, Lord Vishnu appeared before the couple in the guise of an old sage. He saw their distress and offered to help. Rohit and Radhika agreed, and the sage shared with them a story from Hindu mythology about the importance of trust in a relationship.

The sage told the story of a mighty king named Raja Janaka, who had everything a man could ever want – wealth, power, and a loving wife. Despite all of this, the king was not happy. One day, Lord Vishnu appeared before him and asked why he was so unhappy. The king confessed that he had lost the trust of his wife, and that his marriage was on the verge of collapse.

Lord Vishnu then told the king a parable about two pots of gold. One pot was placed in a safe, while the other was kept in an open field. The king was asked to guess which pot of gold would be safe from theft. The king replied that the pot in the safe would be safe, as it was protected by strong walls and a lock.

Lord Vishnu then revealed that the pot in the open field was actually safe. The king was confused and asked how that was possible. Lord Vishnu explained that the pot in the open field was guarded by the trust of the villagers, and that no one would dare to steal from it.

The sage then explained to Rohit and Radhika that just like the pot of gold, a relationship needs trust to be safe and secure. Without trust, a relationship will be like the pot in the safe – vulnerable to theft and destruction. But with trust, a relationship will be like the pot in the open field – guarded and protected.

Rohit and Radhika listened to the sage’s words and realized that their relationship was like the pot in the safe. They made a vow to each other to nurture their relationship by planting the seeds of trust and watering it with love and understanding.

– an experiment

I won’t disclose the details of the experiment in this post, but I promise to reveal it in a future post. If you have already figured out what the experiment was, please share your thoughts in the comments section below.

Handling Multiple Inputs with argparse in Python Scripts


The problem.

ffmpeg allows multiple inputs to be specified using the same keyword, like this:

ffmpeg -i input1.mp4 -i input2.webm -i input3.mp4

Let’s say you are trying to write a script in python that accepts multiple input sources and does something with each one, as follows:

python_script -i input1.mp4 -i input2.webm -i input3.mp4

How do we do this in argparse?

With argparse, the catch is that by default a repeated option flag keeps only its last value. You know how to associate multiple arguments with a single option (using nargs='*' or nargs='+'), but that still won’t collect the values when the -i flag is used multiple times.

How can this be accomplished?

Here’s some sample code that accomplishes this using the argparse library:

import argparse

parser = argparse.ArgumentParser()
# action='append' collects every -i value into a list; default=[] avoids None
parser.add_argument('-i', '--input', action='append', default=[], type=str, help='input file name')

args = parser.parse_args()
inputs = args.input

# Process each input
for input_file in inputs:
    # Do something with the input
    print(f'Processing input: {input_file}')

With this code, the input can be passed as:

python_script.py -i input1.mp4 -i input2.webm -i input3.mp4

The key to the whole program is the value “append” passed to the action keyword.
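To try the behaviour without a shell, parse_args() also accepts a list of arguments. The sketch below wraps the parser in a hypothetical build_parser helper and adds default=[], an assumption that keeps the result iterable when -i is never given.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser()
    # action="append" collects every -i value into one list;
    # default=[] keeps the result iterable when -i is never given.
    parser.add_argument("-i", "--input", action="append", default=[],
                        help="input file name (may be repeated)")
    return parser

# Passing a list to parse_args() lets us try the parser without a shell.
args = build_parser().parse_args(["-i", "input1.mp4", "-i", "input2.webm"])
print(args.input)
```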

Hope this helps.


If You’re Not in the Game, You Can’t Hit a Home Run

New Year resolutions often face skeptical responses from friends, with the idea dismissed as ineffective. However, I still believe in their value.

Although many of my resolutions fail, some stick and bring positive changes to my life, such as a daily routine of waking up early, reading a chapter of a book, and doing 30 pushups.

This echoes the sentiment expressed in the book “How to Change” by Katy Milkman (with a foreword by Angela Duckworth), which explores the science of achieving desired outcomes.

“Of course, I understand where they’re coming from. I’ve been frustrated with failed resolutions in the past, too, and I’m committed to teaching more people about the science that can help them succeed.

But this question still drives me a little crazy. As actor David Hasselhoff has said, “If you’re not in the game, you can’t hit a home run.”

In my opinion, New Year’s resolutions are great! So are spring resolutions, birthday resolutions, and Monday resolutions. Any time you make a resolution, you’re putting yourself in the game.

Too often, a sense that change is difficult and daunting prevents us from taking the leap to try. Maybe you like the idea of making a change, but actually doing it seems hard, and so you feel unmotivated to start. Maybe you’ve failed when you attempted to change before and expect to fail again. Often, change takes multiple attempts to stick.

I like to remind cynics that if you flip the discouraging statistics about New Year’s resolutions on their head, you’ll see that 20 percent of the goals set each January succeed. That’s a lot of people who’ve changed their lives for the better simply because they resolved to try in the first place.

Just think of Ray Zahab, transforming himself from an unhappy, out-of-shape smoker to a world-class athlete.

For some people, fresh starts can help prompt small changes. But they can also inspire transformative change by giving you the will to try pursuing a daunting goal.”

“How to Change” by Katy Milkman (foreword by Angela Duckworth)


Json tool hidden in plain sight

I have been using the json module for as long as I can remember. It’s part of the standard library and is part of my daily work, so it was a pleasant surprise when I learnt about the hidden json.tool.

import json
python -m json.tool 
C:\Users\sukhbinder.singh>python -m json.tool -h
usage: python -m json.tool [-h] [--sort-keys] [--json-lines] [infile] [outfile]

A simple command line interface for json module to validate and pretty-print JSON objects.

positional arguments:
  infile        a JSON file to be validated or pretty-printed
  outfile       write the output of infile to outfile

optional arguments:
  -h, --help    show this help message and exit
  --sort-keys   sort the output of dictionaries alphabetically by key
  --json-lines  parse input using the jsonlines format

Example usage

curl -s https://arshstudy.pythonanywhere.com/spelling/subject/4 | python -m json.tool



[
    {
        "id": 301,
        "question": "India is an ___ land",
        "answer": "ancient",
        "inum": 2,
        "due_date": "2021-04-05T06:23:12+05:30",
        "active": false,
        "chapter": 0,
        "isvoiceonly": false,
        "subject": 4
    },
    {
        "id": 302,
        "question": "As of May 2008 there are ___ officially recognised or scheduled languages in India",
        "answer": "22",
        "inum": 2,
        "due_date": "2021-04-05T06:23:17+05:30",
        "active": false,
        "chapter": 0,
        "isvoiceonly": false,
        "subject": 4
    },
    {
        "id": 303,
        "question": "___ is chosen as the official language of the government of India",
        "answer": "hindi",
        "inum": 3,
        "due_date": "2021-04-24T12:39:27+05:30",
        "active": false,
        "chapter": 0,
        "isvoiceonly": false,
        "subject": 4
    }
]



I love this. I was always using VS Code to pretty-print my JSON files, but with this find, all this and more can be done using standard Python.
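The same validate-and-pretty-print step can be done from inside a script with the json module itself. This is a small sketch, and the helper name pretty is made up for the example.

```python
import json

def pretty(text, sort_keys=False):
    """Validate a JSON string and return it pretty-printed."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid input
    return json.dumps(data, indent=4, sort_keys=sort_keys)

print(pretty('{"id": 301, "answer": "ancient", "active": false}'))
```

Passing sort_keys=True mirrors the --sort-keys flag of the command-line tool.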

Git: SSL Certificate problem: unable to get local issuer certificate.

Here’s a problem that I ran into a few months ago on a system while cloning a remote git repo.

 $ git clone https://github.com/sukhbinder/winsay.git
Cloning into 'winsay'...
fatal: unable to access 'https://github.com/sukhbinder/winsay.git/': SSL certificate problem: unable to get local issuer certificate

I am using Git on Windows. Installed the msysGit package. Test repository has a self-signed certificate at the server. Can access and use the repository using HTTP without problems. Moving to HTTPS gives the error:

SSL Certificate problem: unable to get local issuer certificate.

I have the self signed certificate installed in the Trusted Root Certification Authorities of my Windows 7 – client machine. I can browse to the HTTPS repository URL in Internet Explorer with no error messages.


If you want to disable SSL verification completely, open Git Bash and run this command:

git config --global http.sslVerify false

Note: This solution opens you to attacks like man-in-the-middle attacks. Therefore, turn on verification again as soon as possible:

git config --global http.sslVerify true
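If the global switch feels too broad, git also accepts -c key=value to override a setting for a single invocation. The sketch below builds such a command from Python. The helper name clone_without_ssl_check is hypothetical, and the same man-in-the-middle risk applies for the duration of that one command.

```python
import subprocess

def clone_without_ssl_check(url, run=subprocess.run):
    """Clone `url` with SSL verification disabled for this one command only."""
    # `git -c key=value` overrides a setting for a single invocation,
    # so the global configuration keeps verification switched on.
    cmd = ["git", "-c", "http.sslVerify=false", "clone", url]
    return run(cmd, check=True)
```

A call would look like clone_without_ssl_check("https://github.com/sukhbinder/winsay.git"). The override is not saved anywhere, so later fetches and pulls in the cloned repository verify certificates again.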

Hope this helps others who get this or a similar error. Do let me know.


Something’s Gotta Give

Yes, if you take on a lot of projects and you have the same number of hours, something has got to give. And posting on this blog was that something for me.

From September 2022, the posting receded and then stopped. Well, now things are back under control, and I hope I will be able to give the blog the same consistent attention again.

Let’s start from here. Thank you for reading.

Happy New Year. I wish you a healthy, wealthy 2023!

Export PowerPoint Slides with Python

A couple of years ago, I needed to export the slides of a PowerPoint presentation as PNG images. There were a lot of them, so doing it manually was out of the question. Here’s a quick Python script to export PowerPoint slides to PNG.

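import sys, win32com.client

class ApplicationEvents(object):
    def OnQuit(self):
        # Called when PowerPoint quits; nothing to clean up here
        pass

spath = r"C:\Users\sukhbinder\Desktop\cool_presentation.pptx"

app = win32com.client.DispatchWithEvents("Powerpoint.Application", ApplicationEvents)
# Open the presentation (this step is assumed; it is missing from the listing)
doc = app.Presentations.Open(spath)
# Export every slide as a PNG file into the target folder
doc.Export(r"C:\Users\sukhbinder\Downloads", "PNG")
doc.Close()
app.Quit()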

Hope this helps someone.


Example of Subparser/Sub-Commands with Argparse

I like argparse. Yes, there are many other utilities that make life easy, but I am still a fan of argparse, mostly because it’s part of the standard Python installation. No other installs are needed.

Argparse is powerful too. If you have used git, you will have come across subcommands. Here’s how one can implement the same with argparse.

import argparse

def main():

    parser = argparse.ArgumentParser(description="Jotter")
    subparser = parser.add_subparsers()

    # jotter log <text ...>
    log_p = subparser.add_parser("log")
    log_p.add_argument("text", type=str, nargs="*", default=None)

    # jotter show [--all] [--id N] [-s N] [-l N]
    show_p = subparser.add_parser("show")
    show_p.add_argument("--all", action="store_true")
    show_p.add_argument("--id", type=int, default=0)
    show_p.add_argument("-s", "--skip", type=int, default=0)
    show_p.add_argument("-l","--limit", type=int, default=100)

    # jotter search <term> [-l N]
    search_p = subparser.add_parser("search")
    search_p.add_argument("search", type=str, default=None)
    search_p.add_argument("-l", "--limit", type=int, default=100)

    args = parser.parse_args()
    return args

if __name__ == "__main__":
    main()

In the above code, jotter is our main command; it has subcommands such as jotter log, jotter show and jotter search.
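Once the arguments are parsed, the script still needs to know which subcommand was chosen. One way to do that, sketched here with a trimmed-down parser, is to pass dest="command" to add_subparsers. The required=True argument needs Python 3.7 or later, and the print statements are placeholders for real work.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="jotter", description="Jotter")
    # dest="command" records which subcommand was chosen
    subparsers = parser.add_subparsers(dest="command", required=True)

    log_p = subparsers.add_parser("log")
    log_p.add_argument("text", nargs="*")

    show_p = subparsers.add_parser("show")
    show_p.add_argument("-l", "--limit", type=int, default=100)
    return parser

args = build_parser().parse_args(["log", "buy", "milk"])
if args.command == "log":
    print("logging:", " ".join(args.text))
elif args.command == "show":
    print("showing up to", args.limit, "entries")
```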

Have you used this before?


Automating Copying of Files from Raspberry Pi using Python

My Raspberry Pi has just a 32 GB memory card, so another issue I face with my timelapse automation is regularly copying the files from the Raspberry Pi to my laptop.

I have tried various options such as git, secure copy (SCP), FTP and SSH. All of them work but have their limitations.

But there is one approach that I have finally stuck with, and it works seamlessly. Once again it is implemented in Python, this time using the wget command-line tool.

Here’s the code that lets me transfer the files from the Raspberry Pi to my laptop. I just run this on a schedule on my Mac every week.

from datetime import datetime, timedelta
import os
import subprocess
import argparse

BASE_URL = r"{}"

def get_dir(day=1, outfolder=r"/Users/sukhbindersingh/pyimages"):
    # Accept either a positive or a negative day offset
    if day > 0:
        day = day * -1
    now =datetime.now()
    yesterday = now+timedelta(days=day)
    datestr = yesterday.strftime("%m_%d_%Y_")
    fname = "v_{}_overval.mp4".format(datestr)
    fname_src = BASE_URL.format(fname)
    # -P tells wget to save the file into outfolder
    cmdline = ["wget", "-P", outfolder, fname_src]
    print("downloading {}".format(fname_src))
    iret = subprocess.call(cmdline)
    return iret

parser = argparse.ArgumentParser("download_video", description="Download raspberry pi videos")
parser.add_argument("-d", "--days",type=int,  help="No of backdays to download", default=1)
parser.add_argument("-o", "--outdir", type=str, help="Output dir where downloaded file will be kept", default=None)

args = parser.parse_args()

outfolder = args.outdir
if outfolder is None:
    outfolder = r"/Users/sukhbindersingh/pyimages"

for day in range(args.days):
    iret = get_dir(day+1, outfolder)
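If wget is not available, the standard library can do the same job. The sketch below is one possible variant, not the script I run. The function names video_name and download are made up, and base_url stands in for the same kind of URL template as BASE_URL above.

```python
from datetime import date, timedelta
from urllib.request import urlretrieve
import os

def video_name(days_back, today=None):
    """Build the file name used in the script above for a given day."""
    today = today or date.today()
    day = today - timedelta(days=abs(days_back))
    return "v_{}_overval.mp4".format(day.strftime("%m_%d_%Y_"))

def download(base_url, days_back, outfolder):
    """Fetch one video with the standard library instead of wget."""
    name = video_name(days_back)
    target = os.path.join(outfolder, name)
    urlretrieve(base_url.format(name), target)  # base_url is a placeholder template
    return target

print(video_name(1, today=date(2023, 1, 2)))
```

Keeping the file-name logic in its own function makes it easy to check without touching the network.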

How will you solve this issue? Do you have another way that this can be solved? Do let me know in the comments.


Principal Component Analysis in pure Numpy

In 2009 I was working with principal component analysis PCA in my job. It was my first introduction to this topic, so I played with it in the office and at home in my spare time.

Python was my favourite play tool at that time. I recently stumbled upon this code that I wrote in 2013 as part of a personal project.

In case you are wondering what PCA is:

Principal component analysis (PCA) is a standard tool in modern data analysis and is used in many diverse fields from computer graphics, machine learning to neuroscience, because it is a simple, non-parametric method for extracting relevant information from enormous and confusing data sets.

With minimal effort PCA provides a map for how to reduce a complex data set to a lower dimension to reveal the sometimes hidden, simplified structures that often underlie it.

Shame I did not have GitHub then, or it would have been posted there, so here it goes.

# -*- coding: utf-8 -*-
"""
Created on Sun Jan 31 11:03:57 2013

@author: Sukhbinder
"""

import numpy as np

def pca1(x):
    """Determine the principal components of a set of measurements.

    x should be an M x N numpy array composed of M observations of N variables.

    PCA using the covariance matrix.
    The output is:
    coeff - the N x N matrix whose columns are the principal components,
            used to transform x into its components
    signals - the M x N projected data
    V - the variances (eigenvalues), sorted in descending order

    The code for this function is based on "A Tutorial on Principal Component
    Analysis", Shlens, 2005 http://www.snl.salk.edu/~shlens/pub/notes/pca.pdf
    """
    (M,N)  = x.shape
    Mean   = x.mean(0)
    y      = x - Mean
    cov    = np.dot(y.T,y) / (M-1)
    (V,PC) = np.linalg.eigh(cov)   # eigh, because cov is symmetric
    order  = (-V).argsort()        # largest variance first
    V      = V[order]
    coeff  = PC[:,order]
    signals = np.dot(y,coeff)      # project the centred data: M x N
    return coeff,signals,V

def pca2(x):
    """Determine the principal components of a set of measurements.

    x should be an M x N numpy array composed of M observations of N variables.

    PCA using the singular value decomposition (assumes M >= N).
    The output is:
    pc - the N x N matrix whose columns are the principal components
    signals - the M x N projected data
    v - the variances, sorted in descending order

    The code for this function is based on "A Tutorial on Principal Component
    Analysis", Shlens, 2005 http://www.snl.salk.edu/~shlens/pub/notes/pca.pdf
    """
    (M,N)  = x.shape
    Mean   = x.mean(0)
    y      = x - Mean
    yy = y/np.sqrt(M-1)
    u,s,vh = np.linalg.svd(yy, full_matrices=False)
    pc = vh.T                      # numpy returns V transposed
    v = s**2                       # singular values squared are the variances
    signals = np.dot(y,pc)
    return pc,signals,v
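For readers wondering why an eigendecomposition and an SVD can give the same answer, here is a short derivation. It is a sketch in my own notation, not something taken from the code: Y is the mean-centred M x N data matrix, C its covariance matrix, and U S V^T the SVD of the scaled data.

```latex
\begin{aligned}
C &= \frac{1}{M-1}\, Y^{\mathsf T} Y,
\qquad
\frac{Y}{\sqrt{M-1}} = U S V^{\mathsf T} \\
C &= V S^{\mathsf T} U^{\mathsf T} U S V^{\mathsf T}
   = V \left(S^{\mathsf T} S\right) V^{\mathsf T} \\
\lambda_i &= s_i^{2},
\qquad
\text{signals} = Y V
\end{aligned}
```

So the columns of V are the eigenvectors of C, that is, the principal components. The squared singular values are the variances along them, and projecting the centred data onto V gives the signals.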

scikit-learn and other libraries already offer PCA, so why write PCA code at all?

Well, I was trying to understand PCA deeply and I couldn’t use sklearn, so this piece of code was written completely in NumPy. It helped me reduce the resolution of my family pictures back in 2013, before Google Photos made this redundant. 🙂
