Tracking History of Single File in your Git Repo

Today’s tip of the day is on git.

gitk is the batteries-included, simple GUI that can show you the state of your git repo at any point in time. It is very convenient and, to me, much more intuitive than the git log and git reflog CLI commands.

I use it every day. The fact that gitk is shipped with git is a good thing: if you have git, you will have this simple utility.

Typing gitk in your bash or command line will open up this utility.

If you want to see all your branches you can use

gitk --all

This shows all the branches and structures.

But what if you want to see only the history of a single file? What if you want to track and piece together the history of just one file?

Well, there is one lesser-known command-line option to gitk which comes in handy: the “--” separator, followed by the path of the file.

Here’s how to use it.

gitk -- <path to single file>

See the demo below

Let me know in the comments below if you know any other commands that I don’t.

Why You Should Reject That Gift?

Here’s a story that resonated with me recently, posting it here.

When Buddha Rejects a Gift

When Buddha was walking through a village teaching, a rude and angry person who belonged to another group of believers walked in. He started insulting Gautama and said, “You have no right teaching others. You are as stupid as everyone else.” He shouted, “You are nothing but a fake.”

Looking at his anger, Buddha simply gave a gentle smile and asked, “Tell me, if you buy a gift for someone and that person does not take it, to whom does the gift belong?”

This question surprised the person and he answered, “It belongs to me because I bought the gift.” Gautama smiled and said, “That’s right. It is exactly the same with your anger and frustration. If you become angry with me and I do not get insulted, then the anger falls back on you. All you have done is hurt yourself.”

This is it. We should know what to reject and what to accept in life. In this case, Buddha simply rejected the insult, and the gift stayed with the man in the form of frustration, anger and displeasure.

via this

How to resolve this pandas ValueError: arrays must all be same length

Consider the following code.

import numpy as np
import pandas as pd

in_dict = dict(a=np.random.rand(3), b=np.random.rand(6), c=np.random.rand(2))

df = pd.DataFrame.from_dict(in_dict)

This fails with the following error

df = pd.DataFrame.from_dict(in_dict)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-2c9e8bf1abe9> in <module>
----> 1 df = pd.DataFrame.from_dict(in_dict)

~\Anaconda3\lib\site-packages\pandas\core\frame.py in from_dict(cls, data, orient, dtype, columns)
   1371             raise ValueError("only recognize index or columns for orient")
   1372
-> 1373         return cls(data, index=index, columns=columns, dtype=dtype)
   1374
   1375     def to_numpy(

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    527
    528         elif isinstance(data, dict):
--> 529             mgr = init_dict(data, index, columns, dtype=dtype)
    530         elif isinstance(data, ma.MaskedArray):
    531             import numpy.ma.mrecords as mrecords

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_dict(data, index, columns, dtype)
    285             arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
    286         ]
--> 287     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    288
    289

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
     78         # figure out the index, if necessary
     79         if index is None:
---> 80             index = extract_index(arrays)
     81         else:
     82             index = ensure_index(index)

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in extract_index(data)
    399             lengths = list(set(raw_lengths))
    400             if len(lengths) > 1:
--> 401                 raise ValueError("arrays must all be same length")
    402
    403             if have_dicts:

ValueError: arrays must all be same length

The solution is simple. I have faced this situation a lot, so I am posting it here on the blog for easy reference.

Use orient='index'

df = pd.DataFrame.from_dict(in_dict, orient='index')

df.head()

          0         1         2         3         4         5
a  0.409699  0.098402  0.399315       NaN       NaN       NaN
b  0.879116  0.460574  0.971645  0.147398  0.939485  0.222164
c  0.747605  0.123114       NaN       NaN       NaN       NaN

df.T

          a         b         c
0  0.409699  0.879116  0.747605
1  0.098402  0.460574  0.123114
2  0.399315  0.971645       NaN
3       NaN  0.147398       NaN
4       NaN  0.939485       NaN
5       NaN  0.222164       NaN
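
If you would rather have a, b and c as columns without thinking about the transpose, one alternative (my addition, not part of the fix above) is to wrap each array in a pd.Series. As far as I know, pandas then aligns the columns on the index and pads the shorter ones with NaN. A minimal sketch comparing the two routes:

```python
import numpy as np
import pandas as pd

in_dict = dict(a=np.random.rand(3), b=np.random.rand(6), c=np.random.rand(2))

# Route 1: the fix from this post, transposed so the keys become columns
df_index = pd.DataFrame.from_dict(in_dict, orient="index").T

# Route 2 (alternative sketch): wrap every array in a Series so pandas
# aligns on the index and fills the missing rows with NaN
df_series = pd.DataFrame({key: pd.Series(values) for key, values in in_dict.items()})

print(df_index.shape, df_series.shape)
print(df_series.isna().sum())
```

Both routes should give a 6-row, 3-column frame; which one reads better is a matter of taste.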

Why do Three-Toed Sloths Come Down From Their Trees to Defecate?

Our bodies are at once remarkably robust and remarkably fragile. I recently finished the excellent book Evolution Gone Wrong: The Curious Reasons Why Our Bodies Work (Or Don’t) by Alex Bezzerides.

Well written and extremely fun to read, it is filled with many funny but insightful “why” questions. Here’s a small sample from the book, on pooing sloths.

Why do three-toed sloths come down from their trees to defecate?

On the surface, this behavior is baffling. Why risk the chance of encountering a predator? Why not just let it fly from the branches? In class, my students work together to develop hypotheses and design hypothetical experiments to test their hypotheses. Are sloths fertilizing their trees in a targeted manner? Is it some way of marking their territory? Is it an atypical type of mate attraction?

Acutely observant scientists solved the mystery only recently with a great deal of patience.

They first observed that sloths have algae growing in their fur, which gives the sloths a green tint. The algae help the sloths blend in with the forest canopy, but the story goes beyond organic camouflage.

The sloth scientists noted sloths feeding on their homegrown algae and in doing so, supplementing their otherwise nutrient-poor diet. Eating their own fur algae is admittedly weird, but it gets even stranger than that.

A population of moths lives in the fur of each three-toed sloth. The moth population increases the nitrogen content of the fur and thus promotes the growth of the algae the sloths snack on.

When the sloths make their weekly treks to the bottoms of trees, the female moths lay their eggs in the fresh sloth dung. The tidy sloths cover up their mess with some leaf litter, and after the eggs hatch, the moth caterpillars dine on the sloth poop, grow up, become adults, and fly to the canopy layer to colonize sloths just as their parents did.

Sloths risk their lives to make a dung nursery for the moths on whom they depend for fertilizer to grow the algae they not only use as camo but also eat from their own fur for an extra shot of nutrition. Bam! Mystery solved. We can finally let the sloths poop in peace. Next question.

I hope this sloth-and-moth story has made the point that ultimate questions are fascinating to consider. They push researchers in completely different directions compared with proximate questions. The answers to ultimate questions are also often wildly unexpected.

This is what the book delivers: answers to the ultimate questions on human anatomy. Do give it a read if you get a chance.

Do you have other interesting books to recommend? Please let me know in the comments below.

Modelling Uncertainty: When will I get delivery of my Car?

Covid-19 has affected everyone. Its direct effects were felt by everyone, and its indirect effects will be felt for some time.

One of the second-order effects was the chip shortage and the consequent delay in car manufacturing and deliveries.

Let’s start from the beginning. In December 2021, after sitting on the fence for a while and no longer able to bat away my better half’s nudges, we booked our car.

The selection, shortlisting and elimination took a few weeks, and then, after a round of test drives, we finally booked the KIA Sonet G1.0T HTX iMT.

Being the most value-for-money (VFM) variant with the best transmission combination, it had a waiting period of 19-20 weeks.

The first 10 weeks of waiting were easy to pass, watching YouTube videos of accessories, modifications, road trips etc. But as we entered the double-digit phase of the waiting weeks, everyone was anxious to know when we were getting our car.

A couple of calls to the CSR only elicited a “please wait, you are the 6th person in line.” That is how things stood as we entered the month of Feb.

Frustrated with the wait and the uncertainty, there was one thing left that the engineer in me was dying to try: put some numbers to this uncertainty.

The uncertainty mattered even more because we had a planned vacation in April, so my better half was anxious about a clash between the delivery and our travel plans.

So I turned to Monte Carlo to predict the chances of getting the car by mid-Feb, end of Feb, mid-Mar or beyond.

Here’s how I did it.

import numpy as np
import scipy.stats as st

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

Data Collection

I turned to Google and the internet and got last year’s monthly KIA Sonet deliveries in India. This is the 2021 data on which we will base our predictions.

sales_data = np.asarray([8859,7997,8498,7724,6627,5963,7675,7752,4454,5443,4719,3578])
plt.bar(range(len(sales_data)),sales_data);

Not an encouraging sign: limited and declining production numbers.

We will treat each low/high estimate as a 90% confidence interval of a normal distribution, which is about 3.29 standard deviations wide, and use a rough guesstimate of the number of KIA dealers in India.

ci_90 = 3.29

# Rough number of KIA dealers
nodelears = 250

# No of monte carlo simulations
nos = 1000000
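
For anyone wondering where 3.29 comes from, here is a small sketch of my reading of it: a central 90% interval of a normal distribution runs from the 5th to the 95th percentile, which is roughly ±1.645 standard deviations. The 0.6 to 0.7 range below is only an example input.

```python
import scipy.stats as st

# A central 90% interval spans the 5th to the 95th percentile of the
# standard normal, i.e. about -1.645 to +1.645 standard deviations.
z_low, z_high = st.norm.ppf([0.05, 0.95])
width_in_sigmas = z_high - z_low
print(round(width_in_sigmas, 2))

# Example: treat a 60%-70% guess as a 90% interval and check the coverage
low, high = 0.6, 0.7
dist = st.norm(loc=(low + high) / 2, scale=(high - low) / width_in_sigmas)
coverage = dist.cdf(high) - dist.cdf(low)
print(round(coverage, 2))
```

So dividing the width of a range by 3.29 gives a standard deviation for which 90% of the samples fall inside that range.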

Time to Make Some Assumptions

After this data collection, there were other things that I had to estimate:

  • Percentage of people opting for the petrol variant. Low and High
  • Percentage of people opting for IMT transmission. Low and High
  • Percentage of people buying the HTX variant. HTX is the most value-for-money variant. I derived this number mostly from our discussions with the salesperson, YouTube videos and a cursory look at social media posts.

# This is our rough consensus estimate with 90% confidence
# % of people opting for petrol
petrol_low = 0.6
petrol_high = 0.7

# IMT transmission
# % of people buying imt transmission
imt_low = .30
imt_high = .40

# HTX Variant probability 
# % of people buying the Sonet HTX variant
variant_low = 0.40
variant_high = 0.50

Converting all this data collected into normal distributions

petrol = st.norm(loc=(petrol_low+petrol_high)/2, scale=(petrol_high-petrol_low)/ci_90)
imt = st.norm(loc=(imt_high+imt_low)/2, scale=(imt_high-imt_low)/ci_90)
variant = st.norm(loc=(variant_high+variant_low)/2, scale=(variant_high-variant_low)/ci_90)

# Generating the data for 1000000 simulations

petrol_results = petrol.rvs(nos)
imt_result = imt.rvs(nos)
variant_result = variant.rvs(nos)

Now we resample the historical Kia Sonet deliveries to simulate the mean monthly deliveries over the next 3 months.

sales=np.random.choice(sales_data, (nos,3)).mean(1)
sns.displot(sales, kind="kde");
Encouraging sign, or were we too optimistic about our assumptions?

# Storing everything into a dataframe for easy statistics
data = pd.DataFrame({
"sales":sales,
"petrol":petrol_results,
"imt":imt_result,
"variant":variant_result})

print(data.describe())

                sales          petrol             imt         variant
count  1000000.000000  1000000.000000  1000000.000000  1000000.000000
mean      6607.339004        0.650043        0.349980        0.449967
std        962.995620        0.030412        0.030419        0.030376
min       3578.000000        0.507467        0.202519        0.312613
25%       5960.000000        0.629536        0.329489        0.429458
50%       6646.666667        0.650035        0.349975        0.449975
75%       7325.666667        0.670540        0.370531        0.470449
max       8859.000000        0.796902        0.496014        0.588184

Computing final number

With this data in hand, we can now calculate nocars, the number of cars our dealership can expect.

nocars is the number of monthly deliveries a dealership will get for the variant, transmission and engine we are interested in, assuming deliveries are spread evenly across all dealers.

data["nocars"] = (data.sales*data.petrol*data.imt*data.variant)/nodelears
print(data.nocars.describe())

count    1000000.000000
mean           2.705510
std            0.512645
min            1.070139
25%            2.344456
50%            2.686584
75%            3.044903
max            5.662677

Inference

As seen in the summary above, the 75th percentile is about 3, so roughly 75% of the time the dealership will receive 3 or fewer cars of our configuration per month. Not looking good for us, as we were the 6th in line.

plt.axvline(x= data.nocars.mean(), c='g');
plt.hist(data.nocars, bins=100);

Fewer than 3 deliveries per month for the dealership for our chosen variant. 😦

pesimistic = data.nocars.quantile(q=0.25)
mean = data.nocars.mean()
optimistic = data.nocars.quantile(q=0.75)

timea = ["feb-beg", "feb-end", "mid-mar", "end-mar", "mid-apr", "end-apr"]
multiplier =[0.5, 0.75, 1, 2,2.8,3]

print("Time", "\t", "Pessimistic", "\t", "Mean", "\t", "Optimistic")
for t,m in zip(timea,multiplier):
    print(t, "\t",round(m*pesimistic, 0),"\t",round(m*mean, 0),"\t",round(m*optimistic, 0))

Time 	 Pessimistic 	 Mean 	 Optimistic
feb-beg 	 1.0 	 1.0 	 2.0
feb-end 	 2.0 	 2.0 	 2.0
mid-mar 	 2.0 	 3.0 	 3.0
end-mar 	 5.0 	 5.0 	 6.0
mid-apr 	 7.0 	 8.0 	 9.0
end-apr 	 7.0 	 8.0 	 9.0

Probability

Probability that the dealership gets more than a given number of petrol, iMT, HTX Sonets in a given month, based on past data

for i in range(1,6):
    print(i, round(data[data.nocars>i]["nocars"].count()/nos*100,2), "%" )

1 100.0 %
2 92.0 %
3 27.69 %
4 0.85 %
5 0.0 %
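
The same exceedance table can be written a bit more compactly by taking the mean of a boolean mask. This is only a sketch: the samples below are a stand-in drawn from a normal distribution (an assumption for illustration), whereas in the post they come from data.nocars.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in samples (assumption): in the post these would be data.nocars
nocars = rng.normal(loc=2.7, scale=0.5, size=100_000)

# The mean of a boolean mask is the fraction of simulations above the threshold
exceedance = {t: float((nocars > t).mean()) for t in range(1, 6)}

for threshold, share in exceedance.items():
    print(threshold, f"{share:.2%}")
```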

Bottom line: a very slim chance of getting the car by the end of March. The most probable date was the end of April, which is close to the originally promised delivery date.

What has this exercise taught me?

A lot.

Putting a number on that uncertainty was a huge deal for me. With this exercise, I gave up all hope of getting the car in March, as verbally promised by the dealership. These calculations also gave me a little insight into how Covid continues to affect us well beyond the initial days.


I have used Monte Carlo in my work for modelling material, geometry and boundary-condition (BC) uncertainties in gas turbine engines, but this was the first time I tried it on something so close to my own life and circumstances.

It was a fun exercise.

Update:

Someone doing this analysis today can use the data provided by the company

According to the company, 25% of Sonet buyers chose the iMT variants, while 22% opted for the automatic transmission. 26% of customers prefer the top variants, while diesel variants accounted for 41% of the overall Sonet sales.

According to Kia, the two most popular colours for the Sonet are Glacier White Pearl and Aurora Black Pearl. These account for 44% of the overall dispatches.

source

Why Many Doctors Recommend a High-Fiber Diet?

This might be simplistic, but it is a good explanation of why you should eat more fiber.

Whatever our small intestine does, it always obeys one basic rule: onward, ever onward!

This is achieved by the peristaltic reflex. The man who first discovered this mechanism did so by isolating a piece of gut and blowing air into it through a small tube and the friendly gut blew right back.

This is why many doctors recommend a high-fiber diet to encourage digestion: indigestible fiber presses against the gut wall, which becomes intrigued and presses back.

These gut gymnastics speed up the passage of food through the system and make sure the gut remains supple.

From the excellent book Gut by Giulia Enders

Fix Bloated Git Repo With these Commands

If you use git, your repo will eventually gather dust and bloat. I use git to manage my desktop, and over time this particular repo becomes bloated, so these are two commands that I come back to every quarter or so to keep things tidy.

git fsck

Verifies the connectivity and validity of the objects in the database.

git-fsck tests SHA-1 and general object sanity, and it does full tracking of the resulting reachability and everything else. It prints out any corruption it finds (missing or bad objects).

git gc

Cleanup unnecessary files and optimize the local repository

Runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance), removing unreachable objects which may have been created from prior invocations of git add, and packing refs, pruning reflog, metadata or stale working trees.
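
To check whether a clean-up actually helped, git count-objects -v can be run before and after git gc. Below is a rough Python sketch that parses its "key: value" output; the helper names and the sample numbers are made up for illustration, and repo_stats assumes git is available on the PATH.

```python
import subprocess


def parse_count_objects(text):
    """Turn the 'key: value' lines of `git count-objects -v` into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def repo_stats(repo_path="."):
    """Run `git count-objects -v` inside repo_path and parse the result."""
    result = subprocess.run(
        ["git", "count-objects", "-v"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return parse_count_objects(result.stdout)


# Illustrative output only (made-up numbers), so the parser can be tried
# without a repository at hand
sample = "count: 12\nsize: 48\nin-pack: 3400\npacks: 1\nsize-pack: 9120\n"
print(parse_count_objects(sample))
```

Calling repo_stats() before and after git gc and comparing the size and size-pack entries gives a quick before/after picture.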

Am I doing something wrong? Is there a better way? Please let me know in the comments.

20 Slots

Here’s a video of Warren Buffett talking about choices. It applies to investing as well as to all other areas of life.

I keep coming back to this, timeless advice.

True power is ….

Here’s a quote from Warren Buffett that’s on my mind.

“You will continue to suffer if you have an emotional reaction to everything that is said to you.

True power is sitting back and observing things with logic.

True power is restraint.

If words control you that means everyone else can control you.”

-Warren Buffett

Put an Image Behind your matplotlib plots

Here’s a quick one.

Problem.

You want to add pretty graphics behind your data. How do you do this with matplotlib?

Solution

import numpy as np
import matplotlib.pyplot as plt

# Path to the image
fpath =r"C:\Users\sukhbinder.singh\Desktop\day.jpg"

# Read the image
img = plt.imread(fpath)

# Plot the image
fig, ax = plt.subplots()
ax.imshow(img)
a, b, c,d = plt.axis('off')

# Now plot your appropriately scaled data. We will plot some
# random numbers
xx = np.random.randint(a,b, size=100)
yy = np.random.randint(d,c, size=100)
plt.plot(xx,yy, "r.")
plt.savefig("wall.png")
plt.show()

Simple. Here’s the result

Image as Background in a Matplotlib Plot
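
In the snippet above the data has to be scaled to the pixel coordinates of the image. A variation that goes the other way, stretching the image over the data range with the extent argument of imshow, is sketched below. The random image is a stand-in for plt.imread(fpath), and the sine data and axis limits are just assumptions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for plt.imread(fpath): a random RGB image (assumption)
rng = np.random.default_rng(0)
img = rng.random((60, 80, 3))

# Data in its own units, unrelated to the pixel size of the image
x = np.linspace(0, 10, 50)
y = np.sin(x)

fig, ax = plt.subplots()
# extent=(left, right, bottom, top) stretches the image over the data range;
# aspect="auto" lets it fill the axes, zorder=0 keeps it behind the points
ax.imshow(img, extent=(x.min(), x.max(), -1.5, 1.5), aspect="auto", zorder=0)
ax.plot(x, y, "r.", zorder=1)
```

Add fig.savefig("wall.png") or plt.show() at the end, as in the original snippet, to see the result.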

Some Useful pytest Command-line Options

I love pytest.

pytest is a testing framework that lets us write tests as plain Python functions, and plain functions are awesome.

Why use PyTest?

There are many reasons to use pytest. Here are some that I feel are important.

  • Very easy to start with because of its simple and easy syntax.
  • Less Boilerplate
  • Can run a specific test or a subset of tests
  • and many more useful features

Here’s a list of command-line options that can be used while using pytest.

Simple use

pytest

Too unorganised, let’s fix this

pytest -v

Much better.

Oh, there’s a failure, but there is too much information on the failure. Let’s fix that with

pytest -v --tb=line

This is good, but just a line of info is too little. OK, let’s try this.

pytest -v --tb=short

That’s good.

What if I want to run a specific test? No problem, just use the “-k” option

pytest -v -k "SOMENAME"

That’s cool. What if I want to just run the last failed test or tests? Simple, use “--lf”

pytest -v --lf

And if you want to debug the failed tests, well, use “--pdb”

pytest -v --pdb

On failure, it will bring up the debugger.
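
To try these options out, here is a tiny test file to run them against. The file name, the add function and the test names are all made up for illustration.

```python
# test_sample.py (hypothetical file name)


def add(a, b):
    """Toy function under test."""
    return a + b


def test_add_integers():
    assert add(2, 3) == 5


def test_add_strings():
    assert add("py", "test") == "pytest"


def test_subtract_placeholder():
    assert 5 - 3 == 2


# Commands to try from the folder containing this file:
#   pytest -v             lists every test by name
#   pytest -v -k "add"    selects only the tests with "add" in their name
#   pytest -v --tb=short  shortens the traceback of any failure
#   pytest -v --lf        re-runs only the tests that failed last time
#   pytest -v --pdb       opens the debugger at the point of failure
```

Break one of the asserts on purpose to see what --tb, --lf and --pdb do.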

Well, that’s it for this post. Hope this helps.

The Man Who Boiled Urine to Get Gold.

Ever since we moved our dinner out of our TV room, dinner time has been a constant source of enjoyment. Sometimes kids tell their stories and sometimes I tell them stories that I have read from the recent books I have been reading.

Last month I told the kids an interesting story from the book Elemental by Tim James. I was hoping to post it here on this blog, but my son beat me to it. He likes these stories and wastes no time in sharing the interesting ones on his blog. Do read it. You will love the story.

The man who boiled urine for gold.

It’s the story of a late seventeenth-century German experimenter named Henning Brandt, who proved that everyday substances had elements locked inside them and that most of the stuff we thought pure was not so.

Do give it a read. The Man Who Boiled Urine.

Conda PackagesNotFoundError: The following packages are not available

This problem annoyed me a lot, because our conda environment file had specific versions pinned and we were not allowed to change them, in order to keep the Python code compatible across different users and geographies.

Problem

conda install numpy=1.11.3 mkl=11.3.3 matplotlib=1.5.3 psutil=5.4.7 numexpr=2.6.1 h5py=2.6.0 hdf5=1.8.15.1 pandas=0.23.3 pytables=3.2.2 python=3.5.2 python-dateutil=2.5.3 setuptools=27.2.0 vc=14.1 vs2015_runtime zlib=1.2.11 bzip2=1.0.6 scipy=0.18.1


(py35) C:\Users\Sukhbinder>conda install numpy=1.11.3 mkl=11.3.3 matplotlib=1.5.3 psutil=5.4.7 numexpr=2.6.1 h5py=2.6.0 hdf5=1.8.15.1 pandas=0.23.3 pytables=3.2.2 python=3.5.2 python-dateutil=2.5.3 setuptools=27.2.0 vc=14.1 vs2015_runtime=15.5.2 zlib=1.2.11 bzip2=1.0.6
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - vs2015_runtime=15.5.2

Current channels:

  - https://conda.anaconda.org/conda-forge/win-64
  - https://conda.anaconda.org/conda-forge/noarch

SOLUTION

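The solution is surprisingly simple. This setting adds the older “free” channel back to conda’s default channels, which is where many old package versions live.

conda config --set restore_free_channel true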

Starting and Finishing

This is a repost from Seth Godin’s excellent blog. I keep coming back to this post.

Sometimes the rule is:

You don’t have to finish, but you do have to start.

And sometimes the rule is:

You don’t have to start, but if you do, you have to finish.

When building a personal habit, it might make sense to embrace the first rule. You don’t have to run all the way, every day, but you do have to get out of the house and start running.

And when making promises to a group where trust matters, the second rule definitely applies.

Seth’s Blog

Python Logger Printing Multiple Times

This is my standard boilerplate code for adding logging to an app. It makes starting and working on projects super easy. But sometimes, if the project involves multiple modules, there is an annoying little thing: the log is printed multiple times on the console.

import logging
import os


def create_logger(name="imap", path=os.getcwd()):
    fname = os.path.join(path, "{}.log".format(name))
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # Create handlers
    c_handler = logging.StreamHandler()
    f_handler = logging.FileHandler(fname, mode="w")
    c_handler.setLevel(logging.INFO)
    f_handler.setLevel(logging.DEBUG)
    # Create formatters and add it to handlers
    c_format = logging.Formatter("%(levelname)s %(message)s")
    f_format = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
    c_handler.setFormatter(c_format)
    f_handler.setFormatter(f_format)
    # Add handlers to the logger
    logger.addHandler(c_handler)
    logger.addHandler(f_handler)

    return logger


def get_logger(name: str):
    logger = logging.getLogger(name)
    return logger  

In one particular project, which was using multiple modules, this setup was causing the logging messages to print multiple times. This duplicate output in a simple python logging configuration was not harmful but was annoying.

After a few Google searches, false restarts and reading multiple debates on Stack Overflow, I found the solution, which was as simple as this.

Solution

logger.propagate = False

Full code that works without the flaw is shown below

import logging
import os


def create_logger(name="imap", path=os.getcwd()):
    fname = os.path.join(path, "{}.log".format(name))
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # Create handlers
    c_handler = logging.StreamHandler()
    f_handler = logging.FileHandler(fname, mode="w")
    c_handler.setLevel(logging.INFO)
    f_handler.setLevel(logging.DEBUG)
    # Create formatters and add it to handlers
    c_format = logging.Formatter("%(levelname)s %(message)s")
    f_format = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
    c_handler.setFormatter(c_format)
    f_handler.setFormatter(f_format)
    # Add handlers to the logger
    logger.addHandler(c_handler)
    logger.addHandler(f_handler)
    logger.propagate = False

    return logger


def get_logger(name: str):
    logger = logging.getLogger(name)
    return logger  
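
Why does this one line help? My understanding is that the duplicates appear when a parent logger (usually the root logger, for example after another module calls logging.basicConfig()) also has a handler, so each record is printed once by our logger and once more after being passed up. The sketch below reproduces that with in-memory streams; the logger name imap_demo is made up for the example.

```python
import io
import logging

# Root logger with its own handler, as another module might have set up
root_stream = io.StringIO()
root_handler = logging.StreamHandler(root_stream)
root = logging.getLogger()
root.addHandler(root_handler)

# Our application logger with a separate handler
app_stream = io.StringIO()
app_logger = logging.getLogger("imap_demo")  # hypothetical logger name
app_logger.setLevel(logging.DEBUG)
app_logger.addHandler(logging.StreamHandler(app_stream))

app_logger.info("first")   # handled here AND passed up to the root handler
app_logger.propagate = False
app_logger.info("second")  # handled here only

print("root saw:", root_stream.getvalue().split())
print("app saw:", app_stream.getvalue().split())

root.removeHandler(root_handler)  # leave the root logger as we found it
```

The root handler only sees the message logged before propagate was switched off, which is the behaviour the fix relies on.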



Simplicity Can Be Lucrative

Who hasn’t played with Lego or one of its many copycat replicas? My kids have a lot of fun with them. Many of our fun memories are around these toys. Even today I see my friends’ kids getting engrossed in them. As soon as these simple plastic bits and bobs are laid out for them, they are in their own world.

So when I stumbled on this little note on Lego’s history, I couldn’t resist posting it here.

Infinite Builds: Modular Lego Bricks

Ole Kirk Christiansen, a carpenter, founded The Lego Group in 1932.

At the time, he was out of work because of the Depression and decided to build wooden toys in Denmark. In 1947, Ole got samples of a plastic brick invented and patented (“self locking building bricks”) by Mr. Hilary “Harry” Fisher Page in Britain, and began creating the automatic binding bricks that we know today as Lego bricks, a name that originated in 1953. Ole’s 1958 Lego patent (#3005282) states, “the principle object of the invention is to provide for a vast variety of combinations of the bricks for making toy structures of many different kinds and shapes.” And that was the magic of Lego—vast variety from simplicity. Anything imaginable could be built.

All kids could unleash their creativity on the world with simple, modular, relational blocks.

Today, Lego, with headquarters in Billund, Denmark, is the sixth largest toy company in the world, with over 5,000 employees and revenue of $7.8 billion Danish Kroner.

Simplicity can be lucrative.