gitk is a simple, batteries-included GUI that can show you the state of your git repository at any point in time. It is far more convenient and intuitive than the git log and git reflog CLI commands.
I use it every day. The fact that gitk ships with git is a good thing: if you have git, you already have this utility.
Typing gitk in your bash or command prompt will open it.
If you want to see all your branches you can use
gitk --all
This shows all the branches and structures.
But what if you want to see the history of just a single file, to track and piece through its changes?
Well, there is a lesser-known command-line option to gitk that comes in handy: the “--” separator.
Here’s how to use it.
gitk -- path/to/file
See the demo below
Let me know in the comments below if you know any other handy commands that I don’t!
Suppose you build a DataFrame from a dictionary whose values are arrays of unequal lengths:
import numpy as np
import pandas as pd
in_dict = dict(a=np.random.rand(3), b=np.random.rand(6), c=np.random.rand(2))
df = pd.DataFrame.from_dict(in_dict)
This fails with the following error:
df = pd.DataFrame.from_dict(in_dict)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-2c9e8bf1abe9> in <module>
----> 1 df = pd.DataFrame.from_dict(in_dict)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in from_dict(cls, data, orient, dtype, columns)
1371 raise ValueError("only recognize index or columns for orient")
1372
-> 1373 return cls(data, index=index, columns=columns, dtype=dtype)
1374
1375 def to_numpy(
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
527
528 elif isinstance(data, dict):
--> 529 mgr = init_dict(data, index, columns, dtype=dtype)
530 elif isinstance(data, ma.MaskedArray):
531 import numpy.ma.mrecords as mrecords
~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_dict(data, index, columns, dtype)
285 arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
286 ]
--> 287 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
288
289
~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
78 # figure out the index, if necessary
79 if index is None:
---> 80 index = extract_index(arrays)
81 else:
82 index = ensure_index(index)
~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in extract_index(data)
399 lengths = list(set(raw_lengths))
400 if len(lengths) > 1:
--> 401 raise ValueError("arrays must all be same length")
402
403 if have_dicts:
ValueError: arrays must all be same length
The solution is simple. I have faced this situation a lot, so I am posting it here on the blog for easy reference.
Use orient='index':
df = pd.DataFrame.from_dict(in_dict, orient='index')
df.head()
0 1 2 3 4 5
a 0.409699 0.098402 0.399315 NaN NaN NaN
b 0.879116 0.460574 0.971645 0.147398 0.939485 0.222164
c 0.747605 0.123114 NaN NaN NaN NaN
df.T
a b c
0 0.409699 0.879116 0.747605
1 0.098402 0.460574 0.123114
2 0.399315 0.971645 NaN
3 NaN 0.147398 NaN
4 NaN 0.939485 NaN
5 NaN 0.222164 NaN
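If you would rather avoid from_dict plus a transpose, one alternative (a sketch, not the post's method) is to wrap each array in a Series, so pandas aligns on the index and pads the shorter columns itself:

```python
import numpy as np
import pandas as pd

in_dict = dict(a=np.random.rand(3), b=np.random.rand(6), c=np.random.rand(2))

# Wrapping each array in a Series lets pandas align on the index and
# pad the shorter columns with NaN -- no transpose needed afterwards.
df = pd.DataFrame({k: pd.Series(v) for k, v in in_dict.items()})
print(df.shape)  # (6, 3)
```

This gives the same layout as df.T in the orient='index' approach, with keys as columns.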
Covid-19 has affected everyone. Its direct effects were felt immediately; its indirect effects will be felt for some time.
One of the second-order effects was the chip shortage and the consequent delay in car manufacturing and deliveries.
Let’s start from the beginning. In December 2021, after sitting on the fence and no longer able to ignore my better half’s nudges, we booked our car.
The selection, shortlisting and elimination took a few weeks, and after a round of test drives we finally booked the KIA Sonet G1.0T HTX iMT.
Being the most value-for-money (VFM) variant with the best transmission combination, it had a waiting period of 19-20 weeks.
The first 10 weeks of waiting were easy to pass, watching YouTube videos of accessories, modifications, road trips etc., but as we entered the double-digit weeks everyone was anxious to know when we were getting our car.
A couple of calls to the customer service rep only elicited a “please wait, you are 6th in line”. That is how things stood as we entered February.
Frustrated with the wait and the uncertainty, there was one thing the engineer in me was dying to try: putting some numbers to this uncertainty.
The uncertainty mattered all the more because we had a vacation planned in April, so my better half was anxious that the delivery would clash with our travel plans.
So I turned to Monte Carlo simulation to predict our chances of getting the car by mid-Feb, end-Feb, mid-Mar or beyond.
Here’s how I did it.
import numpy as np
import scipy.stats as st
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Data Collection
I turned to Google and collected last year’s monthly KIA Sonet delivery numbers for India. These 2021 figures are what the predictions are based on.
Not an encouraging sign: limited and declining production numbers.
We will use a 90% confidence interval for the normal distributions and a rough guesstimate of the number of KIA dealers in India.
ci_90 = 3.29  # width of a 90% confidence interval, in standard deviations
# Rough number of KIA dealers
nodelears = 250
# Number of Monte Carlo simulations
nos = 1000000
Time to Make Some Assumptions
After this data collection, there were other things I had to estimate:
Percentage of people opting for the petrol variant (low and high)
Percentage of people opting for the iMT transmission (low and high)
Percentage of people buying the HTX variant, the most value-for-money one. I derived this number mostly from discussions with the salesperson, YouTube videos and a cursory look at social media posts.
# This is our rough consensus estimate with 90% confidence
# % of people opting for petrol
petrol_low = 0.6
petrol_high = 0.7
# IMT transmission
# % of people buying imt transmission
imt_low = .30
imt_high = .40
# HTX Variant probability
# % of people buying the Sonet HTX variant
variant_low = 0.40
variant_high = 0.50
Converting all the collected estimates into normal distributions:
petrol = st.norm(loc=(petrol_low+petrol_high)/2, scale=(petrol_high-petrol_low)/ci_90)
imt = st.norm(loc=(imt_high+imt_low)/2, scale=(imt_high-imt_low)/ci_90)
variant = st.norm(loc=(variant_high+variant_low)/2, scale=(variant_high-variant_low)/ci_90)
# Generating the data for 1000000 simulations
petrol_results = petrol.rvs(nos)
imt_result = imt.rvs(nos)
variant_result = variant.rvs(nos)
Now we combine the historical Kia Sonet delivery data with the simulated percentages to predict the mean deliveries over the next 3 months.
Encouraging sign or were we too optimistic about our assumptions?
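The sales array fed into the DataFrame below is built from the collected monthly delivery numbers, which are not reproduced in the post. Here is a minimal sketch of one way to generate it; the monthly figures are placeholders, and the rolling-mean resampling is my own reconstruction, not necessarily what the original analysis did:

```python
import numpy as np
import pandas as pd

# Placeholder 2021 monthly Sonet deliveries -- the original post used
# real figures collected from news reports, not reproduced here.
monthly = pd.Series([8859, 7997, 7724, 7208, 5963, 6627,
                     7675, 6342, 4454, 3578, 4719, 5253])

nos = 1_000_000
# One way to turn twelve monthly figures into a distribution of
# 3-month mean deliveries: compute rolling means and resample them.
rolling3 = monthly.rolling(3, min_periods=1).mean()
sales = np.random.choice(rolling3, size=nos)
```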
# Storing everything into a dataframe for easy statistics
data = pd.DataFrame({
    "sales": sales,
    "petrol": petrol_results,
    "imt": imt_result,
    "variant": variant_result})
print(data.describe())
sales petrol imt variant
count 1000000.000000 1000000.000000 1000000.000000 1000000.000000
mean 6607.339004 0.650043 0.349980 0.449967
std 962.995620 0.030412 0.030419 0.030376
min 3578.000000 0.507467 0.202519 0.312613
25% 5960.000000 0.629536 0.329489 0.429458
50% 6646.666667 0.650035 0.349975 0.449975
75% 7325.666667 0.670540 0.370531 0.470449
max 8859.000000 0.796902 0.496014 0.588184
Computing the final number
Once we have this data, we use it to calculate nocars, the number of cars our dealership can expect to get.
nocars is the number of deliveries a dealership will receive for the variant, transmission and engine we are interested in.
data["nocars"] = (data.sales*data.petrol*data.imt*data.variant)/nodelears
print(data.nocars.describe())
count 1000000.000000
mean 2.705510
std 0.512645
min 1.070139
25% 2.344456
50% 2.686584
75% 3.044903
max 5.662677
Inference
As seen above, the dealership will receive fewer than about 3 cars of this configuration per month 75% of the time. Not looking good for us, since we were 6th in line.
The probability that the dealership gets more than a given number of petrol iMT HTX Sonets in a month, based on past data:
for i in range(1, 6):
    print(i, round(data[data.nocars > i]["nocars"].count() / nos * 100, 2), "%")
1 100.0 %
2 92.0 %
3 27.69 %
4 0.85 %
5 0.0 %
Bottom line: a very slim chance of getting the car by the end of March; the most probable date was the end of April, which turned out to be the promised delivery date.
What has this exercise taught me?
A lot.
Putting a number to that uncertainty was a huge deal for me; with this exercise I dispelled all hope of getting the car in March as verbally promised by the dealership. These calculations also gave me a little insight into how Covid continues to affect us well beyond the initial days.
I have used Monte Carlo methods in my work for modelling material, geometry and boundary-condition uncertainties in gas turbine engines, but this was the first time I tried them on something so close to my own life and circumstances.
It was a fun exercise.
Update:
Someone doing this analysis today can use the data provided by the company
According to the company, 25% of Sonet buyers chose the iMT variants, while 22% opted for the automatic transmission. 26% of customers prefer the top variants, while diesel variants accounted for 41% of the overall Sonet sales.
According to Kia, the two most popular colours for the Sonet are Glacier White Pearl and Aurora Black Pearl. These account for 44% of the overall dispatches.
If you use git long enough, your repo will eventually gather dust and bloat. I use git to manage my desktop, and over time that particular repo grows bloated, so these are the two commands I come back to every quarter or so to keep things tidy.
git fsck
Verifies the connectivity and validity of the objects in the database.
git-fsck tests SHA-1 and general object sanity, and it does full tracking of the resulting reachability and everything else. It prints out any corruption it finds (missing or bad objects).
git-gc
Cleanup unnecessary files and optimize the local repository
Runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance), removing unreachable objects which may have been created from prior invocations of git add, and packing refs, pruning reflog, metadata or stale working trees.
Am I doing something wrong? Is there a better way? Please let me know in the comments.
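A typical quarterly session, then, might look like this (the --prune=now flag is my usual choice, not something the post prescribes):

```shell
# Check object connectivity and integrity first
git fsck --full

# Then run housekeeping; --prune=now drops unreachable objects
# immediately (only safe when nothing else is using the repo)
git gc --prune=now
```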
You want to add a pretty graphic behind your data. How do you do this with matplotlib?
Solution
import numpy as np
import matplotlib.pyplot as plt
# Path to the image
fpath =r"C:\Users\sukhbinder.singh\Desktop\day.jpg"
# Read the image
img = plt.imread(fpath)
# Plot the image
fig, ax = plt.subplots()
ax.imshow(img)
a, b, c, d = plt.axis('off')  # returns the current (xmin, xmax, ymin, ymax)
# Now plot your appropriately scaled data. We will plot some
# random numbers
xx = np.random.randint(a,b, size=100)
yy = np.random.randint(d,c, size=100)
plt.plot(xx,yy, "r.")
plt.savefig("wall.png")
plt.show()
This is a problem that has annoyed me a lot, because our conda environment file had specific versions pinned and we were not allowed to change them, to maintain compatibility of the Python code across different users and geographies.
Problem
conda install numpy=1.11.3 mkl=11.3.3 matplotlib=1.5.3 psutil=5.4.7 numexpr=2.6.1 h5py=2.6.0 hdf5=1.8.15.1 pandas=0.23.3 pytables=3.2.2 python=3.5.2 python-dateutil=2.5.3 setuptools=27.2.0 vc=14.1 vs2015_runtime zlib=1.2.11 bzip2=1.0.6 scipy=0.18.1
(py35) C:\Users\Sukhbinder>conda install numpy=1.11.3 mkl=11.3.3 matplotlib=1.5.3 psutil=5.4.7 numexpr=2.6.1 h5py=2.6.0 hdf5=1.8.15.1 pandas=0.23.3 pytables=3.2.2 python=3.5.2 python-dateutil=2.5.3 setuptools=27.2.0 vc=14.1 vs2015_runtime=15.5.2 zlib=1.2.11 bzip2=1.0.6
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
PackagesNotFoundError: The following packages are not available from current channels:
- vs2015_runtime=15.5.2
Current channels:
- https://conda.anaconda.org/conda-forge/win-64
- https://conda.anaconda.org/conda-forge/noarch
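Notice that the failing command pins vs2015_runtime=15.5.2 while the working package list leaves it unpinned. Assuming the pinned build simply isn't available on your configured channels, one workaround is to check what is available and relax the pin:

```shell
# See which builds the configured channels actually provide
conda search vs2015_runtime

# Install with the unavailable pin removed (or pinned to a listed version)
conda install vs2015_runtime
```

Whether relaxing the pin is acceptable depends on your compatibility constraints, of course.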
This is my standard boilerplate for adding logging functionality to an app, and it makes starting and working on projects super easy. But when a project involves multiple modules, there is an annoying quirk: log messages get printed multiple times on the console.
In one particular project that used multiple modules, this setup caused the logging messages to print several times. The duplicate output from a simple Python logging configuration was not harmful, but it was annoying.
After a few Google searches, false restarts and reading multiple debates on Stack Overflow, I found a solution as simple as this.
Solution
logger.propagate = False
Full code that works without the flaw is shown below
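The original full listing isn't reproduced here, so this is a minimal sketch of such boilerplate with the fix applied; the function name and format string are my own choices:

```python
import logging

def get_logger(name):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Guard against adding a second handler when the module is imported twice
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    # The fix: stop records propagating up to the root logger's handlers
    logger.propagate = False
    return logger

log = get_logger("myapp")
log.info("this prints exactly once")
```

Every module can call get_logger(__name__) without duplicating handlers or output.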
Yes, yes, I know we should use poetry and other packaging mechanisms, but for small personal projects or a simple application, setup.py is a good place to begin.
Here’s a sample setup.py that I have used in many of my personal projects
Standard Setup.py
import pathlib
from setuptools import find_packages, setup
# The directory containing this file
HERE = pathlib.Path(__file__).parent
# The text of the README file
README = (HERE / "README.md").read_text()
setup(
    name="winsay",
    version="1.1",
    packages=find_packages(),
    license="Private",
    description="say in windows",
    long_description=README,
    long_description_content_type="text/markdown",
    author="sukhbinder",
    author_email="sukh2010@yahoo.com",
    url='https://github.com/sukhbinder/winsay',
    keywords=["say", "windows", "mac", "computer", "speak"],
    entry_points={
        'console_scripts': ['say = winsay.winsay:main'],
    },
    install_requires=["pywin32"],
    classifiers=[
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.7",
    ],
)
This is useful and is always a good starting point for my project files.
If you have a JSON file and you are tired of getting the JSON back as a plain vanilla dictionary, the following code, using namedtuple from the collections module in standard Python, can come to your rescue.
Here’s an example
from collections import namedtuple
import json
fname =r"D:\pool\JobFolder\INLT2916\1\run_1\sample_1.json"
with open(fname, "r") as fin:
    data = json.load(fin)

def convert(dictionary):
    for key, value in dictionary.items():
        if isinstance(value, dict):
            dictionary[key] = convert(value)
    return namedtuple('GenericDict', dictionary.keys())(**dictionary)
objdata = convert(data)
Observe the convert definition: it recursively turns every nested dictionary into a namedtuple, so elements can be accessed with dot notation.
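For instance, with a made-up payload (the original JSON path is machine-specific, so a small inline dict stands in for json.load here, and the keys are hypothetical):

```python
from collections import namedtuple

def convert(dictionary):
    # Recursively replace nested dicts with namedtuples
    for key, value in dictionary.items():
        if isinstance(value, dict):
            dictionary[key] = convert(value)
    return namedtuple('GenericDict', dictionary.keys())(**dictionary)

# Stand-in for data = json.load(fin)
data = {"job": {"name": "sample_1", "runs": 3}, "status": "ok"}
objdata = convert(data)

print(objdata.status)    # ok
print(objdata.job.name)  # sample_1
```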
If you have seen my post on the expenseDjango app, its dashboard is rendered with matplotlib images that are generated at runtime and passed to the page’s HTML.
See it here in action. All the PNG images are generated on the fly and sent to the browser for display.
Here’s a stripped-down version of the code to demonstrate this with a simple dummy graph.
# DJANGO View code
from django.shortcuts import render
import matplotlib.pyplot as plt
import io
import urllib, base64
def home(request):
    plt.plot(range(10))
    fig = plt.gcf()
    # convert the graph into a byte buffer, then base64-encode the PNG bytes
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    buf.seek(0)
    string = base64.b64encode(buf.read())
    uri = urllib.parse.quote(string)
    return render(request, 'home.html', {'data': uri})
The whole thing is accomplished with the matplotlib, io and base64 libraries: matplotlib to plot, io to hold the plot’s byte buffer, and base64 to convert the bytes into a string representation the browser can display.
I could have used Chart.js or other JavaScript libraries for interactive plots, but this matplotlib implementation is a good place to start, and it works without any extra complexity.
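On the template side, the quoted string can be embedded directly as a data URI. A sketch of what home.html presumably contains (the template itself is not shown in the post):

```html
<!-- home.html: render the base64-encoded PNG passed in as "data" -->
<img src="data:image/png;base64,{{ data }}">
```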
Here’s an error I encountered while migrating a Django project from my PC to a Mac. It had nothing to do with the difference in architecture and, as I later learned, the solution is simple and comes out of the box with Django.
PROBLEM
OperationalError at /admin/exp/expense/add/
no such table: exp_expense
Request Method: POST
Request URL: http://127.0.0.1:8000/admin/exp/expense/add/
Django Version: 2.2.16
Exception Type: OperationalError
Exception Value:
no such table: exp_expense
Exception Location: D:\apps\anaconda3\lib\site-packages\django\db\backends\sqlite3\base.py in execute, line 383
Python Executable: D:\apps\anaconda3\python.exe
Python Version: 3.8.3
Python Path:
['C:\\Users\\Sukhbinder\\Desktop\\PROJECTS\\exptest\\test_exp',
Solution
(sdh_env) (base) C:\Users\Sukhbinder\Desktop\PROJECTS\exptest\test_exp>python manage.py migrate --run-syncdb
Operations to perform:
  Synchronize unmigrated apps: exp, messages, rest_framework, staticfiles
  Apply all migrations: admin, auth, contenttypes, sessions
Synchronizing apps without migrations:
  Creating tables…
    Creating table exp_expense
  Running deferred SQL…
Running migrations:
  No migrations to apply.
That’s it: adding --run-syncdb to the migrate command.
h5repack is a command-line tool that applies HDF5 filters to an input file file1, saving the output in a new file, file2.
Removing entire nodes (groups or datasets) from an hdf5 file should be no problem. However, if you want to reclaim the space you have to run the h5repack tool.
Deleting a Dataset from a File and Reclaiming Space
HDF5 does not at this time provide an easy mechanism to remove a dataset from a file or to reclaim the storage space occupied by a deleted object.
Removing a dataset and reclaiming the space it used can be done with the H5Ldelete function and the h5repack utility program. With the H5Ldelete function, links to a dataset can be removed from the file structure. After all the links have been removed, the dataset becomes inaccessible to any application and is effectively removed from the file. The way to recover the space occupied by an unlinked dataset is to write all of the objects of the file into a new file. Any unlinked object is inaccessible to the application and will not be included in the new file. Writing objects to a new file can be done with a custom program or with the h5repack utility program.
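As a sketch of the last step (file names here are made up): after all links to the dataset have been removed, rewrite the file with h5repack to get the space back:

```shell
# bloated.h5 still carries the space of datasets deleted with H5Ldelete
# (or h5py's `del f["dset"]`); rewriting the reachable objects into a
# new file reclaims it
h5repack bloated.h5 compact.h5
```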
You have installed Miniconda3 on your Windows (10/7) laptop. You can run Python through the Anaconda Prompt, but python is not recognised in the Windows Command Prompt.
From Windows Command Prompt if you type in ‘conda info’ you get this because it doesn’t even recognise conda:
‘conda’ is not recognized as an internal ….
How to solve this?
Having conda available on the cmd line is a useful thing. Opening the Anaconda Prompt just to access the conda utilities is a hassle, so I always have conda available in cmd system-wide.
Here are the two steps to follow.
To get conda in cmd system-wide
Step 1
If Anaconda is installed for the current user only, add %USERPROFILE%\Anaconda3\condabin (I mean condabin, not Scripts) into the environment variable PATH (the user one). If Anaconda is installed for all users on your machine, add C:\ProgramData\Anaconda3\condabin into PATH.
How do I set system environment variables on Windows?
set path=%path%;%USERPROFILE%\Anaconda3\condabin
Step 2 Open a new Powershell or CMD, run the following command once to initialize conda.
conda init
These steps make sure the conda command is exposed to your cmd.exe and Powershell.
Hope this helps someone.
Pair this post with these useful posts to dig deeper.
If you use the Windows command line (cmd), you need to know about the for command. It is useful and can help you automate many common tasks in Windows batch scripts. I have talked about the for command a lot.
But recently I was surprised to find another gem of a command that makes working with files a lot more convenient. The command is forfiles.
I don’t know when it was introduced, but it’s a command worth learning about if you manipulate files from the Windows command line.
Forfiles helps you select a file or set of files and execute a command on each one. Really, really useful if you work on batch jobs involving a number of files.
FORFILES [/P pathname] [/M searchmask] [/S]
[/C command] [/D [+ | -] {MM/dd/yyyy | dd}]
Description:
Selects a file (or set of files) and executes a
command on that file. This is helpful for batch jobs.
Parameter List:
/P pathname Indicates the path to start searching.
The default folder is the current working
directory (.).
/M searchmask Searches files according to a searchmask.
The default searchmask is '*' .
/S Instructs forfiles to recurse into
subdirectories. Like "DIR /S".
/C command Indicates the command to execute for each file.
Command strings should be wrapped in double
quotes.
The default command is "cmd /c echo @file".
The following variables can be used in the
command string:
@file - returns the name of the file.
@fname - returns the file name without
extension.
@ext - returns only the extension of the
file.
@path - returns the full path of the file.
@relpath - returns the relative path of the
file.
@isdir - returns "TRUE" if a file type is
a directory, and "FALSE" for files.
@fsize - returns the size of the file in
bytes.
@fdate - returns the last modified date of the
file.
@ftime - returns the last modified time of the
file.
To include special characters in the command
line, use the hexadecimal code for the character
in 0xHH format (ex. 0x09 for tab). Internal
CMD.exe commands should be preceded with
"cmd /c".
/D date Selects files with a last modified date greater
than or equal to (+), or less than or equal to
(-), the specified date using the
"MM/dd/yyyy" format; or selects files with a
last modified date greater than or equal to (+)
the current date plus "dd" days, or less than or
equal to (-) the current date minus "dd" days. A
valid "dd" number of days can be any number in
the range of 0 - 32768.
"+" is taken as default sign if not specified.
/? Displays this help message.
Examples:
FORFILES /?
FORFILES
FORFILES /P C:\WINDOWS /S /M DNS*.*
FORFILES /S /M *.txt /C "cmd /c type @file | more"
FORFILES /P C:\ /S /M *.bat
FORFILES /D -30 /M *.exe
/C "cmd /c echo @path 0x09 was changed 30 days ago"
FORFILES /D 01/01/2001
/C "cmd /c echo @fname is new since Jan 1st 2001"
FORFILES /D +6/9/2021 /C "cmd /c echo @fname is new today"
FORFILES /M *.exe /D +1
FORFILES /S /M *.doc /C "cmd /c echo @fsize"
FORFILES /M *.txt /C "cmd /c if @isdir==FALSE notepad.exe @file"
This is a common problem I have seen while building a wheel in Python. Mostly it happens because wheel is not a standard library package distributed with Python.
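Since wheel comes from PyPI rather than the standard library, the usual fix is simply to install it before building (shown with pip and a setup.py build; your build command may differ):

```shell
# wheel is a separate PyPI package, not part of the standard library
python -m pip install wheel

# after which building a wheel works
python setup.py bdist_wheel
```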