Data Piracy
Machine Learners: Feature Engineering
Hello pirates! I had to skip January because of my tight schedule, but I am back with my first post of 2021. I have been working on machine learning projects for the last couple of months, so I wanted to write this post to share what I have learned. What is “Feature Engineering”? Well, in a nutshell, it…
The Magic of DPLYR
Bang Bang!! In my very last post of 2020, I would like to introduce you to a very useful R package called “dplyr”. If you haven’t yet, install “tidyverse” in RStudio. You can type install.packages("tidyverse") into your console and it’ll install the package. Tidyverse contains multiple packages created by Hadley Wickham and one…
Statistical Distributions: Does Normal Distribution Exist in Real World?
The short answer is no. Normal distributions do not exist in real-world business scenarios. Usually the data we receive have a uniform distribution rather than a “perfect bell curve”. The real question is: do we need perfection in data to be able to analyze it? No. Perfect normal distributions are beautiful, though they are theoretical.…
Data Sampling Using R Studio
Data sampling is a well-known statistical method to derive insights from a big population (big data). Most classification and data modeling methods and machine learning algorithms perform better with an ideal data sample than with the entire dataset. Sampling is also important for saving costs for organizations. Instead of implementing any…