The Decision Maker’s Handbook to Data Science: A guide for non-technical executives, managers, and founders by Stylianos Kampakis Ph.D.
About The Book:
Have you ever felt confused by terms such as “data science” and “big data”? What really is the difference between AI and machine learning? How can you hire a good data scientist, and how do you build a data-driven organization? Have you ever thought you’d like to use data science, but you don’t know where to start?
The Decision Maker’s Handbook to Data Science was written specifically for you. It covers all the topics that a non-technical decision-maker needs to know if they are to use data science within their organization.
Driven by the author’s 10+ years of experience, the book’s aim is to demystify the jargon and offer answers to all the most common problems and questions that decision-makers face when dealing with data. Topics include:
1) Explaining data science. Demystifying the differences between AI, machine learning, and statistics.
2) Data management best practices.
3) How to think like a data scientist, without being one.
4) How to hire and manage data scientists.
5) How to set up the right culture in an organization, in order to make it data-centric.
6) Case studies and examples based on real scenarios.
Data science, machine learning, and artificial intelligence are amongst the main drivers of the technological revolution we are experiencing. If you are planning to collect and use data within your company, then the Decision Maker’s Handbook to Data Science will help you avoid the most common mistakes and pitfalls, and make the most out of your data.
Data science is expanding across industries at a rapid pace, and the companies first to adopt best practices will gain a significant advantage. To reap the benefits, decision-makers need to have a confident understanding of data science and its application in their organization.
It is easy for novices to the subject to feel paralyzed by intimidating buzzwords, but what many don’t realize is that data science is in fact quite multidisciplinary: useful in the hands of business analysts, communications strategists, designers, and more. With the second edition of The Decision Maker’s Handbook to Data Science, you will learn how to think like a veteran data scientist and approach solutions to business problems in an entirely new way.
Author Stylianos Kampakis provides you with the expertise and tools required to develop a solid data strategy that is continuously effective. Ethics and legal issues surrounding data collection and algorithmic bias are some common pitfalls that Kampakis helps you avoid while guiding you on the path to building a thriving data science culture at your organization.
This updated and revised second edition includes plenty of case studies, tools for project assessment, and expanded content on hiring and managing data scientists. Data science is a language that everyone at a modern company should understand, across departments. Friction in communication arises most often when management does not understand what a data scientist is doing, or how impactful data collection and storage can be for the organization.
The Decision Maker’s Handbook to Data Science bridges this gap and readies you for both the present and future of your workplace in this engaging, comprehensive guide.
About The Author:
Dr. Stylianos (Stelios) Kampakis is a data scientist who is living and working in London, UK. He holds a Ph.D. in Computer Science from University College London, as well as an MSc in Informatics from the University of Edinburgh. He also holds degrees in Statistics, Cognitive Psychology, Economics, and Intelligent Systems.
He is a member of the Royal Statistical Society and an honorary research fellow at the UCL Centre for Blockchain Technologies. He has many years of academic and industrial experience across all fields of data science: statistical modeling, machine learning, classic AI, optimization, and more. His academic experience spans a wide range of domains.
Stelios is one of the foremost experts in sports analytics, having done his Ph.D. on the use of machine learning for predicting football injuries. He has also worked in the areas of neural networks, computational neuroscience, and cognitive science. He also conducts research in blockchain, and more specifically in tokenomics, where he studies topics such as mechanisms for handling volatility in token economies and the evaluation of Initial Coin Offerings (ICOs).
In terms of industrial experience, Stylianos has worked on a wide range of problems, from using deep learning to analyze data from mobile sensors and radar devices, to recommender systems, to natural language processing for social media data. He has also done work in econometrics, Bayesian modeling, forecasting, and research design.
He also has extensive experience consulting for startups, having worked with companies that have raised millions in funding. Stylianos is also very active in data science education. He is the founder of The Tesseract Academy, a company whose mission is to help decision-makers understand deep technical topics such as machine learning and blockchain.
Stelios also teaches “Social Media Analytics” and “Quantitative Methods and Statistics with R” at the Cyprus International Institute of Management. Finally, he often writes about data science, machine learning, blockchain, and other topics on his personal blog, The Data Scientist. The Tesseract Academy’s website is http://tesseract.academy
Did You Know: (Book Articles)
CORE FIELDS OF DATA SCIENCE
Data science has three core fields, namely artificial intelligence, machine learning, and statistics.
Artificial intelligence is all about replicating human brain function in a machine. The primary functions AI should perform are logical reasoning, self-correction, and learning. While it has a wide range of applications, it is also a highly complicated technology: making machines smart requires a lot of data and computing power.
Machine learning refers to a computer’s ability to learn and improve beyond the scope of its programming. Thus, it relies on creating algorithms that are capable of learning from the data they are given. They are also designed to garner insights and then make forecasts regarding data they haven’t previously analyzed.
There are three approaches to machine learning, namely supervised, unsupervised, and reinforcement learning, plus some sub-fields (such as semi-supervised learning). Here, we will be talking only about supervised and unsupervised learning, since this is what is mainly used in business.
Let’s say you want to sort all your photographs based on content.
In supervised learning, you provide the computer with labeled examples. So, you’d give it a picture of a dog labeled “animal,” then a picture of a person labeled “human.” The machine then uses these examples to sort the remaining pictures.
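A minimal sketch of the idea, using hypothetical numeric features in place of real images (real systems extract far richer features than these two made-up numbers): the machine simply copies the label of the most similar labeled example.

```python
import math

# Toy labeled "photos": each is a hypothetical (brightness, edge_density) pair.
labeled = [
    ((0.2, 0.9), "animal"),
    ((0.3, 0.8), "animal"),
    ((0.7, 0.2), "human"),
    ((0.8, 0.3), "human"),
]

def classify(photo):
    """1-nearest-neighbour: copy the label of the closest labeled example."""
    return min(labeled, key=lambda ex: math.dist(ex[0], photo))[1]

print(classify((0.25, 0.85)))  # near the "animal" examples -> "animal"
print(classify((0.75, 0.25)))  # near the "human" examples -> "human"
```

This nearest-neighbour rule is only one of many supervised methods, but it captures the essence: labeled examples in, labels for new data out.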
In unsupervised learning, you’d just give the machine all the photos and let it discover their different characteristics and organize them on its own.
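A sketch of the same photo task without labels, using k-means clustering, one standard unsupervised method (the feature pairs are hypothetical): the algorithm groups similar photos together with no labels supplied.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: group unlabeled points into k clusters."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(centers[j], p))
            clusters[nearest].append(p)
        for j, c in enumerate(clusters):
            if c:  # move each center to the mean of its cluster
                centers[j] = tuple(sum(coord) / len(c) for coord in zip(*c))
    return clusters

photos = [(0.2, 0.9), (0.25, 0.85), (0.3, 0.8),   # one visual style
          (0.7, 0.2), (0.75, 0.25), (0.8, 0.3)]   # another
groups = kmeans(photos, k=2)
print([len(g) for g in groups])  # the six photos split into two groups of three
```

The machine was never told what the groups mean; it only noticed that the points fall into two clumps.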
In reinforcement learning, the machine learns based on errors and rewards. Thus, the machine analyzes its actions and their results.
Statistics is an essential tool in any data scientist’s arsenal because it provides methods for collecting, analyzing, interpreting, and presenting data. Its numerous methodologies enable data scientists to:
– Design experiments and interpret results to improve product decision-making
– Build signal-predicting models
– Transform data into insights
– Understand engagement, conversions, retention, leads, and more
– Make intelligent estimations
– Use data to tell the story
UNDERSTANDING WHERE DATA COMES FROM
As previously mentioned, we generate a lot of data. Pretty much everything you do generates some form of data. Companies, for example, collect data from internal sources such as transactions, log data, and emails, but also from external sources such as social media, audio sources, as well as photos and videos. Other sources of data include:
– Published data from credible sources such as government agencies, firms, and industry associations
– Designed experiments conducted in controlled environments
– Automatic collection, such as user interactions with applications
– Manual entry, which is effectively people typing information into a system
These sources of data can be further classified by the collection method.
DATA COLLECTION METHODS
There are two main types of data collection, namely observational and experimental. Observational collection means that data is gathered passively with no attempt to control the variables involved. For example, collecting customer feedback for a book or a retailer analyzing customer behavior are observational collection methods as there is no attempt to control any variable.
In fact, the majority of the data collected by companies comes from observational methods. Experimental data collection involves designing and conducting an experiment in which certain variables are controlled while others are studied. This is most common in academic and clinical settings.
A perfect example is when a pharmaceutical company tests a new drug. They set up experiments where they control certain variables, like the people involved in the study, and they test other variables, such as the effectiveness of the drug and potential side effects. In business, experimental data gathering is less prevalent.
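The most common business form of experimental collection is the A/B test: show two variants, count an outcome for each, and check whether the difference is larger than chance alone would explain. A minimal sketch using a two-proportion z-test; all counts here are hypothetical:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # rate assuming no real difference
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical headline test: A drew 120 clicks in 2,000 views, B drew 90 in 2,000.
z = two_proportion_z(120, 2000, 90, 2000)
print(round(z, 2))  # about 2.13; |z| > 1.96 suggests a real difference at the 5% level
```

In practice, an analyst would reach for a ready-made routine in a statistics package rather than hand-rolling this, but the arithmetic really is this simple.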
A/B testing is one example of this type of data collection, where two variants are tested against each other. For example, a company might test the headline of an article against a different headline to see which is more effective at drawing in traffic.
DATA ACQUISITION CONSIDERATIONS
When it comes to data acquisition, there are certain considerations that you need to take into account. Each one is essential and, as you will see later, they can go so far as to affect your whole business model. These are:
1. Appropriateness of the data
2. Nature of the domain
3. Time requirements
So, let’s take a look at each of these and what they mean.
If you study machine learning, one of the first quotes you will learn is “garbage in, garbage out.” In other words, no matter how amazing your algorithm is, if you don’t have the right data, then you aren’t going to get the right results.
So, let’s say you’re a retailer and you want to determine the average transaction size per customer for a particular store, which means you need to analyze the transactions for that store. If you enter the data for the wrong store, or input customer feedback instead, then you aren’t going to get the results you are looking for.
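The computation itself is trivial; the danger lies entirely in the input. A sketch with hypothetical transaction records shows how silently a wrong filter poisons the answer:

```python
# Hypothetical transaction records for two stores.
transactions = [
    {"store": "A", "customer": "c1", "amount": 40.0},
    {"store": "A", "customer": "c1", "amount": 60.0},
    {"store": "A", "customer": "c2", "amount": 20.0},
    {"store": "B", "customer": "c3", "amount": 500.0},
]

def avg_transaction(store):
    """Average transaction size for one store's transactions."""
    amounts = [t["amount"] for t in transactions if t["store"] == store]
    return sum(amounts) / len(amounts)

print(avg_transaction("A"))  # 40.0 -- the answer for the store we meant
print(avg_transaction("B"))  # 500.0 -- feed in the wrong store and the code
                             # still runs happily, returning a useless number
```

Note that nothing crashes when the wrong store is queried: garbage in, a confident-looking garbage number out.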
This is a bit of a simplistic and extreme example, but the point stands: if you put garbage into your model, the algorithm can’t turn it into gold, and you will get garbage out the other end. It might not seem like a big deal, but it happens more often than you would think: problems are often difficult or impossible to solve simply because the right data isn’t available. This is why it is so essential to keep this point in mind.