
Become an Expert in Data Science: An Important Brief

What is Big Data?

Big Data is data whose scale, distribution, diversity, and/or timeliness require the
use of new technical architectures and analytics to enable insights that unlock new
sources of business value.
McKinsey & Co.; Big Data: The Next Frontier for Innovation, Competition, and
Productivity [1]

As a data scientist, or in any other data-related role, you will work with several different types of data:

  • Structured data: Data containing a defined data type, format, and structure (for example, transaction data, online analytical processing [OLAP] data cubes, traditional RDBMS tables, CSV files, and even simple spreadsheets).
  • Semi-structured data: Textual data files with a discernible pattern that enables parsing (such as Extensible Markup Language [XML] data files that are self-describing and defined by an XML schema).
  • Unstructured data: Data that has no inherent structure, which may include text documents, PDFs, images, and video.
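To make the three categories concrete, here is a minimal Python sketch that reads one tiny sample of each kind. All sample values (the orders, amounts, and the review text) are made up for illustration, and only the standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row follows the same columns (hypothetical sample data).
csv_text = "order_id,amount\n1,19.99\n2,5.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(float(r["amount"]) for r in rows)

# Semi-structured: tags describe the data, but records may differ in shape.
# Here only the second order carries a <note> element.
xml_text = """<orders>
  <order id="1"><amount>19.99</amount></order>
  <order id="2"><amount>5.00</amount><note>gift</note></order>
</orders>"""
root = ET.fromstring(xml_text)
notes = [order.findtext("note") for order in root.findall("order")]

# Unstructured: free text has no fields, so only generic processing
# (such as counting words) is possible without further modeling.
review = "Fast delivery, but the box arrived damaged."
word_count = len(review.split())

print(round(total, 2), notes, word_count)
```

The structured sample can be queried by column name right away, the semi-structured sample needs a parser that tolerates missing elements, and the unstructured sample offers no fields at all.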

What are Data Repositories?

  • Spreadsheets and data marts: Spreadsheets and low-volume databases for recordkeeping. Analysts depend on data extracts such as Excel sheets or Google Sheets.
  • Data Warehouses: Centralized data containers in a purpose-built space, such as Amazon Redshift or Azure SQL Data Warehouse. They support BI and reporting but restrict robust analyses. Analysts depend on IT and DBAs for data access and schema changes, and must spend significant time getting aggregated and disaggregated data extracts from multiple sources.
  • Analytic Sandbox: Data assets gathered from multiple sources and technologies for analysis. Enables flexible, high-performance analysis in a nonproduction environment; can leverage in-database processing. Reduces costs and risks associated with data replication into “shadow” file systems. “Analyst owned” rather than “DBA owned”.

Data Science vs Business Intelligence

Data science vs Business Intelligence

The image above summarizes the differences between Data Science and Business Intelligence.

Data Analytics Lifecycle

  • What is Data Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which they can learn. The team assesses the resources available to support the project in terms of people, technology, time, and data. Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.
  • What is Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project. The team needs to execute extract, load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox. ELT and ETL are sometimes jointly abbreviated as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.
  • What is Model planning: Phase 3 is model planning, where the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase. The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.
  • What is Model building: In Phase 4, the team develops datasets for training, testing, and production purposes, then builds and executes the models chosen during the model planning phase.
  • How to Communicate results: In Phase 5, the team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1. The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.
  • What is Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical documents. In addition, the team may run a pilot project to implement the models in a production environment.
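The data preparation phase can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inline CSV stands in for a real source system, the `sales` table and its columns are hypothetical, and an in-memory SQLite database plays the role of the analytic sandbox.

```python
import csv
import io
import sqlite3

# Extract: read raw records (hypothetical inline CSV standing in for a source system).
raw = "customer,amount\n alice ,10.5\nBOB,4\nalice,\n"
records = list(csv.DictReader(io.StringIO(raw)))

# Transform: condition the data by normalizing names and dropping rows with no amount.
clean = [
    (r["customer"].strip().lower(), float(r["amount"]))
    for r in records
    if r["amount"].strip()
]

# Load: write the conditioned rows into the sandbox (in-memory SQLite in this sketch).
sandbox = sqlite3.connect(":memory:")
sandbox.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
sandbox.executemany("INSERT INTO sales VALUES (?, ?)", clean)

# Analyze inside the database instead of exporting extracts.
totals = sandbox.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Swapping the order of the transform and load steps (load the raw rows first, then clean them with SQL inside the sandbox) would turn this ETL sketch into ELT.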

Check out my other video for more information:

Data Analysis Video

There you will find some data analysis projects, as well as an introduction to the theory of data analysis.

2 Major Types of Machine Learning Models

Supervised Learning: It uses known and labeled data as input. In a supervised model, input and output variables will be given.

Typical applications of supervised machine learning models are:

  • Predictive analytics (house prices, stock exchange prices, etc.)
  • Text recognition
  • Spam detection
  • Customer sentiment analysis
  • Object detection (e.g. face detection)
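As a small illustration of supervised learning, the sketch below fits a straight line to labeled examples and then predicts a price for an unseen input. The house sizes and prices are made-up values, and ordinary least squares is just one simple technique chosen for the example, not a recommendation for real price prediction.

```python
# Labeled training data: inputs (size in square meters) and known outputs
# (price in thousands). All values are made up for illustration.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]


def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Training uses both the inputs and the labeled outputs.
slope, intercept = fit_line(sizes, prices)

# Prediction uses only a new input.
predicted = slope * 90.0 + intercept
print(slope, intercept, predicted)
```

The key point is that `fit_line` receives both the input and the output variable, which is what makes the approach supervised.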

Unsupervised Learning: It uses unlabeled data as input. In an unsupervised learning model, only input data is given.

Types of Unsupervised Machine Learning Models are:

  • Clustering
  • Association
  • Dimensionality reduction
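Clustering, the first item above, can be sketched with a tiny one-dimensional k-means. The spending values and the starting centers are made up, and the helper name `kmeans_1d` is hypothetical. Note that no labels are passed in; the groups emerge from the input data alone.

```python
def kmeans_1d(points, centers, iterations=10):
    """Very small k-means for 1-D data: assign points, then recompute centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Unlabeled input: monthly spend per customer (made-up values).
spend = [10.0, 12.0, 11.0, 95.0, 100.0, 105.0]
centers, clusters = kmeans_1d(spend, centers=[0.0, 50.0])
print(centers, clusters)
```

With these values the algorithm separates low spenders from high spenders without ever being told which customer belongs to which group.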

Business Case

23andme.com and genotyping
