- 28th August 2019 - 11:08 pm
- By Admin
INTRODUCTORY BASICS OF DATA ANALYSIS
Data analysis is the process of evaluating data using analytical and statistical tools to discover useful information and aid business decision-making. There are several data analysis methods, including data mining, text analytics, business intelligence and data visualization.
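As a small illustration of what "statistical tools" can mean in practice, the sketch below summarizes a list of numbers with Python's standard `statistics` module. The sales figures and the variable names are invented for the example and do not come from any real data set.

```python
import statistics

# Hypothetical monthly sales figures, invented purely for illustration.
monthly_sales = [120, 135, 128, 160, 142, 155]

# Basic descriptive statistics: a common first step in data analysis.
summary = {
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "stdev": statistics.stdev(monthly_sales),  # sample standard deviation
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A summary like this is only a starting point; methods such as data mining or visualization build on the same kind of cleaned, numeric input.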
Data management is an administrative process that includes acquiring, validating, storing, protecting and processing required data to ensure the accessibility, reliability and timeliness of the data for its users.
Data analytics is the process of examining data sets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software.
APPLICATION AREAS FOR DATA ANALYTICS
Data analytics can be applied to gaming, travel, energy management and healthcare.
TOOLS AND LANGUAGES FOR DATA ANALYTICS
R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.
Tableau Public is a free service that lets anyone publish interactive data visualizations to the web. Visualizations published to Tableau Public can be embedded into web pages and blogs, shared via social media or email, and made available for download to other users.
Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
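To give a feel for data parallelism, here is a minimal sketch in plain Python. It does not use Spark or its API: it only mimics the idea of splitting data into partitions, processing each partition independently, and combining the partial results. The helper names `split_into_partitions` and `sum_of_squares` are hypothetical, and threads on one machine stand in for the nodes of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into roughly equal contiguous chunks."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(partition):
    """Work that is applied independently to each partition."""
    return sum(x * x for x in partition)


numbers = list(range(1, 101))
partitions = split_into_partitions(numbers, 4)

# Each partition is handled by its own worker thread; on a real cluster
# these workers would be separate machines.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(sum_of_squares, partitions))

# Combine the partial results into the final answer.
total = sum(partial_results)
print(total)
```

In a framework like Spark the partitioning and scheduling are handled for you, which is what "implicit" data parallelism refers to; here they are written out by hand to make the steps visible.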
Data Science is a field that refers to the collective processes, theories, concepts, tools, and technologies that enable the review, analysis and extraction of valuable knowledge and information from raw data.
APPLICATION AREAS FOR DATA SCIENCE
1. Digital Advertisement
2. Internet Research
3. Recommender System
4. Image/Speech Recognition
TOOLS AND LANGUAGES FOR DATA SCIENCE
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together.
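The short sketch below shows dynamic typing and a few of the built-in data structures in action. The variable names and values are made up for illustration.

```python
# Dynamic typing: the same name can be rebound to values of different types.
value = 42
value = "forty-two"

# Built-in data structures.
readings = [3.2, 4.8, 3.2, 5.1]                  # list: ordered, allows duplicates
unique_readings = set(readings)                  # set: duplicates removed
station = {"name": "A1", "readings": readings}   # dict: key/value pairs


def describe(item):
    """Works on any object that supports len(), whatever its type."""
    return f"{type(item).__name__} with {len(item)} items"


descriptions = [describe(readings), describe(unique_readings), describe(station)]
for line in descriptions:
    print(line)
```

Note that `describe` never declares what type it accepts; it only relies on the object supporting `len()`, which is the kind of flexibility that makes Python convenient as a glue language.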
SAS (Statistical Analysis System) is a software suite developed by SAS Institute for advanced analytics, multivariate analysis, business intelligence, data management, and predictive analytics.
Structured Query Language (SQL) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system.
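As a minimal sketch of SQL in use, the example below runs a few statements against an in-memory SQLite database through Python's standard `sqlite3` module. The `sales` table, its columns and its rows are hypothetical and exist only for this example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Define a table (hypothetical schema).
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Insert a few invented rows.
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 60.0)],
)

# Query: total amount per region.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)

conn.close()
```

The same `CREATE TABLE`, `INSERT` and `SELECT ... GROUP BY` statements would look much the same on other relational databases, although data types and details vary between systems.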
Big data refers to voluminous amounts of structured, semi-structured and unstructured data that an organization can potentially mine and analyze for business gains.
APPLICATION AREAS FOR BIG DATA
3. Financial services
TOOLS AND LANGUAGES FOR BIG DATA
Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications running in clustered systems. Hadoop can handle various forms of structured and unstructured data, giving users more flexibility for collecting, processing and analyzing data than relational databases and warehouses provide. It is at the center of a growing ecosystem of big data technologies that are primarily used to support advanced analytics initiatives, including predictive analytics, data mining and machine learning applications.
NoSQL, originally referred to as "non-SQL" or "non-relational", provides a mechanism for storage and retrieval of data that is modelled in means other than the tabular relations used in relational databases. NoSQL databases are increasingly used in big data and real-time web applications. They are sometimes called “Not only SQL” to emphasize that they may support SQL-like query languages, or sit alongside SQL databases in a polyglot persistence architecture.
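To show what "modelled in means other than tabular relations" can look like, here is a toy document store written in plain Python. It is not the API of any real NoSQL database; the `put` and `find` helpers and the sample documents are hypothetical.

```python
import json

# A tiny in-memory "collection": document id -> document.
collection = {}


def put(doc_id, document):
    """Store a JSON-compatible copy of the document under its id."""
    collection[doc_id] = json.loads(json.dumps(document))


def find(predicate):
    """Return every stored document for which predicate(doc) is true."""
    return [doc for doc in collection.values() if predicate(doc)]


# Documents may be nested and need not share the same fields.
put("u1", {"name": "Asha", "tags": ["admin", "editor"], "address": {"city": "Nairobi"}})
put("u2", {"name": "Ben", "tags": ["viewer"]})  # no address field

editors = find(lambda doc: "editor" in doc.get("tags", []))
names = [doc["name"] for doc in editors]
print(names)
```

Unlike rows in a relational table, the two documents here have different shapes, and the nested `address` field is stored inside the document rather than in a separate table.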
Hive is an ETL and data warehouse tool built on top of the Hadoop ecosystem and used for processing structured and semi-structured data. It supports DDL (Data Definition Language) and DML (Data Manipulation Language) operations and provides a flexible SQL-like query language, HQL (Hive Query Language), for querying and processing data.