Business intelligence and data glossary
A/B testing
A/B testing is an experiment in which two versions of marketing content are shown to users at random.
Statistical analysis then evaluates which version performs better, driving more users to the conversion goal.
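As a sketch of the statistics involved, the snippet below compares two hypothetical variants using conversion rates and a pooled two-proportion z-statistic (all figures are invented for illustration):

```python
import math

def conversion_rate(conversions, visitors):
    """Fraction of visitors who completed the goal."""
    return conversions / visitors

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic comparing two conversion rates, using a pooled proportion."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Version A: 120 conversions out of 2,400 visitors; Version B: 165 out of 2,400.
z = two_proportion_z(120, 2400, 165, 2400)
# A z-statistic above roughly 1.96 suggests the difference is significant at 95%.
```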
ACID
A standard set of properties in database systems that ensure that database transactions are valid and reliable. ACID stands for atomicity, consistency, isolation, and durability.
Analytics
The process of gathering, analyzing, and presenting data.
Agile
A methodology that helps teams respond to unpredictability through incremental work and shortened feedback loops (for example, short daily meetings where team members share what they’re working on).
Algorithm
A step-by-step formula or process used to analyze data. Popular data analysis algorithms include linear regression, logistic regression, and linear discriminant analysis.
Analysis of variance (ANOVA)
A type of statistical technique that’s used to analyze and discover mean differences between three or more unrelated groups (as opposed to the t-test, which is used to discover mean differences between two unrelated groups).
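A minimal sketch of the one-way ANOVA F-statistic, computed by hand on invented data: the F-statistic is the ratio of between-group variance to within-group variance.

```python
from statistics import mean

def anova_f(*groups):
    """One-way ANOVA F-statistic for three or more independent groups."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k = len(groups)            # number of groups
    n = len(all_values)        # total number of observations
    # Between-group sum of squares: variation of group means around the grand mean.
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: variation inside each group.
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

f = anova_f([5, 6, 7], [8, 9, 10], [11, 12, 13])
```
A large F-statistic (compared against the F-distribution) indicates that at least one group mean differs from the others.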
Analysis Services
Analysis Services, also known as Microsoft SQL Server Analysis Services (SSAS), is a Microsoft service. It is an online data analytics tool that processes and mines data in Microsoft SQL Server.
It provides this analytical data for business reports and client applications such as Power BI, Excel, Reporting Services reports, and additional data visualization tools.
Anonymization
Used for privacy protection (often in relation to regulations, such as GDPR), data anonymization alters the data so that it cannot be attributed to any party (people, companies, or other entities).
API (Application Program Interface)
A set of routines, programming standards, and tools that specifies how to interact with different software components, like applications, databases, websites, etc.
Application
A type of program or group of programs that enables a computer or user to perform a specific function. See data application.
Batch data processing
A way to process large volumes of data that have been collected over a certain period of time, as opposed to real-time data processing, where data is processed as it’s collected.
Back end
Software that interacts with the server and/or databases, without interfacing with the user.
Benchmarking
The process used to compare and measure a company’s performance (metric, data point, KPI, etc.) with either the industry’s best practices, or a company’s own internal performance benchmark.
Big data
Large, diverse sets of information that grow rapidly. Big data refers to the volume of data, the speed at which it is created and processed, and the variety or scope of the data points covered.
Business analytics (BA)
The investigation of past business performance data to gain insight and drive business planning. Business analytics focuses on insights for the future, derived from analysis of the past.
Business intelligence (BI)
BI encompasses the strategy and technology used to analyze business information data. It can include reporting, text and data mining, predictive and prescriptive analytics and benchmarking.
Churn rate
The churn rate is the rate at which customers stop doing business with an entity. It is often measured by the percentage of customers who cancel their subscription or unsubscribe from publications and mailing lists.
Cloud computing
The storage of and access to data and programs over a shared network, which in this case is the Internet.
Collaborative BI
Traditional BI combined with social networking, wikis, or blogs to enhance collaborative problem-solving capabilities.
Columnar database
A columnar database stores data by columns instead of rows, making it well suited to fast, analytical query processing, and therefore to data warehouses.
Because of the columnar structure, the system can locate stored information more easily, requiring less processing time.
Conversion
Conversion occurs when a customer or user of your website or product performs a desired goal, such as filling out a form, signing up for a service, or making a purchase.
Conversion rate
The percentage of total visitors or users that convert (perform a desired goal).
Cube
A multi-dimensional dataset used to evaluate and analyze business data from different perspectives.
CSV (comma-separated values)
A type of file that uses commas to separate values. Each line of the file is a data record.
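A small sketch of reading CSV data with Python’s standard library, using invented records (the header line names the fields; each following line is one record):

```python
import csv
import io

# An in-memory CSV "file": a header line followed by two records.
raw = "name,plan,monthly_fee\nAda,pro,30\nGrace,basic,10\n"

# DictReader maps each record to a dict keyed by the header fields.
rows = list(csv.DictReader(io.StringIO(raw)))
```
Note that CSV values are read as strings; numeric fields like `monthly_fee` must be converted explicitly.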
Dark data
Data that’s collected and stored, but isn’t used to derive insights or for decision-making purposes. Often this data is required for regulation and legal compliance.
Dashboard
A GUI (graphical user interface) that provides summaries and quick views of important and relevant data and KPIs (key performance indicators) for different business purposes.
Data
A collection of information, such as numbers, measurements, names, observations, words, and more. In terms of computer storage, data is a series of bits (binary digits) that have a value of one or zero.
Data agency
Provides data-related services to companies. Data agencies offer assistance in different (or all) areas of the data lifecycle within a company. Learn more about the services offered by a data agency (like Tperson).
Data as a service (DaaS)
A product or service that makes data easily and readily accessible through a cloud-based platform.
Database
A collection of data that’s organized so that it can be easily accessed, managed, and updated.
Database administrator (DBA)
A specialized software/computer administrator who maintains the database environment.
Data application
A program that organizes data in ways that allow users to interact with the data, for example, dashboards (like Google Analytics), travel sites (like Booking.com and Skyscanner), and even online marketplaces (like AirBNB).
Data governance
The models, policies, rules, standards, or regulations that govern the way in which data is collected, stored, arranged, and put to use within a system and organization.
Data cleansing
Also known as data cleaning, this is the process of identifying and removing data that’s inaccurate, irrelevant, incorrect, incomplete, or corrupt.
Data engineering
Data engineering is the practical aspect of data science, focusing on the business applications of data collection and analysis.
Data feed
A mechanism that provides updated data to a user.
Data infrastructure
Data infrastructure is a digital infrastructure that enables data sharing and consumption.
Data integration
The consolidation of data from different sources into one unified data view.
Database management system (DBMS)
Software that enables users to access the data in a database, as well as create, maintain, and control access to the data.
Data lake
A storage repository that can hold a large amount of raw (unstructured), semi-structured, and structured data. All types of data can be stored in a data lake, with no limitations on format, file, or size.
Data model
A representation of the data flow and relationships between different data elements that also illustrates the way data is stored and accessed, and the format of the data.
Data mining
Data mining is the process of identifying patterns in large datasets.
Dataset
A collection of related data.
Data science
Data science is a multidisciplinary field that uses scientific methods to extract knowledge and insights from large data sets.
Data warehouse (DW or DWH)
A central repository of integrated data (from one or more sources) that’s used for data analysis and reporting. A data warehouse is a core component of business intelligence and analytics.
Dimension
A set of data attributes related to something specific that’s of interest to a business, and which the business wants to monitor; for example, dimensions could be "products", "purchases", or "customers".
Dimensions are often entry points for numeric values, such as revenue or number of clicks.
Edge analytics
A data analysis model that analyzes incoming data streams at the edge of a network, e.g., at or close to a network switch, sensor, or other connected device (instead of waiting for the data to be sent to a central data processing point for analysis).
Embedded BI
A method of integrating and incorporating BI reports into other software components.
Entity
An object in a data model (also referred to as a data entity).
Event
An occurrence that is recorded by data analytics software and that describes actions performed by entities (software components or users).
For example, when a user clicks on a specific button on a website.
Excel
A spreadsheet software program developed by Microsoft. Often used as a data analysis tool, Excel is good for storing and visualizing data.
ETL (extract, transform, load)
A data integration process that extracts data from one or more sources, transforms it into a suitable structure and format, and loads it into a target system, such as a data warehouse.
Fact table
In data warehousing, a data table that consists of measurements, metrics, or business process parameters. A fact table is at the center of a star schema, and is surrounded by dimension tables. Fact tables contain the data warehouse’s content, and store different types of measures, such as additive, non-additive, and semi-additive measures.
Front end
The elements of a program or device that interact directly with the end-user through the User Interface (UI).
Full stack
Full stack is another way of saying the entire process, from A to Z. In development, it includes both the front end (client-side) and back end (server-side) of an application. In data management, it includes setting up data infrastructure, building reports, data science, and data engineering.
General Data Protection Regulation (GDPR)
A legal framework that provides guidelines for data privacy in Europe. Specifically, it regulates the collection, processing, and storage of personal information belonging to individuals in the European Union.
Hierarchy
Hierarchy is a way to organize levels of a dimension by granularity, usually from largest to smallest.
Hypothesis testing
A statistical process that tests an assumption using data analytics.
Index
A data structure that stores the values of a specific table column in a form that allows records to be sorted on multiple fields, enabling binary searches.
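As a small sketch using Python’s built-in SQLite driver (table and index names are invented), an index lets the database search a column without scanning the whole table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)",
                 [("Ada", 30.0), ("Grace", 10.0), ("Ada", 45.0)])

# Create an index on the customer column, so equality lookups avoid a full scan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

# Ask SQLite how it would execute the query; the plan should mention the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = 'Ada'"
).fetchone()
```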
Inmon approach
A top-down approach that designs the data warehouse architecture according to the corporate data model, which identifies the main subject areas and entities the enterprise works with, such as customer, product, vendor, and so on.
Juridical data compliance
The need to observe the relevant data laws and regulations of the country in which the data is stored.
Kimball approach
A bottom-up approach to data warehouse architecture design, in which data marts are formed first, based on the business requirements.
KPI (key performance indicator)
A measurable value that indicates if a company is reaching its key business objectives. Organizations use KPIs to evaluate their success in reaching specific targets and to measure business performance.
Lead and Lag
Analytical functions used to calculate the difference between rows in a table: Lead compares the current row with the following row, and Lag compares the current row with the previous row.
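The same row-shifting idea can be sketched in plain Python (invented daily sales figures), pairing each row with its previous (lag) and next (lead) value:

```python
def with_lag_and_lead(values):
    """For each row, pair it with the previous (lag) and next (lead) value."""
    lag = [None] + values[:-1]     # value from the previous row
    lead = values[1:] + [None]     # value from the following row
    return list(zip(lag, values, lead))

daily_sales = [100, 120, 90, 130]
rows = with_lag_and_lead(daily_sales)

# Day-over-day change, as LAG would compute it in SQL.
changes = [cur - prev for prev, cur, _ in rows if prev is not None]
```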
Linear regression
A type of machine learning algorithm that is used for predictive analytics. Linear regression uses historical data to predict the possible impact of known variables on a future outcome.
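A minimal sketch of simple linear regression, fitting a least-squares line to invented figures (ad spend vs. revenue) with the closed-form formulas for slope and intercept:

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope and intercept for simple linear regression."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Hypothetical ad spend (x) vs. revenue (y); the data lies exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```
The fitted line can then be used to predict the outcome for an unseen x value.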
Logistic regression (also known as logit regression)
A type of binary classification algorithm that’s used to predict probabilities, and help classify something (or assign an observation) as either one thing or another. For example, using a logistic regression, we can predict whether a student will pass or fail a test.
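The pass/fail example can be sketched as follows; the weight and bias here are illustrative stand-ins, not values fitted from real data:

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

def predict_pass(hours_studied, weight=1.2, bias=-4.0):
    """Probability of passing; weight and bias are invented for illustration."""
    p = sigmoid(weight * hours_studied + bias)
    return p, p >= 0.5     # classify as "pass" if the probability is at least 0.5

p_low, passed_low = predict_pass(1)     # 1 hour of study: low probability
p_high, passed_high = predict_pass(6)   # 6 hours of study: high probability
```
In practice the weight and bias are learned from labeled historical data.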
Linear discriminant analysis (LDA)
Also known as normal discriminant analysis (NDA), this is a machine-learning algorithm that’s used to find a combination of features or characteristics that classify two or more classes of objects or events.
Machine learning
Machine learning is an AI application that gives computers the ability to learn and improve from data and experiences without being specifically programmed.
Metadata
A set of data that provides information about other data, for example, a table of all tables in a database.
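The "table of all tables" example can be seen directly in SQLite (via Python’s built-in driver), where the `sqlite_master` catalog describes the database’s own objects; the table names here are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")

# sqlite_master is metadata: a table that lists the database's own tables.
tables = [row[0] for row in
          conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
```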
Microsoft Power BI
A business analytics service by Microsoft that provides interactive data visualizations and business intelligence capabilities, allowing end users to create their own reports and dashboards.
Microsoft SQL Server
A relational database management system (developed by Microsoft) that’s used primarily for storing and retrieving data upon request by other software applications.
Network analysis (NA)
A data science method that’s used to analyze, control, and monitor business processes and workflows.
Neural network
A group of machine learning techniques modeled on the human brain in order to identify hidden patterns.
Normal distribution
A probability function that describes the symmetric distribution of a variable’s values. In a normal distribution, most of the variable’s values are clustered around the central peak, while the probabilities taper off on both sides further away from the center.
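The clustering around the peak can be illustrated with Python’s standard library; the mean and standard deviation below are invented (heights in centimeters). Roughly 68% of values fall within one standard deviation of the mean:

```python
from statistics import NormalDist

# Hypothetical distribution of heights: mean 170 cm, standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Share of values within one standard deviation of the mean (about 68%).
share_near_mean = heights.cdf(180) - heights.cdf(160)
```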
Online Analytical Processing (OLAP)
A category of software tools that enable users to analyze different dimensions of multidimensional data.
Operational database
A database that’s used to store, manage, and track real-time business information.
Outlier
A data point that differs greatly from other data values and observations. Outliers may be a result of an error, or of variability in the measurement.
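A common sketch for flagging outliers is the z-score rule: flag any value more than a chosen number of standard deviations from the mean (the sensor readings below are invented):

```python
from statistics import mean, stdev

def outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) / s > threshold]

sensor = [10] * 19 + [100]    # nineteen normal readings and one far from the rest
flagged = outliers(sensor)
```
The threshold is a judgment call; 3 standard deviations is a common default, but it only works meaningfully on reasonably large samples.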
Pig
A high-level scripting language that is used with Apache Hadoop.
Predictive modeling
Also referred to as predictive analytics, this is a type of data model that uses statistics to predict possible outcomes.
Query
A request for data or information from a database table or combination of database tables.
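A minimal sketch of a query using Python’s built-in SQLite driver (the sales table and figures are invented): the SELECT statement requests aggregated information from the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 100.0), ("south", 250.0), ("north", 50.0)])

# A query: request total sales per region from the sales table.
result = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```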
Real-time data processing
A way to process data as it’s collected, as opposed to batch data processing, which processes high volumes of data that have been collected over a certain period of time.
Reporting
The process of collating and organizing collected data into formats like charts, tables, and diagrams, in order to analyze the data.
Sales funnel
The sales funnel is a process that leads a customer to purchase a product or service. It includes creating initial interest (lead generation), nurturing that interest and overcoming customer concerns or barriers, securing a sale, and, in some cases, encouraging further purchases.
Scalability
The ability of a database to maintain or increase its performance as the amount of data it needs to handle grows.
There are several ways to enable database scalability, including vertical database scalability (adding more capacity to a single machine), and horizontal database scalability (adding more machines).
Schema
The organization of data as a blueprint showing how fact and dimension tables are arranged and constructed to form a relational database.
It determines what facts can enter the database according to the interests of the possible business users.
Snapshot
A database snapshot is a single, static view of a SQL Server database taken at a specific moment in time.
SQL (Structured Query Language)
The most commonly used standard language for accessing and manipulating databases.
According to ANSI (American National Standards Institute), SQL is the standard language for relational database management systems.
Star schema
The simplest type of data mart schema. The star schema is the most common approach used to develop data warehouses and dimensional data marts, and consists of one or more fact tables that reference any number of dimension tables.
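A tiny star schema can be sketched with Python’s built-in SQLite driver (the `fact_sales` and `dim_product` names and figures are invented): a central fact table references a dimension table, and queries join the two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension table: descriptive attributes of products.
conn.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")

# Fact table at the center of the star, referencing the dimension table.
conn.execute("""CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    quantity INTEGER)""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "widget"), (2, "gadget")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [(1, 3), (2, 5), (1, 4)])

# Join the fact table to its dimension to report quantity sold per product.
rows = conn.execute("""
    SELECT d.name, SUM(f.quantity)
    FROM fact_sales f JOIN dim_product d USING (product_id)
    GROUP BY d.name ORDER BY d.name""").fetchall()
```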
Targeting
Targeting refers to the segmentation of the market according to defined criteria, such as buyer interests, demographics, or personas, and the development of unique marketing messaging and content for each segmented group.
t-test
A type of statistical technique that’s used to analyze and discover mean differences between two unrelated groups.
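A minimal sketch of the two-sample t-statistic on invented scores, using Welch’s formulation (no equal-variance assumption):

```python
from statistics import mean, variance

def t_statistic(a, b):
    """Welch's t-statistic for the mean difference of two independent groups."""
    return ((mean(a) - mean(b))
            / (variance(a) / len(a) + variance(b) / len(b)) ** 0.5)

# Two invented groups of test scores.
t = t_statistic([5, 6, 7, 8], [1, 2, 3, 4])
```
A large absolute t-statistic (judged against the t-distribution) suggests the two group means genuinely differ.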
Unstructured data
A large collection of data that isn’t stored in a structured database format.
Velocity
The speed at which data is processed. Types of data velocity include real-time data processing, near real-time data processing, and batch processing.
View
A virtual table that shows a specific view of data stored in one or more real data tables.
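A view can be sketched with Python’s built-in SQLite driver (the employee data is invented): the view is a stored query that behaves like a virtual table over the real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Ada", "data", 90.0), ("Grace", "data", 95.0),
                  ("Linus", "ops", 80.0)])

# A view: a named query that can be selected from like a real table.
conn.execute("CREATE VIEW data_team AS "
             "SELECT name, salary FROM employees WHERE department = 'data'")

names = [row[0] for row in conn.execute("SELECT name FROM data_team ORDER BY name")]
```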
Web analytics
The collection, analysis, and reporting of web data in order to understand website usage, and optimize a website. Web analytics is also used for business and market research, and to understand the effectiveness of a feature, capability, product, or service.
XML (Extensible Markup Language)
A simple, flexible, text format that was designed to meet the challenges of large-scale electronic publishing on the web.
YARN
YARN is a large-scale, distributed operating system for big data applications.
Zettabyte
A multiple of the byte unit that’s one sextillion bytes.