Is the process of changing the organization and relationships among data fields
When the first bits were dropped on the first hard-disk drive, it’s not likely that computer scientists imagined the onslaught of data just 50 years would bring. But those pioneers probably recognized that data management was going to become a challenge. Part of meeting that challenge is data modeling: the practice of developing data storage and retrieval solutions that fit the requirements of a particular business. Show
What Is Data Modeling?Data modeling starts with understanding the business needs of an organization and then understanding its data — the conceptual modeling stage. From there, the data modeling process defines data elements, the relationships among them and the data structures they form (logical modeling) in order to create a manageable, extendable and scalable data management system aligned with the needs and requirements of a business (known as physical modeling). The organization’s database is constructed based on that physical model. Data modeling is critical for any data-driven organization because it holds the needs and requirements of the business up against the organization’s data and optimizes the data for business use. While it can be a fairly abstract topic, data modeling has a significant impact on how data is stored and retrieved, which translates directly to the kinds of reports that business leaders can get to help run their organizations and make pivotal decisions. Data modeling vs data analysisThere is a symbiotic relationship between data modeling and data analysis. The output of data modeling is meant to make data mining and analysis more efficient and meaningful. More specifically:
Key Takeaways
Data modeling is the process of developing a plan for how an organization wants to collect, update, organize, store and analyze data. In data modeling, key business concepts are mapped to available and prospective data so that the relationships between the concepts and the way data is warehoused can be visualized. The end result of data modeling is to give the business efficient and meaningful analysis and insights from its data. Data modeling is highly technical, a bit arcane and interdependent with databases and programming languages. In other words, the choice of the database system in which data will be stored and the language used to describe a data model can influence the development of the data model. The good news is, modern business software, such as enterprise resource planning (ERP) systems, usually comes with preconfigured data models that can be used as is or customized. Why Use Data Models?A database is only as valuable as the faith an organization has in it. Data modeling helps support that faith by ensuring the organization’s data, which is often stored in your data warehouse, is accurately represented in a database so it can be meaningfully analyzed. On a very tactical level, data modeling helps to define the database — the relationships between data tables, the “keys” used to index and connect tables and the stored procedures that make accessing the data efficient. On a higher level, however, the reason to use data models is to ensure that database designs are aligned with the needs of the business. Without input from business stakeholders, it’s very unlikely that subsequent data analysis will reveal useful insights or that data will be organized and stored efficiently. Data modeling also provides a standards-based methodology for constructing, managing and growing data assets. By spending the time to create data models and incorporate data model patterns — best-practice templates for common types of data — an organization can make better use of the data it collects and can grow its data and teams more effectively. Data Modeling TechniquesA variety of techniques and languages have been created to develop data models. The general idea behind data modeling techniques is to find a standard approach that represents the organization’s data in the most useful way; the languages help communicate the data model by defining standard notation for describing the relationships between data elements. Any language can be used to describe any technique. Why an organization would use one technique and language over another has to do with domain knowledge within the organization and customer requirements. For example, companies that do business with the Department of Defense are required to use the IDEF1X markup language. The most popular technique among data scientists is the Entity Relationship (E-R) Model and the most popular language is the Unified Modeling Language (UML). Entity Relationship (E-R) ModelThe E-R model diagrams a high-level view of an organization’s data and the relationships among them. As the name suggests, it models out the relationships between data “entities.” The E-R model also establishes critical concepts that the final database must support. As an example, a medical research publisher may find that each research paper (or entity) it publishes has a one-to-many relationship with a table of authors because there can be more than one author to a specific entity. Similarly, each paper/entity has a one-to-one relationship with a table of medical specialties and a one-to-many relationship with a table of subscribers. The publisher’s E-R model will show that the author table must support an entity identifier allowing multiple authors to be attributed to the same research paper. The research paper table must support a specialty identifier to connect the papers to the specialty table. If authors need to have a connection to the specialties database, the E-R model will establish how this is accomplished most efficiently. The example above allows for authors to be connected to specialties, but depending on how that data will need to be used, it may not be the most efficient connection. If needed, the E-R model will scope a connection directly between the author table and the specialty table. While the E-R model is considered by many to be the best option for relational database design, there are also three other common techniques. They are the:
UML (Unified Modelling Language)The Unified Modeling Language (UML) was developed as a standards-based way to describe the connections and relationships among different data entities in a data model. Although UML was originally developed to serve the needs of the larger software development universe, many of its core principles and notations apply directly to data modeling. UML is important because it provides a common data modeling language that helps data professionals communicate about the data model. It’s conceptually similar to the way music notation enables a composer to convey his or her intentions to musicians and enables an orchestra to play the piece. UML allows a database modeler to create the blueprint that describes the attributes and behaviors of the data set in a way that a database administrator can execute. Using UML, a database modeler can describe the specific attributes of data tables (i.e., entity types and field names) and can also describe the relationships between them (i.e., “customer_id” connects to the customer table). And it provides a map that database administrators can use in the ongoing management of data assets. UML is not the only language used to notate database schema. Other languages include:
Types of Data ModelingThere are essentially three types of data modeling that, together, outline a best-practices process that takes data from business requirements through to creating the actual data stores: conceptual, logical and physical. It’s helpful to think about these stages using a home-building metaphor. An architect takes the ideas, requirements and needs from a homeowner (conceptual data modeling), designs them into a blueprint (logical data modeling) and then a contractor constructs the home (physical data modeling). It’s important to note that data modeling builds the house — but the data hasn’t moved in yet. Conceptual Data Modeling (or Enterprise Data Modeling)The goal of conceptual data modeling is to organize ideas and define business rules. The main participants are business stakeholders and data modelers/architects. Business stakeholders outline what they need the data to provide, and data architects specify ways data can be organized to provide it. For example, a business stakeholder may have a requirement that sales figures are available and always “up-to-date.” Since “up-to-date” can mean different things to different people, the data modeler looks to eliminate ambiguity by defining what “up-to-date” means to the stakeholder. The difference between “up-to-the-minute” and “as-of-the-end-of-yesterday” have very different implications for system complexity and costs. Similarly, defining vocabulary is an important part of conceptual data modeling. It ensures that disparate team members know, for example, what is meant by “up-to-date,” and that they all use the term to mean the same thing. Among other key elements to work out during the conceptual phase are:
For example, sales activity and sales staff are two entities. Each of these entities has certain attributes. Some attributes of the sales activity entity might be which product was sold and by which account manager. The sales staff entity has a wholly separate set of attributes, some of which most certainly are the salesperson’s name and the region or market segment they represent. There is a relationship between them because sales staff creates sales activity. The data modeler will note that the sales activity table needs to connect to the sales staff table by way of an employee identifier. In all likelihood, products and customers are also entities with their own attributes and will be referenced in the sales activity table by identifiers that link to product and customer tables. A key component of the conceptual data modeling phase is to look at business concepts across the organization and to resolve differences that might exist for the same information from two different departments. Logical Data ModelingThe goal of logical data modeling is to bridge the conceptual and physical models — it translates the conceptual model into a set of instructions that can be used to create the physical model. During this “blueprint” phase, the structure of the data and the relationships among them are defined and the entire database plan is laid out with the connections between entities diagrammed. For example, a logical data modeling plan will note that the sales activity table contains a product identifier that follows an alphanumeric format used to look up product information from a separate product table. It will also show that the sales activity table contains an order number used to connect with the financial table(s), but the order number will bear no connection to the product table. Thus, the output of logical data modeling is a kind of manifest documenting the needs and requirements of disparate parts of the organization in a single overall data management plan. It’s an important document. Physical Data ModelingIt’s time to build the home. As the name suggests, physical data modeling gets into the physical specifics of database storage: what data is collected in what columns, in which tables and how those tables are specifically connected. Table structure, primary, secondary and foreign keys — codes used to identify data relationships — and indexes are fully defined. It’s during physical data modeling that specific database management systems (DBMSes) are addressed. Different DBMSes have various limits in terms of size and configuration, so it’s important to choose a DBMS that meets the needs and requirements of the data model. Advantages of Data ModelingData systems that have gone through data modeling have many advantages, especially for organizations that collect and use a high volume of business data. Possibly the most important advantage is the data modeling process ensures a higher quality of data because the organization’s resulting data governance, or a company’s policies and procedures to guarantee quality management of its data assets, follow a well-thought-out plan. A proper data modeling process also improves system performance while saving money. Without the data modeling exercise, a business could find the systems they use are more extensive than needed (thus costing more than they need) or won’t support their data needs (and performing poorly). Another advantage of data modeling is so much becomes known about the data (type and length, for example) that organizations can create applications and reports that use data more rapidly and with fewer errors. A good data modeling plan will also enable more rapid onboarding of acquired companies, especially if they also have a data modeling plan. At the very least, the acquired company’s data modeling plan can be used to assess how quickly the two data sets can be connected. What’s more, the existence of a plan will expedite conversions to the acquiring company’s systems. Disadvantages of Data ModelingData modeling is not for every organization. If an organization does not use or plan to collect a substantial amount of data, the exercise of data modeling might be overkill. For organizations that consider themselves data-driven and have — or plan to have — a lot of data, the main disadvantage of data modeling is the time it takes to create the plan. Depending on the complexity of the organization and the spectrum of data being collected, data modeling might take a long time. Another potential disadvantage depends on the willingness of non-technical staff to fully engage in the process. Integral to data modeling is that the business needs and requirements are as fully described as possible. If business stakeholders are not fully engaged in the data modeling process, it’s unlikely that data architects will get the input they need for successful data modeling. Examples of Data ModelingConsider this example of a small hotel chain to illustrate the role of data modeling and its importance to business decision-making. The hotel chain books rooms through three channels: it’s a call center, a website and many independent travel sites. The independent sites are problematic because they take a commission and the hotel chain must spend extra to make it to the top of search results. Data modeling reveals the importance of tracking independent travel site data in coordination with other data so that the hotel chain can decide where to spend its money. The data modeler notes that sales source information must be tracked by each independent travel site (i.e., channel) so that the business can decide the value of each. Additionally, the data must connect to two other data sets: commissions paid for orders and advertising spend. This way, a report or dashboard can be created showing the cost of sales that came through each channel. The business can use that report to determine which independent travel sites are most profitable and adjust its spending accordingly. Without going through the data modeling process, it’s likely that the sales and commissions tables would be naturally connected but the advertising spend table might not. Without data modeling, the advertising spend information may not be accessible at all because it is held in department-level spreadsheets. It’s the process of data modeling that brings this information together, setting requirements for data entry when needed, so that the business can get actionable data insights for decision-making. How to Model DataOnce data modelers have obtained a thorough understanding of a business and its data, they’re ready to begin the data modeling process itself. A solid data modeling process typically consists of the following eight steps. 8 Steps to Data Modeling
Industries That Use Data ModelingIndustries that tend to benefit from data modeling include any that uses a high volume of data to manage its business. Data modeling increases in importance for industries that use data as a core to their business, especially when privacy concerns and government regulation are added to the mix. Here are the industries that benefit the most from data modeling: This is by no means an exhaustive list of industries that can see benefits from data modeling. Government, professional services, sports teams and other industries can also see big benefits from data modeling. Data Modeling ToolsVarious software applications are available to help data architects visualize data models. They don’t do the data modeling but rather help to manage the data model. Typically, data modeling tools visualize the data tables and the relationships among them. As these maps can be vast, with hundreds of thousands of entities visualized, they have navigational functions such as search and zoom that allow data professionals to narrow or expand their view of the model. Data modeling tools are especially valuable for reference in ongoing database management. If a new entity type is launched, visualization from a data modeling tool can help to make sure the specifics of fields are correctly created. History of Data ModelingThe history of data modeling mirrors the history of computers, databases and programming languages, which all evolved together. In the 1960s, hard drives started to store larger amounts of information, and data management became a growing challenge. As disk capacities grew, the three main database formats in use today were first theorized: hierarchical, network and relational. IBM pioneered the hierarchical model, whose data is structured like an org chart: Each “parent” in the hierarchy can have many “children,” but each child can have only one parent. General Electric developed the network model: Each parent and child data element can have multiple parents and children of their own. A bit later — the mid 1970s — IBM developer Edgar Codd came up with the idea for relational databases: data arranged in interconnected tables of rows and columns. IBM stayed with its hierarchical databases and didn’t pursue relational databases at first. But other organizations did, most notably Software Development Laboratories, which changed its name to Relational Software Inc. in 1979 and to Oracle Corp. in 1983. By the 1990s, the relational model came to dominate the database market, though hierarchical and network models are still used in certain industries and for specific applications. As more companies began deploying relational databases, Peter Chen at Carnegie-Mellon University created the Entity Relationship (E-R) Model for data modeling. By the mid 1990s there was explosive growth in PC and storage technologies, and efficient database management became a paramount concern. A group at Rational Software created UML with an aim to improve collaboration in the modeling of data systems, and UML was ultimately codified as a standard for data modeling. Toward the end of the 1990s, the concept of non-relational databases, also called NoSQL, emerged. NoSQL databases use the brute force of modern computers — unimaginable in the 1970s — to search through massive amounts of structured and unstructured data stored without a predefined model or schema. The revolutionary concept behind NoSQL databases is that because data is stored in a structureless way, different data models can be applied to the data after it’s stored, depending on the business’s situational goal at the time or the specific application. Data modeling and its interdependent influences — computers, databases and programming languages — all continue to evolve. Future of Data ModelingThe sheer volume and diversity of data — structured and unstructured — continues growing at previously inconceivable rates thanks to the digitization of virtually all aspects of business and life. This includes everything from 4K videos and images to texts, emails, smartphones, smart watches, health trackers and internet of things (IoT) devices. In this data-rich future, it is the effectiveness of data modeling that will determine how much of this data is accessible and how well it can be used. Data modeling in the future will:
Imagine a police force of the future armed with smart badges that, among many other benefits, geo-locates every officer. This quickly amasses a significant amount of data, much of which is unimportant — but can suddenly become critically important under certain circumstances. While current location will be important to determine dispatch schedules, the historical information about police presence could be overlaid with other maps to have implications on broader planning. Add into the mix a feed from the police officer’s body camera and, not only will even greater amounts of unstructured data be generated, the importance of being able to reference the video by officer location doesn’t seem too futuristic at all. Conclusion As long as human activity in general, and business activity in particular, continues to generate rapidly accelerating volumes of information, making purposeful use of all that data will remain a growing challenge. Data modeling has evolved into a well-defined process with an associated set of tools through which organizations can shape data to fit their purposes. It is necessary for any data-driven organization that hopes to get the most value out of its data for business decision-making. Data modeling developed in concert with computers, databases and programming languages beginning in the early 1960s, is already a proven and effective method for creating responsive data storage systems — and continues to evolve. Data Modeling FAQQ: What is data modeling used for? A: Data modeling is used to structure a company’s data assets so they are efficiently and accurately stored and can be retrieved and analyzed to help the company make business decisions. Q: Why use data models? A: Data models establish a manageable, extensible and scalable methodology for collecting and storing data. The map that data modeling creates is used as a constant reference point for data collection and improves the ability of a business to extract value from its data. Companies that work in highly regulated industries find that data modeling is an important part of meeting regulatory requirements. Even non-regulated businesses can benefit from data modeling because it puts the needs of the business in the driver’s seat. Q: What are some examples of data modeling in practice? A: Data modeling looks at a business’s data and identifies different types of data and the relationships among them. For example, a customer file generally includes a series of components that describe it. There are data elements such as company name, address, industry, contacts and previous projects. There might even be data such as emails sent to people at that company. Depending on the business needs of the organization, a data modeler might design a database system with a table for basic company information that is linked to ancillary tables for contacts, previous projects, email conversations — and maybe a social feed made up of posts from people at that organization. Q: What is data modelling in data analysis? A: Data modeling is the plan for how data will be stored and accessed. Data analytics is the use of that data to create and analyze reports that a company can use to help make business decisions. Q: What are the eight steps of data modeling? A: Data modeling goes through eight discrete steps: identifying data entity types; identifying the attributes associated with them; applying naming conventions; identifying the relationships between data types; applying patterns (or templates) to make sure best-practice models are used; assigning keys that identify how the data is related; normalizing the data for efficiency and accuracy; and denormalization of specific data elements only for situations where speed trumps efficiency. When data is joined together it is called _________ when it is split apart it is called?When data is joined together it is called _________, when it is split apart it is called ________. data concatenation, data parsing. A construction company classifies their projects into one of seven different types. To keep track of project classification, the clerk enters a number from 1 to 7 in the ProjectType field ...
Is the process of analyzing data to make sure it has the properties of high quality data?Data profiling is a set of algorithms for statistically analyzing and assessing the quality of data values within a data set as well as exploring relationships that exist between data elements or across data sets.
What are the ways in which data is transformed into information quizlet?Data are transformed into information by adding value through context, categorisation, calculations, corrections and condensation.
Which of the following is the last step to having an analytics mindset?As noted above, the final step in the analytics mindset is to present your results. This is often done in the form of a visualization.
|