What term is used when two patients have both been assigned the same health record number?

1. Introduction

Electronic health care data are increasingly being generated and linked across multiple systems, including electronic health records (EHRs), patient registries, and claims databases. In general, every system assigns its own identifier to each patient whose data it maintains. This makes it difficult to track patients across multiple systems and identify duplicate patients when different systems are linked. Efforts to address this challenge are complicated by the need to protect patient privacy and security.

Patient identity management (PIM) has been defined as the “ability to ascertain a distinct, unique identity for an individual (a patient), as expressed by an identifier that is unique within the scope of the exchange network, given characteristics about that individual such as his or her name, date of birth, gender [etc.].”1 For the purposes of this chapter, the scope of this definition will be expanded to refer to PIM as the process of accurately and appropriately identifying, tracking, managing, and linking individual patients and their digitized health care information, often within and across multiple electronic systems.2 A related idea is the concept of patient identity integrity, which is defined as “the accuracy and completeness of data attached to or associated with an individual patient.”3 Efficient patient identity management leads to high patient identity integrity.

The need for PIM strategies in health care is rising, primarily because both the quantity of electronic health care data and the linkage of those data continue to increase. EHRs are a growing source of such data: 72 percent of office-based physicians in the United States now use some form of EHR.4 This number is likely to increase significantly in response to the EHR incentive programs enacted by the Centers for Medicare & Medicaid Services (CMS), which “provide a financial incentive for the ‘meaningful use’ of certified EHR technology to achieve health and efficiency goals.”5 In addition to office-based EHRs, electronic health care data may be created by hospital EHRs, billing systems, insurance claims systems, pharmacy record systems, medical devices, and even by patients themselves via electronic patient health record systems. Large amounts of electronic health care data are also being generated from clinical research. Patient registries, for example, often use electronic data capture tools to collect and manage their data.

This increase in the quantity of electronic health care and research data creates new opportunities and need for data linkage. Pharmaceutical companies conducting clinical trials on specific genetic markers are seeking ways to more easily identify and recruit potential patients. EHRs and patient registries are interfacing with each other to minimize the burden of data entry on participating centers and practices (see Chapter 15). Data from patient registries and other electronic sources are being pooled together to form larger, more statistically powerful data sets for research and analysis (see Chapters 16 and 18 and Case Examples 42 and 43).

As more electronic health care data are generated and linked together, PIM has become crucial to (1) enable health record document consumers to obtain trusted views of their patient subjects, (2) facilitate data linkage projects, (3) abide by the current regulations concerning patient information–related transparency, privacy, disclosure, handling, and documentation,2 and (4) make the most efficient use of limited health care resources by reducing redundant data collection. To address this growing need, a number of standards development organizations are involved in the development of PIM strategies and standards. Major organizations currently involved include Integrating the Healthcare Enterprise,6 Health Level Seven International,7 and the Regenstrief Institute, Inc.8 See Appendix C for a more complete list.

2. PIM Strategies

The challenge of PIM is not a new one, and it has existed since health care information was first digitized. In general, PIM is conducted in one of two environments: either shared identifiers are present or they are absent. When shared identifiers exist, the main PIM strategy that has emerged is to assign a unique patient identifier (UPI) to each patient. In situations where shared identifiers do not exist, the most common PIM strategy is to use patient-matching algorithms to determine whether two sets of information belong to separate patients or the same patient.

2.1. When Shared Identifiers Are Present—UPI

2.1.1. Definition and Context

One of the most straightforward PIM strategies is the creation of a unique health identifier for individuals, or a UPI. Generally, a UPI is defined as a “unique, non-changing alphanumeric key for each patient”9 in a health care system, which is associated with each medical record or instance of health care data for that patient. Some proposed desirable characteristics of a UPI include that it be unique, nondisclosing, invariable, canonical, verifiable, and ubiquitous.10 In this context, “nondisclosing” means that the UPI does not contain any personal information about the patient, such as date of birth or Social Security number.

The concept of a universal UPI (i.e., a UPI that is assigned to a patient for life and is consistent across all electronic health care systems in the United States) has been discussed and debated for a number of years. The Health Insurance Portability and Accountability Act (HIPAA) of 1996 called for the adoption of “standards providing for a standard unique health identifier for each individual, employer, health plan, and health care provider for use in the health care system.”11 Since the passage of HIPAA, the concept of a UPI has generally been welcomed by the health care industry, which views it as a tool to reduce administrative workload and increase efficiency in exchanging electronic health data.12 Other groups, including private citizens and experts attending a National Committee on Vital and Health Statistics hearing in July 1998, have expressed serious concerns about the effects that a universal UPI might have on patient privacy and data security.12 These concerns have halted further efforts at creating a UPI in the United States until appropriate privacy legislation is in place,13 even though recent research has argued that adoption of a universal UPI would actually strengthen patient privacy and security (by limiting the number of access points to patient health care data) and, while requiring a significant upfront cost, could pay for itself in cost savings from error reduction and administrative efficiency.14 The adoption of a universal UPI is also viewed by some as the logical next step in strengthening and developing the national health information network.9

2.1.2. Current Uses of UPIs

UPIs have long been used within individual patient registries and data sets, especially those with prospective data collection, to track and link a particular patient's data over time. One of the most familiar types of UPI is a medical record number—a unique number assigned by a hospital or physician practice that links a patient with their medical record at that institution. Some hospitals have multiple electronic health information systems (e.g., EHRs, administrative/billing systems, lab systems, pharmacy dispensing systems) that assign UPIs to the patients within their domains, and a patient may not necessarily have the same UPI from system to system. Many patient registries also assign a UPI to patients upon screening or enrollment, and UPIs remain the simplest and most straightforward way to uniquely identify patients in a controlled data set.

UPIs have also been used on a slightly larger scale in aggregated data sets and to link existing databases with administrative data sets. For example, the National Database for Autism Research aggregates data from many different collections of autism data and biospecimens and generates a global unique identifier for each patient represented in the aggregated data set.15 Similarly, in 2008 the Society of Thoracic Surgeons Database began collecting unique patient, surgeon, and hospital identifier fields to facilitate long-term patient followup via linking to the Social Security Death Master File and the National Cardiovascular Data Registry.16

Outside the United States, UPIs have been used on a wider scale. In Sweden, for example, the personal identity number (PIN) is a unique administrative identifier assigned to all permanent residents in Sweden since 1947. The PIN is used to track vital statistics and also link patients between several national-scale patient registries, including the Patient Register (containing inpatient and outpatient data), Cancer Register, Cause of Death Register, Medical Birth Register,17 and Knee Arthroplasty Register.18 In England, a new health identifier was introduced in 1996—the NHS number is a 10-digit unique identifier used solely for the purpose of patient identification.19

2.1.3. Future Directions for UPIs

Recently, interest has increased in expanding the use of existing administrative identifiers (such as the Social Security number in the United States) to serve as UPIs in the health care arena. In 2009, the U.S.–based nonprofit Global Patient Identifiers proposed the Voluntary Universal Healthcare Identifier project, which aims to make unique health care identifiers available to any patient who uses the services of a regional health information organization or health information exchange (HIE).20 In May 2011, production deployment on the system began. The voluntary nature of this project and its capacity for patients to have both an “open” voluntary identifier and a “private” voluntary identifier (which can be used to control which caregivers have access to clinically sensitive information) make it an interesting alternative to a mandated universal UPI that would likely be assigned and administered by a Federal Government agency. In March 2011, the eCitizen Foundation began requirements-gathering work on the Patient Identity Service Project, an open-source, open standards–based patient identity service that will be able to identify and authenticate a patient across multiple systems to gain access to their health records and services.21 The project is funded by the OpenID Foundation of Japan, and future goals include research and development, design, implementation, and testing of the service.

2.1.4. Registries and UPIs

UPIs offer a straightforward way to identify specific patients within a particular registry. However, the implementation of a universal UPI in the United States has been halted by concerns over patient privacy, security, and confidentiality, which are unlikely to be resolved soon.

In Sweden, the ability to link data from separate national patient registries using the PIN has allowed researchers to pull from a pool of millions of Swedish residents to address difficult epidemiological questions. Concerns about patient privacy and confidentiality have been addressed by requiring that an ethical review board review and approve the planned study before any data are released to researchers. Past precedent has been that the review boards allow most PIN-based registry linkages, on the condition that the PINs are removed from the combined data set and replaced with different, unique serial numbers. Researchers also sign a legal agreement ensuring secure storage of the data and agreeing not to attempt to re-identify the patients in the de-identified data set they are given.17
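The identifier-replacement step described above can be illustrated with a minimal sketch. It assumes linked records arrive as dictionaries carrying the shared identifier in a field called `pin`; the field names, the `study_id` label, and the `pseudonymize` helper are all hypothetical and do not describe the procedure actually used by Swedish registry holders.

```python
def pseudonymize(records, id_field="pin"):
    """Replace a shared identifier with a study-specific serial number.

    Records that share the same identifier receive the same serial number,
    so linkage is preserved while the identifier itself is removed.
    Returns the released records and the key table (identifier -> serial),
    which would stay with the data custodian and not go to researchers.
    """
    serial_by_id = {}
    released = []
    for record in records:
        identifier = record[id_field]
        if identifier not in serial_by_id:
            serial_by_id[identifier] = len(serial_by_id) + 1
        # Copy every field except the identifier, then attach the serial.
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["study_id"] = serial_by_id[identifier]
        released.append(cleaned)
    return released, serial_by_id
```

In practice the input order would likely be shuffled before serial numbers are assigned, so that the serial number itself carries no information about the original ordering.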

2.2. When Shared Identifiers Are Not Present—Patient-Matching Algorithms

2.2.1. Definition and Context

In the absence of a national UPI in the United States, most researchers and hospital administrators have turned to patient-matching algorithms and other statistical matching techniques as a way to manage patient identities within the confines of a specific patient registry, research project, institution, or other grouping of health care data. This method of PIM involves comparing identifiable patient attributes (often demographics such as date of birth, gender, name, and address, but sometimes other individually identifiable information) using a logic model that then classifies each pair as a match, a non-match, or a possible match that may require manual review.

In the realm of patient and record matching, algorithms can be either deterministic or probabilistic. Deterministic algorithms are more straightforward and classify a pair of records as a match if they meet a specified threshold of agreement. The definition of agreement can vary depending on which data elements are available, the quality of the data (including the level of missing data), and the desired sensitivity and specificity of the algorithm. Probabilistic algorithms treat the match status of individual data elements as observable variables and the match status of the record pair as a latent variable, and model the observable variables as a pattern mixture. This method characterizes the uncertainty in the matching process, making it a more sophisticated (and less straightforward) method than deterministic matching.22
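The contrast between the two families can be sketched in a few lines. The example below is illustrative only: the field list, the agreement threshold, the classification cutoffs, and especially the m- and u-probabilities (the chance that a field agrees for true matches and for non-matches, respectively) are assumed values. A real probabilistic linkage would estimate those probabilities from the data, for example by fitting the mixture model described above, rather than fixing them by hand.

```python
import math

# Hypothetical demographic fields compared for each record pair.
FIELDS = ("last_name", "first_name", "date_of_birth", "gender")

# Assumed probabilities that a field agrees given a true match (m) or a
# non-match (u). Illustrative values only, not estimated from any data.
M_PROB = {"last_name": 0.95, "first_name": 0.90, "date_of_birth": 0.97, "gender": 0.99}
U_PROB = {"last_name": 0.01, "first_name": 0.02, "date_of_birth": 0.001, "gender": 0.50}


def _norm(value):
    return (value or "").strip().lower()


def agreement(a, b):
    """Field-by-field exact agreement; missing values count as disagreement."""
    return {f: _norm(a.get(f)) != "" and _norm(a.get(f)) == _norm(b.get(f)) for f in FIELDS}


def deterministic_match(a, b, required=3):
    """Deterministic rule: match if at least `required` fields agree."""
    return sum(agreement(a, b).values()) >= required


def probabilistic_weight(a, b):
    """Sum of log-likelihood-ratio weights over all compared fields."""
    total = 0.0
    for field, agrees in agreement(a, b).items():
        m, u = M_PROB[field], U_PROB[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def classify(weight, lower=0.0, upper=10.0):
    """Map a weight to match, non-match, or possible match (manual review)."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"
```

The deterministic rule treats every field as equally informative, whereas the weighted score lets agreement on a rarely shared value such as date of birth count for more than agreement on gender.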

One major consideration in choosing an appropriate matching algorithm is the accuracy with which it matches patients. Matching accuracy is affected by the number of patients being compared, the number and type of common data elements being compared, and the mathematical validity of the algorithm itself. An algorithm that returns close to 100-percent matching in a pool of few patients with many data elements may perform less accurately in a pool of many patients with fewer data elements. Importantly, an inaccurate algorithm may limit the conclusions that can be drawn from a particular data set.

2.2.2. Current Uses of Patient Matching Algorithms

Patient-matching algorithms are widely used when disparate health care data sources are combined and no unique, common patient identifier is available. The two main options are to use an existing record linkage software program or to develop a new matching algorithm independently. Commercial software options, such as Link Plus and The Link King, apply probabilistic algorithms that have been found to provide a higher sensitivity than matching using a basic deterministic algorithm.23 As described in Case Example 40, an open-source product (Febrl) was used to combine data from 11 different data sources into KIDSNET, a computerized registry that gives providers an overall view of children's use of preventive health services.24 Case Example 41 describes a different approach to patient matching.

Many patient-matching algorithms have been developed to meet the needs of specific projects. For example, a group at Partners HealthCare developed an algorithm to compare data in the Social Security Death Master File with demographic data in the Partners EHR system to identify patient deaths that may have occurred outside of Partners institutions (and therefore were not recorded in the patients' medical records). They then developed another algorithm using clinical data to identify false-positives resulting from the first algorithm (e.g., if clinical data for a supposedly deceased patient are recorded more than 30 days after the date of death in the Social Security Death Index [SSDI], that patient must have been falsely matched to an SSDI entry).25 In another example, researchers at the University of Alabama at Birmingham used matching algorithms to link emergency medical services data with hospital EHRs and a statewide death index to characterize the medical conditions and comorbidities of patients who received out-of-hospital endotracheal intubation.26
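The false-positive check in the Partners example reduces to a simple date comparison. The sketch below is a hypothetical rendering of that rule, not the published algorithm: the function name, its arguments, and the choice to treat any single later clinical entry as disqualifying are assumptions made for illustration.

```python
from datetime import date, timedelta


def is_false_death_match(death_date, clinical_activity_dates, grace_days=30):
    """Flag a death-index match as a likely false positive.

    Returns True when any clinical activity is dated more than
    `grace_days` after the date of death recorded in the death index.
    """
    cutoff = death_date + timedelta(days=grace_days)
    return any(activity > cutoff for activity in clinical_activity_dates)
```

For instance, a matched death date of January 1 with a lab result dated March 1 of the same year would be flagged, while a result dated January 20 would not.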

New and innovative algorithms that are unrelated to specific projects also continue to be developed, with the goal of advancing patient matching algorithm science. Recent examples include algorithms proposed by groups at Vanderbilt University in Nashville, Tennessee,27 John Radcliffe Hospital in the United Kingdom,28 and the University of Duisburg-Essen in Germany.29

2.2.3. Future Directions of Patient-Matching Algorithms

Any statistical matching approach depends on three factors:

  1. The quality of the data it is comparing: Are the data entered correctly, without mistakes? Are the data complete, or is there a high level of missing data? The quality of data within a particular registry will always be a factor of the practices employed by that registry. See Chapter 11 for recommended best practices.

  2. The comparability of the data it is comparing: Are the data from the different sources collected in the same format and in the same way? There are a number of current initiatives to improve the standardization of data elements being used in patient registries,30 but the area with the most need for future work is the testing and standardization of the algorithms themselves.

  3. The accuracy of the matching algorithm: What is the likelihood of the algorithm returning a false positive match or missing true matches? While there has been some scientific research validating specific matching algorithms,31-33 the Health Information Technology Policy Committee recently called for increased standards around patient matching, including standardized formats for demographic data fields; internal evaluation of matching accuracy within institutions and projects; accountability to acceptable levels of matching accuracy; the development, promotion, and dissemination of best practices in patient matching; and supporting the role of the patient.34
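The third factor can be quantified when a validated reference set of record pairs is available. The following sketch assumes two parallel lists of booleans, one holding the algorithm's decision for each pair and one holding the true match status; the function name and the choice of reported measures are illustrative assumptions.

```python
def matching_accuracy(predicted, actual):
    """Summarize matching accuracy against a reference standard.

    `predicted` and `actual` are equal-length sequences of booleans, one
    entry per record pair (True = same patient). A measure is reported
    as None when its denominator is zero.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true matches found
    fp = sum(1 for p, a in pairs if p and not a)      # false positive matches
    fn = sum(1 for p, a in pairs if not p and a)      # missed true matches
    tn = sum(1 for p, a in pairs if not p and not a)  # correct non-matches

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "positive_predictive_value": ratio(tp, tp + fp),
    }
```

Because non-matching pairs vastly outnumber matching pairs in most linkage projects, specificity alone can look high even when many reported matches are wrong, which is why the positive predictive value is reported alongside it here.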

Another emerging trend in patient matching algorithms is privacy-preserving record linkage, or “finding records that represent the same individual in separate databases without revealing the identity of the individuals.”29 This concept was expanded upon by researchers at University of Duisburg-Essen in Germany, mentioned in the previous section, who propose a method that encrypts patient identifiers while allowing for errors in identifiers. Given the concerns about patient privacy and confidentiality surrounding patient identity management, this method may be increasingly used in the future.
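One way to encrypt identifiers while still tolerating spelling errors is to hash the overlapping letter pairs of a name into a fixed-length bit pattern and compare patterns instead of names. The sketch below illustrates that general idea with a keyed Bloom-filter-style encoding and a Dice similarity score. It is an assumption-laden illustration rather than a reproduction of the cited method: the filter length, number of hash functions, padding character, and function names are all chosen here for demonstration.

```python
import hashlib
import hmac

FILTER_BITS = 1000  # assumed length of the bit pattern
NUM_HASHES = 10     # assumed number of keyed hashes per letter pair


def bigrams(value):
    """Split a padded, lowercased string into its overlapping letter pairs."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(value, secret_key):
    """Return the set of bit positions switched on for `value`.

    Both data holders must use the same secret key (bytes), which is not
    shared with the party performing the comparison.
    """
    positions = set()
    for gram in bigrams(value):
        for k in range(NUM_HASHES):
            message = f"{k}:{gram}".encode("utf-8")
            digest = hmac.new(secret_key, message, hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % FILTER_BITS)
    return positions


def dice_similarity(bits_a, bits_b):
    """Dice coefficient of two encoded values (1.0 means identical)."""
    if not bits_a and not bits_b:
        return 1.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))
```

Two spellings of the same name, such as "Katherine" and "Catherine", share most of their letter pairs and therefore most of their bit positions, so they score far higher than two unrelated names even though neither name is visible to the linking party.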

2.2.4. Registries and Patient Matching Algorithms

As mentioned above, patient matching algorithms have become the default PIM strategy for registries that link with outside data sources, due to the lack of a universal UPI in the United States. As a result, many different algorithms have been developed: some commercially available, some open-source, some developed for specific projects, and some developed with broader applications in mind. The performance and effectiveness of matching algorithms can affect the results produced by the registries that are using them. The type of registry also affects the type of patient-matching algorithm needed. Registries used for direct patient care may require an algorithm with different sensitivity, specificity, and timeliness than those used for population-based research efforts. Registry owners and operators would benefit from standards surrounding patient-matching algorithms, which would allow them to more confidently and effectively use appropriate algorithms for linking projects.

In addition to a universal patient identifier and patient matching algorithms, other strategies are emerging to manage patient identities in disparate electronic health care data sources, including biometrics and master patient indices. In the technical realm of patient-centric document exchange, HIEs are becoming increasingly important in providing the interoperability infrastructure for successful EHR implementations within and across affinity domains.

3.1. Biometrics

One new option in the PIM field is the use of biometrics—that is, “automated methods of recognizing an individual based on measurable biological (anatomical and physiological) and behavioral characteristics.”35 Some examples of biometric measurements are: fingerprint, palm print, hand geometry, DNA, handwriting, finger or hand vascular pattern, iris/retina, facial shape, voice pattern, and gait.

Biometrics are attractive because they are difficult to fabricate, resist change over time (unlike demographic information such as name and address), and have a high degree of uniqueness, which makes them effectively biological UPIs. For biometrics to be used as UPIs, though, there would need to be agreement on which biometric to use and the format in which it should be collected. Also, some biometric measurements are more distinctive than others. For example, a fingerprint is highly specific to an individual, while a person's hand geometry is less so. Hand geometry therefore is often used to confirm a person's identity (i.e., in combination with another identifier) rather than as a sole identifier.

One drawback to using biometrics is the investment in specialized technology and equipment required to capture many of these measurements. There is also concern about the privacy and security implications surrounding the use of biometrics, connected with their history of use in law enforcement and their potential misuse to derive information other than identity (e.g., analyzing DNA for genetic diseases).36

Some hospitals have begun using biometrics to verify provider identity and restrict access to EHRs. Biometrics are also being used in some hospitals to verify patient identity upon hospital admission37 and identify critically injured, unconscious patients presenting to an emergency room.38

Many registries, particularly those with biobanks associated with them, already collect biometric data (e.g., DNA). However, the data are often used for purposes other than PIM, including investigating genetic components of disease39 and risk factors for disease.40

Biometrics remains an attractive option for PIM; the largest obstacle to its use in patient registries is likely the investment in technology and equipment that it requires, although this would vary depending on where registry data are collected. A multisite, practice-based registry would probably be less able to accommodate the collection of biometric data, while a registry based out of a single hospital that already collects biometric data for other purposes would be able to begin collecting biometrics for a registry more easily, since the initial investment in technology has already been made. Registries using biometrics would also be subject to the same concerns about privacy and security as biometric use in other disciplines.

3.2. Master Patient Index

A master patient index (MPI) facilitates the identification and linkage of patients' clinical information within a particular institution. The term “enterprise master patient index” (EMPI) is sometimes used to distinguish between an index that serves a single institution (i.e., MPI) and one that contains data from multiple institutions (EMPI). MPIs are not themselves patient identity management strategies, but rather informational infrastructures within which those strategies are applied. Most MPIs use a patient matching algorithm to identify matches and then assign a UPI that is associated with that patient record going forward. MPIs and EMPIs are created for the purpose of assigning a UPI to each patient treated within a certain health care system; providers can then use that identifier to have a global view of the patient's care across multiple institutions within that system.

Several leading software companies have released commercially available MPI and EMPI products. Oracle has published a thorough description of the design and functionality of their EMPI product.41 Open-source options are also available, including one developed by Project Kenai called OpenEMPI.42

EMPIs are used as supplemental tools to apply PIM strategies for data sharing efforts such as HIEs, described more fully in the next section. For example, the Michigan Clinical Research Collaboratory at the University of Michigan created the “Honest Broker” system, which serves three functions: facilitating the actual exchange of data between members of the collaboratory for research, maintaining an MPI to manage patient identities within that data, and de-identifying data sets in conformance with HIPAA standards.43

Figure 17–1 is adapted from the Integrating the Healthcare Enterprise integration profile44 and illustrates the actors that participate in the Patient Identifier Cross-referencing profile. The entity often called an MPI is represented by the combination of the Patient Identity Source (“Source”) and the Patient Identity Cross-reference Manager (“Manager”). The Source provides patient identity information (Patient Identity Feed) to the Manager. It is common to have multiple patient identity sources that provide patient ID feeds to the Manager. The Manager is responsible for managing patient identities by detecting matches and creating and maintaining cross-references of patient identifiers across these various sources. The Patient Identifier Cross-reference Consumer (“Consumer”) retrieves Patient Identity Cross References or aliases. This allows patients to be linked across multiple systems or domains that use different patient identifiers to represent the same patient.

Figure 17–1

Basic process flow with patient identifier cross-referencing.

Illustrating how users may interact with an MPI in daily practice may be helpful. In one possible scenario, an emergency room physician sees a patient presenting at the emergency room with vague and poorly defined pain who specifically asks to be prescribed narcotics. A new quality improvement program being implemented in this emergency room requires the physician to check the patient's history of filling prescriptions before issuing a prescription for a narcotic drug. The emergency room's EHR system and the hospital pharmacy's electronic dispensing record system each assign their own patient IDs to patients within their systems, and send patient feeds to the hospital's MPI (the Manager in this scenario) each time a new patient ID is assigned. The MPI creates and maintains cross-references of all identifiers for patients and provides the cross-references to consumers who seek that information. The consumer in this scenario would be the emergency room system, which sends the MPI a patient identity cross-reference or demographic query with information about the patient in question. The MPI notifies the emergency room system that the patient identified in the emergency room as “ER703” matches the patient whose pharmacy records are under the pharmacy system identifier “012.” The emergency room system then queries the pharmacy system for the identifier “012,” and presents the dispensing record data to the emergency room physician.
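The emergency room scenario can be traced through a toy cross-reference manager. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the domain labels and demographic values are invented, and matching is reduced to exact agreement on name and date of birth as a stand-in for whatever patient-matching algorithm a real MPI would apply. It does not implement the IHE message formats.

```python
class CrossReferenceManager:
    """Toy MPI: accepts identity feeds and answers cross-reference queries."""

    def __init__(self):
        self._link_of = {}          # (domain, local_id) -> internal link key
        self._members = {}          # link key -> set of (domain, local_id)
        self._by_demographics = {}  # demographic key -> link key

    @staticmethod
    def _demographic_key(demographics):
        # Placeholder matching rule: exact name and date of birth.
        return (demographics["name"].strip().lower(), demographics["date_of_birth"])

    def feed(self, domain, local_id, demographics):
        """Patient identity feed: a source reports a newly assigned local ID."""
        key = self._demographic_key(demographics)
        link = self._by_demographics.setdefault(key, len(self._by_demographics))
        self._link_of[(domain, local_id)] = link
        self._members.setdefault(link, set()).add((domain, local_id))

    def query(self, domain, local_id, target_domain):
        """Return the IDs in `target_domain` linked to the given local ID."""
        link = self._link_of.get((domain, local_id))
        if link is None:
            return []
        return sorted(i for d, i in self._members[link] if d == target_domain)
```

With a feed for “ER703” from the emergency room system and a feed for “012” from the pharmacy system carrying the same demographics, a query for the pharmacy identifiers linked to “ER703” returns “012”, which the emergency room system can then use to request the dispensing record.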

Health care institutions that use MPIs to manage patient identities across their multiple data sources (e.g., EHRs, pharmacy records, administrative and billing records) are desirable partners for data linkage projects and for inclusion in patient registries, since they are able to draw from a broader pool of data than any one of the data sources alone. By addressing PIM needs upfront, they minimize the work needed for outside sources to link to their data for research uses.

In the relational infrastructure shown in Figure 17–1, registries can act as Patient Identity Sources, Patient Identifier Cross-reference Consumers, or both. Registries that contain patient identifiers and other demographic information can act as Patient Identity Sources and send patient identity feeds to a Manager. Registries can also act as Patient Identifier Cross-reference Consumers, if they request and receive patient identity cross references from an MPI or other Patient Identity Cross-reference Manager. This may be done to add new patients to a registry or to augment existing data in a registry with additional information on the same patients.

3.3. Health Information Exchange

An HIE is an integrated open standards–based solution to enable information sharing across disparate health care applications. (See Case Example 34, which describes the Oakland Southfield Physicians HIE.) HIEs are interoperability platforms that provide the means to share patient data produced by health care applications with other applications that consume and use the data, such as EHRs. HIEs implement standards-based health care messages and provide the requisite authentication and auditing services for data governance. HIEs are not themselves patient identity management strategies, but they implement those strategies to manage their data. Most HIEs achieve this by incorporating an MPI to manage and cross-reference the identity of patients within the HIE. See Figure 17–2 for a graphical representation of the relationship between HIEs, MPI/EMPIs, and data creators and consumers.

Figure 17–2

Data flow through a health information exchange. EHR = Electronic Health Record; PHR = Personal Health Record; EMPI = Enterprise Master Patient Index; HIE = Health Information Exchange; ID = Identifier

Key components of an HIE include:

  • Patient Identity Cross-Reference Manager: An implementation of an MPI that cross-references multiple identifiers and serves the linked identifiers, global patient identifier, and unified patient demographics to information consumers and other HIE components.

  • Document Repository: Clinical document repository for storing patient records and documents.

  • Document Registry: Registry of patient's documents located in various document repositories.

  • Cross Community Gateway: Serves as the entry point for communications between HIE communities.

Content creators who create new patient identifiers provide patient feeds to the Identity Manager, which in turn cross-references each local identifier to a global patient identifier. Content consumers and creators can query the Identity Manager for the global identifier by providing a subset of patient demographics or one of their local identifiers. This global identifier is used by the document registry to keep track of patient clinical documents. This infrastructure facilitates an interoperable environment that respects data ownership demands but also provides a complete view of the patient's clinical records from multiple sources.
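The relationship between the Identity Manager and the document registry can be sketched as two small classes. Everything here is hypothetical: the class names, the `G000001`-style global identifier format, and the `same_as` argument, which stands in for the outcome of a patient-matching step that is not shown. Documents themselves stay in their source repositories; the registry only records where they are.

```python
from collections import defaultdict


class IdentityManager:
    """Maps local identifiers from content creators to a global identifier."""

    def __init__(self):
        self._global_of = {}  # (source, local_id) -> global identifier
        self._next = 1

    def feed(self, source, local_id, same_as=None):
        """Register a local ID. `same_as` names an already known
        (source, local_id) judged to be the same patient."""
        key = (source, local_id)
        if key in self._global_of:
            return self._global_of[key]
        if same_as in self._global_of:
            global_id = self._global_of[same_as]
        else:
            global_id = f"G{self._next:06d}"
            self._next += 1
        self._global_of[key] = global_id
        return global_id

    def global_id(self, source, local_id):
        """Query by local identifier; returns None if unknown."""
        return self._global_of.get((source, local_id))


class DocumentRegistry:
    """Index of where each patient's documents live, keyed by global ID."""

    def __init__(self):
        self._entries = defaultdict(list)

    def register(self, global_id, repository, document_id):
        self._entries[global_id].append((repository, document_id))

    def find(self, global_id):
        return list(self._entries.get(global_id, []))
```

A consumer that knows only its own local identifier can ask the Identity Manager for the global identifier and then ask the registry for every document location recorded under it, regardless of which creator produced each document.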

HIEs can be powerful research tools. A group at the Swansea University School of Medicine has developed the Secure Anonymized Information Linkage (SAIL) databank, containing more than 500 million records from multiple health and social care service providers in the United Kingdom.45 The SAIL databank has already been used to demonstrate the feasibility of identifying potential clinical trial participants at the primary care level, which may be especially useful for disease areas in which recruitment of clinical trial participants is historically difficult (e.g., chronic conditions such as diabetes).46

Because they contain patient data, HIEs are subject to the same privacy and security concerns and regulations as patient registries. A white paper published in April 2011 by the American Health Information Management Association/Healthcare Information Management and Systems Society (AHIMA/HIMSS) HIE Privacy & Security Joint Work Group provides a summary of these considerations.47

A patient registry may contribute data to an HIE, but registries and HIEs are distinct and separate endeavors. Data contained in HIEs are not necessarily collected using observational study methods, as patient registry data are; rather, they are often collected and aggregated by linking to existing databases (which may be, for example, registries, administrative databases, or public health surveillance systems). The purpose of an HIE is not just to evaluate specified outcomes in a defined patient population or even to serve any one predetermined scientific, clinical, or policy purpose, but to provide an aggregated database that can be used for a variety of purposes (which may include identifying patients to recruit for clinical trials, or conducting ecological studies, for example).

4. Major Challenges and Barriers

The process of patient identity management introduces several technical, ethical, and operational challenges, including selecting the appropriate PIM strategy, discussed earlier in this chapter. Additional challenges include the obligation to protect the privacy and security of patient data and the technical interoperability (or lack thereof) of disparate health care data sources.

4.1. Protecting Patient Privacy and Security

One of the most pressing challenges in PIM is addressing the tension between linking patient data in order to manage patients' identities and protecting the privacy and security of those data. This challenge has inherent ethical, regulatory, and technical considerations.

4.1.1. Ethical and Regulatory Considerations

The concepts of protecting patient privacy and security and PIM have always been intertwined. Managing patient identities is essential for protecting the privacy and security of those patients. Conversely, regulations and ethical considerations compel the protection of patients' privacy and security when managing their identities (i.e., it is not enough to know who they are and which information is theirs; one must also protect this information).

Many stakeholders in the health information technology field recognize this relationship. The Health Information Security and Privacy Collaboration counts patient and provider identification as one of its nine domains of privacy and security.48 The Commission on Systemic Interoperability released a report in 2005 in which it recommended that Congress authorize the Department of Health and Human Services to “develop a national standard for determining patient authentication and identity,” and “develop a uniform federal health information privacy standard for the nation, based on HIPAA and pre-empting state privacy laws […].” These recommendations were made simultaneously, “to advance progress of the connectivity of health information technology.”49 Thus, it is widely recognized that PIM and patient privacy and security are closely related, but there continues to be disagreement about how they should relate.

The regulatory framework that guides this discussion in the United States is HIPAA, enacted in 1996. As mentioned previously in this chapter, HIPAA mandated the implementation of a nationwide unique patient identifier, but in 1999 concerns about patient privacy and security prompted the barring of any funding for this endeavor. While HIPAA has not led to the implementation of a standard PIM method, it does set forth a framework for the protection of patient privacy and health information security. This framework is summarized in Table 7–1 in Chapter 7.

In Europe, recently proposed data protection regulations may have a profound impact on the regulatory environment in which registries conduct PIM activities. The directive proposed by the European Commission in January 2012 includes a provision for the “right to be forgotten,” essentially giving individuals the power to remove their personal data from third party data holders at any time they choose.50 If adopted by the European Parliament and European Union member states, the directive will take effect within two years. The implications that this may have for health care research and registries operating in Europe remain to be seen.

4.1.2. Technical Considerations

Data holders employ three main technical methods of ensuring the privacy and security of patient data: anonymization, encryption, and pseudonymization:

  • Anonymization is the practice of removing information that is identifiable to an individual or that may enable an individual's identity to be deduced. This is a viable option in some data use situations (e.g., conducting a research study that does not require patient followup), but not an option in others (e.g., maintaining comprehensive health records for patients in an EHR). It is also not a reversible process—once identifiers are removed from data, they cannot be reinserted.

  • Encryption involves applying a mathematical calculation or algorithm to transform a patient's original data (plain text) into coded data (cypher text). In order to read the cypher text, a user or system must have access to a key that decrypts the data back into plain text. This is an attractive option because it does not involve deleting or removing patient data, and because the coded data are not readable if they fall into the wrong hands. However, encryption requires robust data management policies and resources to be implemented successfully.51

  • Pseudonymization is a more sophisticated approach to patient privacy protection. It involves two steps: depersonalization, in which identifiable data are separated from other clinical data and stored in a separate location, and pseudonymization, in which a unique identifier is generated and applied to the depersonalized data set. The unique identifier, or pseudonym, does not change for a given patient over time, and is not derived from any identifiable attributes of the patient. Pseudonymization can be reversible, if the relationship between the pseudonym and the identifiable data is maintained in a secure way and can facilitate re-identification of the patient under specific circumstances (e.g., a trusted third party maintains the relationship, and only discloses that relationship if the requestor has knowledge of a particular key or password). Pseudonymization can also be irreversible, if a situation arises in which the relationship between the pseudonym and the identifiable data is not maintained, and re-identification is not possible.52, 53
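
The two steps of pseudonymization described above, and the reversible variant that relies on a trusted third party, can be sketched as follows. This is a minimal illustration rather than a production design: `TrustedThirdParty`, `depersonalize`, and `pseudonymize` are hypothetical names, the mapping is held in memory, and the shared key is a placeholder. The pseudonym is generated randomly, so it is not derived from any identifiable attribute of the patient.

```python
import hmac
import secrets


class TrustedThirdParty:
    """Hypothetical holder of the pseudonym-to-identity relationship."""

    def __init__(self, access_key):
        self._access_key = access_key
        self._pseudonym_by_identity = {}
        self._identity_by_pseudonym = {}

    def pseudonym_for(self, identity):
        """Return a stable, random pseudonym for this identity."""
        if identity not in self._pseudonym_by_identity:
            pseudonym = secrets.token_hex(8)
            self._pseudonym_by_identity[identity] = pseudonym
            self._identity_by_pseudonym[pseudonym] = identity
        return self._pseudonym_by_identity[identity]

    def reidentify(self, pseudonym, key):
        """Disclose the identity only to a requestor who presents the key."""
        if not hmac.compare_digest(key, self._access_key):
            raise PermissionError("invalid key")
        return self._identity_by_pseudonym[pseudonym]


def depersonalize(record, identifying_fields):
    """Step 1: separate identifiable data from the other clinical data."""
    identity = tuple(record[f] for f in identifying_fields)
    clinical = {k: v for k, v in record.items() if k not in identifying_fields}
    return identity, clinical


def pseudonymize(record, identifying_fields, ttp):
    """Step 2: attach the pseudonym to the depersonalized data."""
    identity, clinical = depersonalize(record, identifying_fields)
    return {"pseudonym": ttp.pseudonym_for(identity), **clinical}


ttp = TrustedThirdParty(access_key=b"example-key")  # placeholder key
fields = ("name", "birth_date")
visit1 = {"name": "Ana Rivera", "birth_date": "2001-04-17", "hba1c": 6.9}
visit2 = {"name": "Ana Rivera", "birth_date": "2001-04-17", "hba1c": 7.2}
p1 = pseudonymize(visit1, fields, ttp)
p2 = pseudonymize(visit2, fields, ttp)
```

Both visits receive the same pseudonym, so they can still be linked for research without exposing the name or birth date. Discarding the third party's mapping after the pseudonyms are assigned would correspond to the irreversible case.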

4.2. Interoperability

Health care enterprises such as hospitals, clinics, and physician offices require patient identifier cross-referencing, that is, the linking of patients across different domains. It is likewise necessary to consider how registries may fit within this model and what challenges that level of interoperability may impose. Separate patient registries may use the same PIM infrastructure to register their patient identifiers within a shared patient identifier cross-reference manager, allowing the identifiers to be linked back to relevant health care and related systems. This approach may allow registries to be linked more easily and securely to other systems across known domains such as an HIE, but challenges remain in how it could be used more broadly across nonparticipating health care enterprises.

5. Summary

Patient identity management is a fast-growing and evolving field, influenced by emerging technologies, regulations, and opportunities to use electronic health care data. The current status of PIM in the United States is primarily a result of the provision in HIPAA for “standards providing for a standard unique health identifier for each individual […] for use in the health care system,”11 the debate this provision has generated over its implications for patient privacy and security, and the subsequent blocking of funding for the pursuit of a national UPI. As a result, most PIM endeavors in the United States (including attempts to link patient registries with other health care data sources) use patient-matching algorithms to identify duplicates and manage patient identities. The lack of standards in this area means that the accuracy and effectiveness of these algorithms can vary widely.

Debate continues around how to best address the challenge of PIM, and stakeholders generally hold one of two views. Some view a national UPI as the best solution, provided the long-standing concerns about protecting patient privacy and security can be adequately addressed in the future. Others believe that resources would be better spent developing and standardizing the PIM methods that have grown organically in the absence of a national UPI; namely, EMPIs and patient matching algorithms. These two endeavors are not necessarily mutually exclusive, and patient registries and data linkage projects would benefit from the advancement of either or both.

Case Examples for Chapter 17

Case Example 40. Integrating data from multiple sources with patient ID matching

Description: KIDSNET is Rhode Island's computerized registry to track children's use of preventive health services. The program collects data from multiple sources and uses those data to help providers and public health professionals identify children in need of services. The purpose of the program is to ensure that all children in the State receive appropriate preventive care measures in a timely manner.
Sponsor: State of Rhode Island, Centers for Disease Control and Prevention, and others
Year Started: 1997
Year Ended: Ongoing
No. of Sites: 216 participating practice sites and more than 150 other groups of authorized users
No. of Patients: 314,211

Challenge

In the 1990s, the Rhode Island Department of Health recognized that its data on children's health were fragmented and program specific. The State had many children's health initiatives, such as programs for hearing assessment and lead poisoning prevention, but these programs collected data separately and did not attempt to link the information. This type of fragmented structure is common in public health agencies, as many programs receive funding to fulfill a specific need but no funding to link that information with other programs. This type of linkage would benefit the Department's activities, as children who are at risk for one health issue are often at risk for other health issues. By integrating the data, the Department would be able to better integrate services and provide better service.

To integrate the data from these multiple sources and to allow new data to be entered directly into the program, the Department implemented the KIDSNET computerized registry. The registry consolidates data from eight electronic data sources, in addition to immunization and online data entry from four more public health programs, to provide an overall picture of a child's use of preventive health care services. The sources are newborn developmental risk screening; the immunization registry; lead screening; hearing assessment; the Women, Infants, and Children (WIC) program; home visiting; early intervention; blood spot screening; foster care; birth defects; vital records data; asthma environmental inspection referrals; early child developmental screening; and audiology results. The goals of the registry are to monitor and assure the use of preventive health services, provide decision support for immunization administration, give providers reporting capacity to identify children who are behind in services, and provide recall services and quality assurance.

After being launched in 1997, the registry began accumulating data on children who were born in the State or receiving preventive health care services in the State. Some of the data sources entered data directly into the registry, and some of the data sources sent data from another database to the registry. The registry then consolidated data from these sources into a single patient record for each child by matching the records using simple deterministic logic. As the registry began importing records, the system held some records as questionable matches, since it could not determine if the record was new or a match to an existing record. These records required manual review to resolve the issue, which was time consuming, at approximately 3 minutes per record.

Due to lack of resources to devote to the manual review, the number of records held as questionable matches increased to 48,685 by 2004. The time to resolve these records manually was estimated at 17 months, and the registry did not have the resources to devote to that task.

However, the incomplete data resulting from so many held records made the registry less successful at tracking children's health and less used by providers.

Proposed Solution

To resolve the issue of patient matching, the sponsor implemented an automated solution to the matching problem after evaluating several options, including probabilistic and deterministic matching strategies and commercial and open-source options for matching software. Since the State had limited funds for the project, an open-source product, Febrl, was selected.

A set of rules to process incoming records was developed, and an interface was created for the manual review of questionable records. Using the rules, the software determines the probability of a match for each record. The registry then sets probability thresholds above which a record is considered a certain match and below which a record is considered a new record. All of the records that fall into the middle ground require manual review.
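
The two-threshold rule described above can be illustrated with a small scoring function. This sketch does not reproduce Febrl or the registry's actual rules. It uses a Fellegi-Sunter-style agreement weight (a log-likelihood ratio rather than a literal probability), and the field names, the per-field probabilities, and both thresholds are invented for illustration.

```python
import math

# Hypothetical per-field probabilities (not taken from the registry):
#   m = P(field agrees | the two records belong to the same child)
#   u = P(field agrees | the two records belong to different children)
FIELD_PROBS = {
    "last_name":  (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "birth_date": (0.98, 0.003),
    "zip_code":   (0.85, 0.05),
}

UPPER_THRESHOLD = 10.0  # at or above: treated as a certain match
LOWER_THRESHOLD = 0.0   # at or below: treated as a new record


def match_weight(incoming, existing):
    """Sum agreement and disagreement weights over the compared fields."""
    total = 0.0
    for name, (m, u) in FIELD_PROBS.items():
        a, b = incoming.get(name), existing.get(name)
        if a is None or b is None:
            continue  # a missing value is treated as uninformative
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(incoming, existing):
    """Return 'match', 'new', or 'manual_review' for a candidate pair."""
    weight = match_weight(incoming, existing)
    if weight >= UPPER_THRESHOLD:
        return "match"
    if weight <= LOWER_THRESHOLD:
        return "new"
    return "manual_review"


existing = {"last_name": "rivera", "first_name": "ana",
            "birth_date": "2001-04-17", "zip_code": "02903"}
same_child = dict(existing)
nickname_and_move = {**existing, "first_name": "annie", "zip_code": "02906"}
stranger = {"last_name": "chen", "first_name": "li",
            "birth_date": "1999-12-01", "zip_code": "02860"}
```

With these invented numbers, an identical record scores well above the upper threshold, a record that differs in first name and ZIP code lands in the middle band and is queued for manual review, and an unrelated record falls below the lower threshold. Moving the two thresholds apart enlarges the manual-review queue, and moving them together shrinks it at the cost of more automatic decisions.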

Results

After considerable testing, the new system was launched in spring 2004. Immediately upon implementation, 95 percent of the held records were processed and removed from the holding category, resulting in the addition of approximately 11,000 new patient records to the registry. The new interface for manual review reduced the time to resolve an error from 3 minutes to 40 seconds. With these improvements, the registry now imports 95 percent of the data sent to the database and is able to process the questionable records through the improved interface.
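
A back-of-the-envelope calculation using only the figures reported in this case example gives a sense of the change in manual workload. It assumes, for illustration, that the 5 percent of held records not cleared automatically all went to manual review at the new rate; the case example does not state this explicitly.

```python
HELD_RECORDS = 48_685            # questionable matches held by 2004
OLD_SECONDS_PER_RECORD = 3 * 60  # about 3 minutes per record before the change
NEW_SECONDS_PER_RECORD = 40      # with the new review interface
AUTO_RESOLVED_FRACTION = 0.95    # share of held records cleared at launch


def review_hours(records, seconds_per_record):
    """Total manual review time in hours."""
    return records * seconds_per_record / 3600


backlog_hours_before = review_hours(HELD_RECORDS, OLD_SECONDS_PER_RECORD)
remaining_records = HELD_RECORDS * (1 - AUTO_RESOLVED_FRACTION)
backlog_hours_after = review_hours(remaining_records, NEW_SECONDS_PER_RECORD)

print(f"before: {backlog_hours_before:.0f} hours of manual review")
print(f"after:  {backlog_hours_after:.0f} hours of manual review")
```

Under these assumptions the original backlog represents roughly 2,434 hours of review, or about 143 hours per month over the estimated 17 months, while the records left after automated matching would take roughly 27 hours.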

Key Point

Many strategies and products exist to deal with matching patients from multiple data sources. Once a product has been selected, careful consideration must be given to the probability thresholds for establishing a match. Setting the threshold for matches too high may result in an unmanageable burden of manual review. However, setting the threshold too low could affect data quality, as records may be merged inappropriately. A careful balance must be found between resources and data quality in order for matching software to help the registry. In addition, matching quality should be monitored over time, as matching rules and probability thresholds may need to be adjusted if the underlying data quality issues change.

For More Information

Wild EL, Hastings TM, Gubernick R, et al. Key elements for successful integrated health information systems: lessons from the States. J Public Health Manag Pract. 2004 Nov;(Suppl):S36–47. [PubMed: 15643357]

Case Example 41. Using patient identity management methods to combine health system data

Description: The clinical breast program at Providence Health & Services–Oregon provides screening, diagnosis, and treatment of breast conditions for women in seven hospitals within a regional health care system. The Providence Regional Breast Health Registry integrates patient data from multiple sources to improve patient care and outcomes, conduct research, and collaborate on national quality initiatives.
Sponsor: Providence Health & Services–Oregon; Safeway Foundation
Year Started: 2008
Year Ended: Ongoing
No. of Sites: 7 health system hospitals in Oregon
No. of Patients: 265,130 encounters as of December 2011

Challenge

Leaders of the clinical breast program at Providence Health & Services—Oregon are interested in collecting patient-level data for reporting performance and outcome measures related to health care quality (e.g., biopsy rates); health services (e.g., screening volumes over time); research questions; and accreditation with the National Accreditation Program for Breast Centers (NAPBC). However, patient data reside in numerous information systems, including the hospital electronic health record, administrative billing systems, imaging systems (e.g., mammography, MRI, ultrasound), and the pathology system. The health system uses a patient corporate number (PCN), assigned to each patient in the health system as a patient identifier. Each hospital assigns its own medical record number (MRN) to each patient and a separate encounter number for each visit.

Meeting the reporting and research needs of the breast clinic program requires integrating data from all of these multiple systems as well as managing the identities of patients whose data could be contained in one or all systems.

Proposed Solution

In 2008, the Providence Regional Breast Health Registry was created. Registry data are housed in a structured query language (SQL) database that imports data from the various systems and applies matching algorithms to appropriately group data from the same patient. To make the match, the algorithms take into account the PCN, MRN, and encounter numbers for patients with breast health encounters based on a breast-specific ICD-9 and CPT procedure query. Transformation of data from different systems is sometimes necessary to allow matching (e.g., changing the patient corporate number from 12 to 10 digits).
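
The kind of transformation and grouping described above might look like the following sketch. The case example does not say how the 12-digit form of the PCN relates to the 10-digit form, so the rule used here (the longer form is the shorter one padded with leading zeros) is purely an assumption, as are the function names and record fields.

```python
from collections import defaultdict


def normalize_pcn(raw, width=10):
    """Return the PCN as a fixed-width digit string.

    Assumption (not from the source): longer forms differ only by leading
    zero padding, so "001234567890" and "1234567890" are the same patient.
    """
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    stripped = digits.lstrip("0")
    if len(stripped) > width:
        raise ValueError(f"PCN {raw!r} does not fit in {width} digits")
    return stripped.zfill(width)


def group_by_patient(records):
    """Group records from different systems under one normalized PCN."""
    grouped = defaultdict(list)
    for record in records:
        grouped[normalize_pcn(record["pcn"])].append(record)
    return dict(grouped)


records = [
    {"system": "ehr", "pcn": "001234567890", "mrn": "A-100", "encounter": "E1"},
    {"system": "imaging", "pcn": "1234567890", "mrn": "A-100", "encounter": "E2"},
    {"system": "pathology", "pcn": "0000000042", "mrn": "B-7", "encounter": "E3"},
]
patients = group_by_patient(records)
```

Once the identifiers share one format, the EHR and imaging encounters fall under the same patient key, and the MRN and encounter numbers remain available on each record for finer-grained checks.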

Results

As of December 2011, the registry contained data on 265,130 patient encounters. It continues to collect and integrate data, and is expanding across the health system to accommodate data from affiliated clinics. Registry data are used to create quarterly updates on quality and outcomes measures identified by program leadership. For the two hospitals in the health system that are NAPBC-accredited, registry data are used to create their required annual reports on outcomes and benchmarks. Registry data have also been used for research purposes, such as identifying factors related to progression from premalignant to invasive lesions.

Key Point

Registries can take advantage of patient identity management solutions to link data from health information systems, regardless of whether a common patient identifier is present. Such linked data provide opportunities for quality improvement, research, and accreditation.

For More Information

http://oregon.providence.org/patients/healthconitionscare/breast-health/Pages/askanexpertland.aspx?TemplateName=Providence+Breast+Health+Care+Registry&Templatetype=FormsandInstructions

Nelson HD, Wang L, Weerasinghe R, et al. Trends and Influences on Mammography Screening in a Community Health System; Poster presented at: Women's Health 2011: The 19th Annual Congress; Washington, DC. April 1–3, 2011.

Soot L, Weerasinghe R, Wang L, et al. Core Needle vs. Surgical Excision Breast Biopsy in a Community-based Health System; Poster presented at: 13th Annual Meeting of the American Society of Breast Surgeons; Phoenix, AZ. May 2–6, 2012.

Soot L, Weerasinghe R, Nelson H, et al. How often are High-risk Breast Lesions on Initial Core Biopsy Upgraded after Subsequent Excisional Biopsy?; Poster Presentation at American College of Surgeons meeting; Chicago. October 2012.

References for Chapter 17

1.

2.

3.

4.

Hsiao CJ, Hing E. NCHS Brief No. 111. Hyattsville, MD: National Center for Health Statistics; 2012. [September 30, 2012]. Use and Characteristics of Electronic Health Record Systems Among Office-based Physician Practices: United States, 2001–2012. http://www.cdc.gov/nchs/data/databriefs/db111.pdf. [PubMed: 23384787]

5.

6.

Integrating the Healthcare Enterprise. [September 30, 2013]. http://www.ihe.net.

7.

8.

9.

Hillestad R, Bigelow JH, Chaudhry B, et al. IDENTITY CRISIS: An Examination of the Costs and Benefits of a Unique Patient Identifier for the U.S. Health Care System. RAND Corporation Monograph. 2008 October;(753) [August 17, 2012]; http://www.rand.org/content/dam/rand/pubs/monographs/2008/RAND_MG753.pdf.

10.

American Society for Testing and Materials (ASTM). Standard Guide for Properties of a Universal Healthcare Identifier (UHID). [August 17, 2012]. http://www.astm.org/Standards/E1714.htm.

11.

Health Insurance Portability and Accountability Act of 1996, Pub. L. No. 104-191 Sec. 1173(b) (August 21, 1996).

12.

National Committee on Vital and Health Statistics (NCVHS); Subcommittee on Standards and Security. Hearing Minutes. Chicago, IL: Jul 20-21, 1998. [August 17, 2012]. http://ncvhs.hhs.gov/980720mn.htm.

13.

Omnibus Consolidated and Emergency Supplemental Appropriations Act of 1999, Pub. L. No. 105-277 112 Stat. 2681-386.

14.

Greenberg MA, Ridgely M. Patient identifiers and the National Health Information Network: debunking a false front in the privacy wars. Journal of Health and Biomedical Law. 2008;4(1):31–68.

15.

Johnson SB, Whitney G, McAuliffe M, et al. Using global unique identifiers to link autism collections. J Am Med Inform Assoc. 2010 Nov-Dec;17(6):689–95. [PMC free article: PMC3000750] [PubMed: 20962132]

16.

Jacobs JP, Haan CK, Edwards FH, et al. The rationale for incorporation of HIPAA compliant unique patient, surgeon, and hospital identifier fields in the STS database. Ann Thorac Surg. 2008 Sep;86(3):695–8. [PubMed: 18721549]

17.

Ludvigsson JF, Otterblad-Olausson P, Pettersson BU, et al. The Swedish personal identity number: possibilities and pitfalls in healthcare and medical research. Eur J Epidemiol. 2009;24(11):659–67. [PMC free article: PMC2773709] [PubMed: 19504049]

18.

Robertsson O, Dunbar M, Knutson K, et al. Validation of the Swedish Knee Arthroplasty Register: a postal survey regarding 30,376 knees operated on between 1975 and 1995. Acta Orthop Scand. 1999 Oct;70(5):467–72. [PubMed: 10622479]

19.

20.

Global Patient Identifiers, Inc. VUHID System. [August 17, 2012]. https://gpii.info/

21.

22.

Li X, Shen C. Linkage of patient records from disparate sources. Stat Methods Med Res. 2013 Feb;22(1):31–8. [PubMed: 21665896]

23.

Campbell KM, Deck D, Krupski A. Record linkage software in the public domain: a comparison of Link Plus, The Link King, and a ‘basic’ deterministic algorithm. Health Informatics J. 2008 Mar;14(1):5–15. [PubMed: 18258671]

24.

Wild EL, Hastings TM, Gubernick R, et al. Key elements for successful integrated health information systems: lessons from the States. J Public Health Manag Pract. 2004 Nov;(Suppl):S36–47. [PubMed: 15643357]

25.

26.

Wang HE, Balasubramani GK, Cook LJ, et al. Medical conditions associated with out-of-hospital endotracheal intubation. Prehosp Emerg Care. 2011 Jul-Sep;15(3):338–46. [PMC free article: PMC3103090] [PubMed: 21612386]

27.

Durham E, Xue Y, Kantarcioglu M, et al. Private medical record linkage with approximate matching. AMIA Annu Symp Proc. 2010 Nov 13;2010:182–6. [PMC free article: PMC3041434] [PubMed: 21346965]

28.

Finney JM, Walker AS, Peto TE, et al. An efficient record linkage scheme using graphical analysis for identifier error detection. BMC Med Inform Decis Mak. 2011;11:7. [PMC free article: PMC3039555] [PubMed: 21284874]

29.

30.

31.

Pacheco AG, Saraceni V, Tuboi SH, et al. Validation of a hierarchical deterministic record-linkage algorithm using data from 2 different cohorts of human immunodeficiency virus-infected persons and mortality databases in Brazil. Am J Epidemiol. 2008 Dec 1;168(11):1326–32. [PMC free article: PMC2638543] [PubMed: 18849301]

32.

Meray N, Reitsma JB, Ravelli AC, et al. Probabilistic record linkage is a valid and transparent tool to combine databases without a patient identification number. J Clin Epidemiol. 2007 Sep;60(9):883–91. [PubMed: 17689804]

33.

Alemi F, Loaiza F, Vang J. Probabilistic master lists: integration of patient records from different databases when unique patient identifier is missing. Health Care Manag Sci. 2007 Feb;10(1):95–104. [PubMed: 17323657]

34.

35.

36.

Prabhakar S, Pankanti S, Jain A. Biometric recognition: security and privacy concerns. IEEE Security & Privacy. 2003 March/April;1(2):33–42.

37.

38.

Marohn D. Biometrics in healthcare. Biometric Technology Today. 2006;14(9):9–11.

39.

40.

Wolf EJ, Miller MW, Krueger RF, et al. Posttraumatic stress disorder and the genetic structure of comorbidity. J Abnorm Psychol. 2010 May;119(2):320–30. [PMC free article: PMC3097423] [PubMed: 20455605]

41.

42.

43.

Boyd AD, Saxman PR, Hunscher DA, et al. The University of Michigan Honest Broker: a Web-based service for clinical and translational research and practice. J Am Med Inform Assoc. 2009 Nov-Dec;16(6):784–91. [PMC free article: PMC3002130] [PubMed: 19717803]

44.

45.

Lyons RA, Jones KH, John G, et al. The SAIL databank: linking multiple health and social care datasets. BMC Med Inform Decis Mak. 2009;9:3. [PMC free article: PMC2648953] [PubMed: 19149883]

46.

Brooks CJ, Stephens JW, Price DE, et al. Use of a patient linked data warehouse to facilitate diabetes trial recruitment from primary care. Prim Care Diabetes. 2009 Nov;3(4):245–8. [PubMed: 19604741]

47.

48.

49.

Commission on Systemic Interoperability. Ending the Document Game: Connecting and Transforming Your Healthcare Through Information Technology. Washington, DC: U.S. Government Printing Office; 2005. [August 17, 2012]. http://endingthedocumentgame.gov/PDFs/entireReport.pdf.

50.

51.

Miller AR, Tucker CE. Encryption and the loss of patient data. J Policy Anal Manage. 2011 Summer;30(3):534–56. [PubMed: 21774164]

52.

Noumeir R, Lemay A, Lina JM. Pseudonymization of radiology data for research purposes. J Digit Imaging. 2007 Sep;20(3):284–95. [PMC free article: PMC3043895] [PubMed: 17191099]

53.

Neubauer T, Heurix J. A methodology for the pseudonymization of medical data. Int J Med Inform. 2011 Mar;80(3):190–204. [PubMed: 21075676]

a

Privacy and security concerns did not prevent CMS from developing the National Plan & Provider Enumeration System (NPPES) to assign unique identifiers to health plans and health care providers. The National Provider Identifier (NPI) has been in use since 2006, but a standard identifier has not yet been implemented for health plans. (https://nppes.cms.hhs.gov/NPPES/Welcome.do. Accessed June 28, 2012.)