
Healthcare relies on data analytics for quality improvement and medical breakthroughs. Privacy-enhancing technologies (PETs) are crucial tools for keeping that data private and secure. Architectural PETs, including federated learning and secure multi-party computation, preserve data privacy while allowing organizations to collaborate. Blockchain, when combined with other PETs, can help safeguard sensitive health information. These methods offer benefits but face challenges, including limited transparency and the need for stakeholder trust. Successful PET integration could transform healthcare analytics by addressing privacy concerns while driving innovation.
Data analytics plays a crucial role in advancing healthcare quality and driving medical breakthroughs. However, ensuring the protection of patient data remains a top priority throughout the entire process.
Privacy-enhancing technologies (PETs) are indispensable tools that healthcare organizations can utilize to ensure data privacy and security. These PETs can be categorized into three groups: algorithmic, architectural, and augmentation. To support healthcare analytics effectively, a combination of these PET categories is highly recommended.
This article represents the second part of a series that aims to break down each category of PET and explore its various applications in healthcare. The previous installment delved deeply into algorithmic PETs.
FEDERATED LEARNING
Architectural PETs, unlike algorithmic PETs, focus on the structure of data or computation environments to safeguard privacy. These technologies facilitate the confidential exchange of information without divulging the actual underlying data.
Federated learning is an approach commonly employed in the development of artificial intelligence (AI) and machine learning (ML) models.
IBM describes federated learning as a technique for training these models without exposing the data they are built on. In federated learning, multiple parties, each holding its own data, collaboratively train a single deep learning model. Each participant improves the model locally on its own data, and the resulting model updates are aggregated in a centralized, cloud-based data center. This cycle repeats until the model is fully trained.
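The training loop described above can be sketched in a few lines of Python. This is a minimal illustration, not IBM's implementation: it assumes a toy linear model, a weighted-average aggregation rule (often called federated averaging), and synthetic data for three hypothetical sites. The function names and hyperparameters are made up for the example.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """One participant refines the shared model (y = w*x + b) on its own data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)

def federated_average(updates, sizes):
    """The central server combines updates, weighted by each site's sample count."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def train(sites, rounds=50):
    """Repeat local training and aggregation; raw data never leaves a site."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights

# Synthetic stand-in for three hospitals' private datasets (assumption: y = 2x + 1).
rng = random.Random(0)

def make_site(n):
    return [(x, 2.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(n))]

sites = [make_site(20), make_site(30), make_site(50)]
w, b = train(sites)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Only the pair of model weights travels between each site and the server; the `(x, y)` records stay inside `local_update`.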
Research indicates that federated learning applications in biomedical data primarily focus on radiology and oncology. Use cases include applications in brain imaging, COVID-19 diagnostics, tumor detection, cancer biomarker prediction, and the Internet of Healthcare Things (IoHT). There are also proposals for using federated learning to enhance fairness in AI-based screening tools.
In 2018, researchers from the Perelman School of Medicine at the University of Pennsylvania conducted the first real-world application of federated learning to medical imaging data. Their work, published in 2019, demonstrated that a deep learning model trained through federated learning could accurately segment brain tumor images, reaching 99 percent of the performance achieved when the same model was trained using traditional data-sharing methods. This work paved the way for addressing challenges associated with data acquisition, labeling, and sharing in imaging analytics research.
Additionally, in the same year, researchers at Penn’s Center for Biomedical Image Computing & Analytics (CBICA) received a federal grant of $1.2 million to develop a federated learning framework focused on tumor segmentation. This grant led to collaboration with 29 institutions globally to advance these efforts.
Federated learning offers several potential benefits in healthcare applications. It enhances data privacy, balances accuracy and utility, facilitates low-cost training on health data, and reduces data fragmentation. The approach also supports asynchronous transmissions, fostering collaboration among multiple institutions.
One key advantage of federated learning is its ability to avoid duplicating high-dimensional, storage-intensive medical data for local model training. This scalability feature allows the model to naturally accommodate growing datasets without increasing storage requirements.
Federated learning can be combined with other PETs, such as differential privacy and secure multi-party computation, to fulfill additional data protection needs in medical research. Experts are actively working to benchmark strategies for applying federated learning to biomedical data.
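One common way to pair federated learning with differential privacy is to clip each participant's model update and add random noise before it leaves the site. The sketch below illustrates that pattern only; the source does not specify a mechanism, and the function names, `max_norm`, and `noise_std` are assumptions. The noise level shown is arbitrary, and a real deployment would calibrate it to a formal privacy budget.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]

def privatize(update, max_norm, noise_std, rng):
    """Clip, then add Gaussian noise to every coordinate (illustrative values only)."""
    clipped = clip_update(update, max_norm)
    return [v + rng.gauss(0.0, noise_std) for v in clipped]

rng = random.Random(42)
raw_update = [3.0, 4.0]  # hypothetical local model update with L2 norm 5
noisy_update = privatize(raw_update, max_norm=1.0, noise_std=0.1, rng=rng)
print(noisy_update)
```

Clipping bounds how much any single site can influence the aggregate, and the noise masks what remains of an individual contribution.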
However, deploying federated learning in healthcare analytics does present some notable challenges. The approach requires substantial computational resources and high communication bandwidth.
Transparency is another concern. Because the training data in federated learning stays private, separate systems are needed to test the accuracy, fairness, and potential bias of the model’s outputs. As the technology is still relatively new, such systems have yet to be widely developed and adopted.
Researchers are actively addressing these challenges, as evidenced by recent proposals for innovative architectures to tackle issues related to missing values, data harmonization, and ‘learning task’ schemas in biomedical federated learning efforts.
SECURE MULTI-PARTY COMPUTATION
Secure multi-party computation is an architectural PET that, similar to federated learning, enables parties to collaborate on data computations without revealing their data inputs. This technique is often used in conjunction with federated learning to enhance privacy.
Secure multi-party computation plays a vital role in developing large-scale privacy-preserving applications. It enables a group to collectively perform a computation without disclosing individual private inputs. Participants agree on a function to compute, and a multi-party computation protocol is used to calculate the output of that function based on their confidential inputs.
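As a concrete illustration of computing an agreed function over confidential inputs, the sketch below uses additive secret sharing to total a count across three hypothetical hospitals. It is one simple protocol among many, chosen for the example rather than taken from the source. The modulus, the function names, and the input values are all assumptions.

```python
import secrets

PRIME = 2**61 - 1  # modulus for share arithmetic (assumed; must exceed any possible total)

def share(value, n_parties):
    """Split a private value into n random-looking shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def secure_sum(private_inputs):
    """Each party shares its input; parties add the shares they hold; only totals are revealed."""
    n = len(private_inputs)
    # Row i holds the shares produced by party i; column j is what party j receives.
    all_shares = [share(v, n) for v in private_inputs]
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(partial_sums) % PRIME

patient_counts = [120, 75, 310]  # hypothetical private inputs, one per hospital
total = secure_sum(patient_counts)
print(total)
```

Each party sees only uniformly random shares of the other inputs, yet the published partial sums combine into the correct total. This toy version runs in one process and assumes honest participants; a deployed protocol would distribute the steps across parties and defend against misbehavior.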
For successful secure multi-party computation, five requirements must be upheld: privacy, correctness, independence of inputs, guaranteed output, and fairness.
Privacy ensures that no party gains access to others’ information, and each party receives only the computed output. Correctness ensures the accuracy of the output. Independence of inputs mandates that each party chooses its inputs independently of the other parties’ inputs, while guaranteed output dictates that all parties should receive the output. Fairness means that a party receives the computed output only if all other parties receive it as well.
Secure multi-party computation finds application in genome research, patient risk stratification, privacy-preserving healthcare data analytics, and tools for biomedical research. The technique’s use in other industries demonstrates its benefits, such as enabling new strategies for technology-based control, reducing the need for inter-organizational trust, and preventing data leakage-related competitive disadvantages.
However, using secure multi-party computation in healthcare comes with limitations. Stakeholders must place trust in the technology, a challenge in a field where clinicians and healthcare professionals may be hesitant to embrace a ‘black box’ tool.
The approach also introduces new risks of data misuse. Secure multi-party computation requires communication between parties, creating the potential for two parties to collude and deduce the data of a third. Preventing this requires robust strategies and protocols such as privacy zones.
Privacy zones involve segregating data across multiple domains or servers with distinct privacy restrictions. This framework allows separate parties to engage in secure multi-party computation while ensuring that their data is never stored on the same servers or domains.
BLOCKCHAIN
While blockchain is not exclusively a PET, it can function as one when combined with approaches like secure multi-party computation, homomorphic encryption, and zero-knowledge proofs to preserve data privacy.
Blockchain technology, a form of distributed ledger technology, enables stakeholders to record, share, and synchronize information without relying on a central entity. Each transaction on a blockchain is transparent, unchangeable, and permanent.
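Transactions are recorded as data ‘blocks,’ each with a unique identifier or hash. The hash changes when the block’s information is modified. Each block also records the hash of the block before it, linking the blocks in a ‘chain’ so that an existing block cannot be altered, and a new block cannot be inserted between existing ones, without the change being detected.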
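The hash-linking idea can be shown with a short Python sketch. It is a simplified, single-machine illustration that assumes SHA-256 and a made-up block layout; it omits the consensus and distribution mechanisms of a real blockchain, and the record strings are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block (assumption)

def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_block(chain, data):
    """Add a block that points at the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({"index": index, "data": data, "prev_hash": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})

def is_valid(chain):
    """Recompute every hash and check every link; any edit or insertion breaks one."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return False
    return True

chain = []
append_block(chain, "consent granted for record A")
append_block(chain, "record A accessed by clinic B")
append_block(chain, "consent revoked for record A")
print(is_valid(chain))
```

Changing the `data` field of any block makes its stored hash stale, and recomputing that hash would break the `prev_hash` link in the block that follows, so `is_valid` returns `False` either way.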
Blockchain’s applications in healthcare include addressing IT barriers, leveraging AI for big data analytics, improving electronic health record (EHR) interoperability, enhancing data security, and supporting Internet of Things (IoT) fog computing.
Life sciences organizations have also used blockchain to secure fertility data, particularly relevant as healthcare navigates patient privacy issues and data sharing in the wake of policy changes.
Blockchain’s decentralized network architecture can help break down data silos and improve data fluidity in healthcare.
Effectively utilizing blockchain in healthcare requires understanding its impact on data security, confidentiality, storage, and availability. However, ensuring blockchain’s compliance with privacy regulations remains a challenge, especially in healthcare.
Research suggests analyzing vulnerabilities within each layer of blockchain architecture to prevent data breaches. Combining blockchain with other PETs can mitigate risks. Nonetheless, stakeholders must develop strong risk mitigation strategies and support the evolution of new PETs.
Overall, data privacy and security are paramount in healthcare analytics, and PETs play a vital role in safeguarding patient data. Architectural PETs like federated learning and secure