What is PII?
PII (Personally Identifiable Information) is any information that someone else can use to impersonate you or affect your life without your consent. Your PII can be captured through interactions with companies, healthcare providers, banks, or anywhere else your identity needs to be verified. It is important that we, as a data community, think about protecting not only our own PII, but also the PII of our customers, clients, and colleagues. It is not just about protecting your company from reputational, legal, or compliance risks; it is also about protecting each of us as individuals.
Protecting PII Data
By now, most of us understand that an overwhelming number of data leaks occur each year, across all industries. It may be just an email address from one company, credentials from another, and cell phone or computer information from yet another company you did not even know had your information. Because so many organizations are trying to learn about prospective customers through AI/ML modeling, the ease with which these programs can scrape the internet for all these leaks and assemble a small personal profile would scare most people. Articles for further reading on the subject are listed at the bottom.
Safeguarding personal information comes with several challenges, and they start with a paradox: to prevent impersonation, we need to positively identify a person, which means collecting the very information we are trying to protect. How meta is that? Systems thinking tells us that sometimes we plug a hole in one place and the pressure sprays out through another hole. The general rule for alleviating the pressure is to reduce the volume, so let us get into that.
The common data elements that most companies focus on are:
- Common PII – Full Name, Date of Birth, Address, Email Address, Phone Number, Social Security Number, Account Number, Passwords, IP Address, Device ID, Biometrics, Full-face Photographs, Driver’s License Number
- PCI (Payment Card Industry) data – Cardholder’s Name, Credit Card Number, CVV
- PHI (Protected Health Information) – Demographic Information, Medical History, Test and Lab Results, Mental Health Conditions, Insurance Information, Genome Sequencing, Genetic Markers, Ancestry
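As a rough illustration of how these categories might be used in practice, the sketch below flags columns in a dataset by category. The field names and the lookup table are hypothetical and deliberately incomplete; a real inventory would come from your own data catalog, and name matching alone will miss sensitive data stored under unexpected column names.

```python
# Hypothetical lookup table: field names are illustrative, not a standard.
SENSITIVE_FIELDS = {
    "pii": {"full_name", "date_of_birth", "address", "email", "phone", "ssn"},
    "pci": {"cardholder_name", "card_number", "cvv"},
    "phi": {"medical_history", "lab_results", "insurance_info", "genetic_markers"},
}


def classify_columns(columns):
    """Return {column: category} for columns that match a known sensitive field."""
    found = {}
    for col in columns:
        key = col.strip().lower()
        for category, names in SENSITIVE_FIELDS.items():
            if key in names:
                found[col] = category
                break
    return found


# Example: flag the sensitive columns in a (made-up) table layout.
flagged = classify_columns(["Email", "store_id", "CVV", "lab_results"])
for column, category in flagged.items():
    print(f"{column}: {category}")
```

A report like this is only a starting point for the conversation about what to remove, replace, or restrict.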
Reducing Risk
The first step is ALWAYS going to be to tokenize, obscure, or remove sensitive information so that it is not exposed to the general data community within your company. This can be done through processes, tools, permissions, or architecture. There are MANY articles and best practices out there already. The plan that I recommend in most scenarios is:
1. Get rid of all the information you do not need. Not only does that reduce financial and customer exposure risk, but it typically will also reduce data processing overhead and cost.
2. Replace it with more appropriate data. For example, if you are segmenting on age or generational preferences, convert the birthdate to an age in flight, before committing the data to a hard disk or cloud account.
3. Keep the raw data as close to the source as possible. There is little reason to use most forms of PII within reporting and analytics layers, especially if the focus is on unbiased interpretation of the data's behavior.
4. Collect as little sensitive data as possible. If it is not there, it is not at risk.
5. Use AI (Artificial Intelligence) to identify areas where AI is biased. Check recursively whether the patterns it finds are self-fulfilling, or whether there is room to challenge the model's training up to this point. Think of it as personal growth for AI. AI is not just for targeting revenue opportunities; it can also be used to target reputation opportunities.
6. Ask yourself and your colleagues questions during design:
- If we all hate the onslaught of marketing emails and ads, why do we continue to participate in sending them out?
- If spending habits and store location are all an AI model needs to pinpoint an individual, are we still protecting that person by withholding their name from the model?
- Will the data that I am collecting really provide enough advantage to outweigh the cost of risk to the customer?
- Is there an obligation to let the individual know how we are using their data and to ask whether they consent to being targeted in a certain way? How would that simple action change things?
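Steps 1 and 2 above, together with the idea of tokenizing identifiers, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a recommended implementation: the field names, the allow-list, and the key handling are all hypothetical, and a keyed hash (HMAC-SHA256) stands in here for a real tokenization service, which would typically keep a protected vault mapping tokens back to values.

```python
import hashlib
import hmac
from datetime import date

# Assumption: in practice the key would come from a secrets manager, not source code.
TOKEN_KEY = b"replace-with-a-managed-secret"

# Hypothetical allow-list of the fields the analytics layer actually needs.
NEEDED_FIELDS = {"customer_id", "date_of_birth", "store_region", "purchase_total"}


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Keyed hash: stable enough for joins, not reversible without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def age_on(birthdate: date, today: date) -> int:
    """Whole years between birthdate and today."""
    before_birthday = (today.month, today.day) < (birthdate.month, birthdate.day)
    return today.year - birthdate.year - before_birthday


def minimize(record: dict, today: date) -> dict:
    """Drop unneeded fields, swap birthdate for age, and tokenize the identifier."""
    kept = {k: v for k, v in record.items() if k in NEEDED_FIELDS}
    if "date_of_birth" in kept:
        kept["age"] = age_on(kept.pop("date_of_birth"), today)
    if "customer_id" in kept:
        kept["customer_id"] = tokenize(str(kept["customer_id"]))
    return kept


# Example with made-up data: email and full name never reach storage.
raw = {
    "customer_id": "C-1001",
    "full_name": "Jane Example",
    "email": "jane@example.com",
    "date_of_birth": date(1990, 6, 15),
    "store_region": "NW",
    "purchase_total": 42.50,
}
print(minimize(raw, today=date(2024, 6, 14)))
```

Running the transformation before the record is written anywhere means the birthdate, name, and email are never stored downstream. Keep in mind the design question raised above: the remaining fields may still be enough to single out an individual.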
Next Steps
- Keep your silos and domains, but clearly define expectations of data quality and hold each other accountable to deliver.
- Certain data management functions should start with data literacy.
- Not all data is governed the same, but if it is going to be presented or consumed by an executive, it had better be certified!
Change the focus from governing everything to just enough governance to reduce risk and confusion. Build bridges and pathways between silos as needed to optimize revenue and let the self-organization of the data environment become an organic advantage. If this is intriguing, think of the journey to data meshing!
Related Articles
CIOReview Awards Curate Insights Most Promising Insight Engines 2023
The annual listing of 10 companies that are at the forefront of providing Insight Engines solutions and transforming businesses.
What is a Data Silo?
Data silos can obstruct data consumption unless there are clearly defined expectations of data quality and we hold each other accountable.
What is Data Privacy?
Data privacy continues to grow in importance, especially in a world that is nearly entirely digital. Learn the best practices and legal regulations to protect data privacy.