Processing comments with AI: data privacy and security
When processing comments with People Insight's AI tool, several steps ensure data privacy and security. First, personally identifiable information (PII) is redacted using Microsoft Language Studio before any further processing. The comments are then categorised and analysed for sentiment, and the processed data is securely stored. When Prism is enabled, it provides summaries and action recommendations while maintaining confidentiality throughout, so sensitive information is protected while actionable insights are delivered.
Steps in the AI-assisted comment handling process
When we enable the AI-assisted handling of comments in surveys, here's a step-by-step overview of what happens to the data:
PII redaction: Submitted comments are first sent to Microsoft Language Studio without any associated respondent or company information. Microsoft Language Studio performs personally identifiable information (PII) redaction to ensure that all sensitive data is removed.
Comment categorisation: After PII redaction, Microsoft Language Studio categorises the comments using our AI model.
Sentiment analysis: Microsoft Language Studio also applies sentiment analysis to the comments using their standard model. This analysis helps us understand the overall tone and sentiment of the feedback.
Data storage: Once Microsoft Language Studio has processed the comments, the enhanced data is returned to us and securely stored in our database. Microsoft does not retain any records of the data, nor is it used to train their models. All data storage complies with our data retention policy.
Prism analysis (if enabled): For deeper insights, the enhanced data is then sent to OpenAI. Using People Insight's engineered prompts, OpenAI summarises the comments by category, which simplifies the analysis, especially when dealing with a large volume of comments. Additionally, OpenAI provides recommended actions based on the summarised data.
Prism-related data privacy and security: OpenAI Enterprise processes the data without any context of the source company, and the comments are already PII-redacted before they are sent. OpenAI does not use this data to train its models. As your data processor, we control the use of this data to ensure your privacy is protected. For more information on how OpenAI handles data, you can refer to their enterprise privacy documentation.
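The steps above can be sketched, at a very high level, as a single pipeline. This is a minimal illustrative sketch only: the regex redaction, fixed category, and fixed sentiment below are local stand-ins for the Microsoft Language Studio and OpenAI calls, which are not reproduced here.

```python
import re
from dataclasses import dataclass

@dataclass
class ProcessedComment:
    text: str            # PII-redacted comment text
    category: str = ""   # assigned by the categorisation model
    sentiment: str = ""  # positive / neutral / negative

# Stand-in for Microsoft Language Studio's PII redaction (emails only, for illustration).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_pii(comment: str) -> str:
    """Step 1: remove PII before any further processing."""
    return EMAIL.sub("[REDACTED]", comment)

def process_comment(raw: str) -> ProcessedComment:
    redacted = redact_pii(raw)            # step 1: PII redaction
    c = ProcessedComment(text=redacted)
    c.category = "general communication"  # step 2: categorisation (stand-in)
    c.sentiment = "neutral"               # step 3: sentiment analysis (stand-in)
    return c                              # step 4: the enhanced record is stored

print(process_comment("Contact me at jane.doe@example.com about the survey").text)
# → Contact me at [REDACTED] about the survey
```

The key design point the sketch illustrates is ordering: redaction happens before any other processing, so every downstream step only ever sees redacted text.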
Access to information post-AI processing
Once comments are processed through People Insight's AI tool, access to the information is restricted to authorised users within the organisation. No external parties can view or use the data, and none of the data is used to train AI models. Our system is designed with privacy and confidentiality in mind, ensuring that personal data is protected and remains under client control.
Data protection and risks
We prioritise data protection by redacting PII before any AI processing. No data is used for model training, and access is restricted to authorised users only. Risks are mitigated through strict privacy protocols, encryption, and ethical AI usage. While digital systems always carry some risk of data breaches or improper access, we implement robust measures to minimise these risks, and the redacted, anonymised nature of the data reduces the likelihood of any significant privacy concern.
Reliability and quality of AI data redaction and analysis
Microsoft Language Studio’s PII redaction is reliable for identifying and removing sensitive data such as names, email addresses, and National Insurance numbers. It supports multiple languages and offers high accuracy, particularly in English. We have customised it to fit our specific industry needs, improving accuracy for domain-specific terminology. While the system is generally effective, occasional false positives or negatives may occur, especially in context-dependent text. The service complies with major privacy regulations such as GDPR and HIPAA, and data is encrypted during processing to protect sensitive information.
Predefined list for PII redaction
Microsoft Language Studio provides a predefined list of PII categories for redaction, including:
Names (full name, first name, surname)
Email addresses
Telephone numbers (mobile and landline)
National Insurance numbers
Credit card numbers
Bank account details
Driving licence numbers
Passport numbers
IP addresses
Postal addresses (street, town, postcode)
National identification numbers
Medical data (patient IDs)
Date of birth
Financial information (taxpayer IDs, financial accounts)
Device IDs (UUIDs, IMEI numbers)
Vehicle registration numbers
These categories ensure compliance with UK and global privacy regulations, and the tool allows for customisation to detect additional PII types.
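As a hypothetical illustration of that customisation, additional PII types can be expressed as patterns applied alongside the predefined categories. The employee-ID and vehicle-registration patterns below are assumptions made for the sketch, not the tool's actual configuration.

```python
import re

# Hypothetical custom PII types layered on top of the predefined categories.
# "EMP-123456"-style employee IDs are an invented example format.
CUSTOM_PII = {
    "EmployeeId": re.compile(r"\bEMP-\d{6}\b"),
    "VehicleReg": re.compile(r"\b[A-Z]{2}\d{2}\s?[A-Z]{3}\b"),  # current UK plate format
}

def redact_custom(text: str) -> str:
    """Replace each custom PII match with a labelled placeholder."""
    for label, pattern in CUSTOM_PII.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact_custom("My car AB12 CDE and ID EMP-123456"))
# → My car [VehicleReg] and ID [EmployeeId]
```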
Manual sensitivity flagging
While PII redaction does not cover manual sensitivity flagging, Prism, our Gen AI tool, can identify highly sensitive or disclosive information, such as sexual orientation, disability, religious beliefs, or specific personal situations like grievance cases. By using detailed prompts, Prism is able to effectively identify and flag these categories, extending beyond the standard PII types to ensure sensitive information is managed appropriately.
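A sketch of what such a detailed prompt might look like is below. Both the wording and the request structure are hypothetical, since Prism's engineered prompts are not published; the sketch only illustrates the general shape of a sensitivity-flagging instruction.

```python
# Hypothetical sensitivity-flagging prompt of the kind Prism might use.
SENSITIVITY_PROMPT = """You are reviewing anonymised employee survey comments.
Flag any comment that discloses highly sensitive information, such as:
- sexual orientation
- disability or health conditions
- religious beliefs
- ongoing grievance or disciplinary cases
Return one line per comment: FLAGGED: <reason> or OK."""

def build_request(comments: list[str]) -> dict:
    """Assemble a chat-style request body (provider-agnostic sketch)."""
    return {
        "messages": [
            {"role": "system", "content": SENSITIVITY_PROMPT},
            {"role": "user", "content": "\n".join(comments)},
        ]
    }
```

Keeping the flagging criteria in the system message, separate from the comments themselves, is what lets the same instruction be reused across surveys of very different sizes.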
Explaining in more detail about how we categorise comments
In our platform, we have developed a machine learning model using Microsoft Language Studio to categorise comments from employee surveys into specific categories. The model was trained using anonymised training data, ensuring all personally identifiable information (PII) was removed to maintain data privacy. The categories span a range of topics including "agility and innovation," "career progression," "equality, diversity, and inclusion (EDI)," and "flexible and hybrid working," among others. This allows the model to capture diverse themes, from operational topics like "customer service and quality" to cultural aspects such as "employee voice" and "leadership."
To develop this categorisation model, we manually tagged a large dataset of anonymised comments with the appropriate categories. The model was trained using this labelled data, and its performance was tested using a validation dataset to ensure accuracy. We measured its effectiveness through the F1 score, a metric that balances precision (the accuracy of the predictions) and recall (how well the model identifies relevant cases). This iterative approach enabled us to continually fine-tune the model for optimal performance, ensuring that it can reliably assign comments to the correct categories.
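For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. A short worked example with hypothetical validation counts:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)  # how many predicted labels were correct
    recall = tp / (tp + fn)     # how many true labels were found
    return 2 * precision * recall / (precision + recall)

# e.g. 90 correctly categorised, 10 wrongly assigned, 30 missed:
print(round(f1_score(tp=90, fp=10, fn=30), 3))
# → 0.818
```

Because F1 is a harmonic mean, it penalises a model that scores well on only one of the two components, which is why it suits an iterative tuning process.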
The model is now able to efficiently categorise comments into the following categories: agility and innovation, autonomy and empowerment, career progression, change management, cross function communication, customer service and quality, don't know or unsure, employee voice, environmental, social, and governance (ESG), equality, diversity, and inclusion (EDI), flexible and hybrid working, general communication, health and safety, job security, leadership, learning and development, line manager effectiveness, meetings, new joiners onboarding and induction, and no comment. These predefined categories enable deep analysis of qualitative feedback, providing users with actionable insights into key organisational themes.
If you have any further questions, please don't hesitate to contact our support team.