Processing comments with AI: data privacy and security
When processing comments with People Insight's AI tool, several steps ensure data privacy and security. First, personally identifiable information (PII) is redacted using Microsoft Language Studio before any further processing. The comments are then categorised and analysed for sentiment, and the processed data is securely stored. When Prism is enabled, it provides summaries and action recommendations while maintaining confidentiality throughout, so sensitive information is protected while actionable insights are delivered.
Steps in the AI-assisted comment handling process
When we enable the AI-assisted handling of comments in surveys, here's a step-by-step overview of what happens to the data:
PII redaction: Submitted comments are first sent to Microsoft Language Studio without any associated respondent or company information. Microsoft Language Studio performs personally identifiable information (PII) redaction to ensure that all sensitive data is removed.
Comment categorisation: After PII redaction, Microsoft Language Studio categorises the comments using our AI model.
Sentiment analysis: Microsoft Language Studio also applies sentiment analysis to the comments using their standard model. This analysis helps us understand the overall tone and sentiment of the feedback.
Data storage: Once Microsoft Language Studio has processed the comments, the enhanced data is returned to us and securely stored in our database. Microsoft does not retain any records of the data, nor is it used to train their models. All data storage complies with our data retention policy.
Prism analysis (if enabled): For deeper insights, the enhanced data is then sent to OpenAI. Using People Insight's engineered prompts, OpenAI summarises the comments by category, which simplifies the analysis, especially when dealing with a large volume of comments. Additionally, OpenAI provides recommended actions based on the summarised data.
Prism-related data privacy and security: OpenAI Enterprise processes the data without any context of the source company, and the comments are already PII-redacted before they are sent. OpenAI does not use this data to train its models. As your data processor, we control the use of this data to ensure your privacy is protected. For more information on how OpenAI handles data, you can refer to their enterprise privacy documentation.
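The steps above can be sketched, at a very high level, as a single pipeline. This is a minimal illustrative sketch only: the regex redaction, fixed category, and fixed sentiment below are local stand-ins for the Microsoft Language Studio and OpenAI calls, which are not reproduced here.

```python
import re
from dataclasses import dataclass

@dataclass
class ProcessedComment:
    text: str            # PII-redacted comment text
    category: str = ""   # assigned by the categorisation model
    sentiment: str = ""  # positive / neutral / negative

# Stand-in for Microsoft Language Studio's PII redaction (emails only, for illustration).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_pii(comment: str) -> str:
    """Step 1: remove PII before any further processing."""
    return EMAIL.sub("[REDACTED]", comment)

def process_comment(raw: str) -> ProcessedComment:
    redacted = redact_pii(raw)            # step 1: PII redaction
    c = ProcessedComment(text=redacted)
    c.category = "general communication"  # step 2: categorisation (stand-in)
    c.sentiment = "neutral"               # step 3: sentiment analysis (stand-in)
    return c                              # step 4: the enhanced record is stored

print(process_comment("Contact me at jane.doe@example.com about the survey").text)
# → Contact me at [REDACTED] about the survey
```

The key design point the sketch illustrates is ordering: redaction happens before any other processing, so every downstream step only ever sees redacted text.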
Access to information post-AI processing
Once comments are processed through People Insight's AI tool, access to the information is restricted to authorised users within the organisation. No external parties can view or use the data, and none of the data is used to train AI models. Our system is designed with privacy and confidentiality in mind, ensuring that personal data is protected and remains under client control.
Data protection and risks
We prioritise data protection by redacting PII before any AI processing. No data is used for model training, and access is restricted to authorised users only. Risks are mitigated through strict privacy protocols, encryption, and ethical AI usage. While digital systems always carry some risk of data breaches or improper access, we implement robust measures to minimise these risks, and the redacted, anonymised nature of the data reduces the likelihood of any significant privacy concern.
Reliability and quality of AI data redaction and analysis
Microsoft Language Studio’s PII redaction is reliable for identifying and removing sensitive data such as names, email addresses, and National Insurance numbers. It supports multiple languages and offers high accuracy, particularly in English. We have customised it to fit our specific industry needs, improving accuracy for domain-specific terminology. While the system is generally effective, occasional false positives or negatives may occur, especially in context-dependent text. The service complies with major privacy regulations such as GDPR and HIPAA, and data is encrypted during processing to protect sensitive information.
Predefined list for PII redaction
Microsoft Language Studio provides a predefined list of PII categories for redaction, including:
Names (full name, first name, surname)
Email addresses
Telephone numbers (mobile and landline)
National Insurance numbers
Credit card numbers
Bank account details
Driving licence numbers
Passport numbers
IP addresses
Postal addresses (street, town, postcode)
National identification numbers
Medical data (patient IDs)
Date of birth
Financial information (taxpayer IDs, financial accounts)
Device IDs (UUIDs, IMEI numbers)
Vehicle registration numbers
These categories ensure compliance with UK and global privacy regulations, and the tool allows for customisation to detect additional PII types.
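As a hypothetical illustration of that customisation, additional PII types can be expressed as patterns applied alongside the predefined categories. The employee-ID and vehicle-registration patterns below are assumptions made for the sketch, not the tool's actual configuration.

```python
import re

# Hypothetical custom PII types layered on top of the predefined categories.
# "EMP-123456"-style employee IDs are an invented example format.
CUSTOM_PII = {
    "EmployeeId": re.compile(r"\bEMP-\d{6}\b"),
    "VehicleReg": re.compile(r"\b[A-Z]{2}\d{2}\s?[A-Z]{3}\b"),  # current UK plate format
}

def redact_custom(text: str) -> str:
    """Replace each custom PII match with a labelled placeholder."""
    for label, pattern in CUSTOM_PII.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact_custom("My car AB12 CDE and ID EMP-123456"))
# → My car [VehicleReg] and ID [EmployeeId]
```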
Manual sensitivity flagging
While PII redaction does not cover manual sensitivity flagging, Prism, our Gen AI tool, can identify highly sensitive or disclosive information, such as sexual orientation, disability, religious beliefs, or specific personal situations like grievance cases. By using detailed prompts, Prism is able to effectively identify and flag these categories, extending beyond the standard PII types to ensure sensitive information is managed appropriately.
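A sketch of what such a detailed prompt might look like is below. Both the wording and the request structure are hypothetical, since Prism's engineered prompts are not published; the sketch only illustrates the general shape of a sensitivity-flagging instruction.

```python
# Hypothetical sensitivity-flagging prompt of the kind Prism might use.
SENSITIVITY_PROMPT = """You are reviewing anonymised employee survey comments.
Flag any comment that discloses highly sensitive information, such as:
- sexual orientation
- disability or health conditions
- religious beliefs
- ongoing grievance or disciplinary cases
Return one line per comment: FLAGGED: <reason> or OK."""

def build_request(comments: list[str]) -> dict:
    """Assemble a chat-style request body (provider-agnostic sketch)."""
    return {
        "messages": [
            {"role": "system", "content": SENSITIVITY_PROMPT},
            {"role": "user", "content": "\n".join(comments)},
        ]
    }
```

Keeping the flagging criteria in the system message, separate from the comments themselves, is what lets the same instruction be reused across surveys of very different sizes.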
Explaining in more detail about how we categorise comments
In our platform, we have developed a machine learning model using Microsoft Language Studio to categorise comments from employee surveys into specific categories. The model was trained using anonymised training data, ensuring all personally identifiable information (PII) was removed to maintain data privacy. The categories span a range of topics including "agility and innovation," "career progression," "equality, diversity, and inclusion (EDI)," and "flexible and hybrid working," among others. This allows the model to capture diverse themes, from operational topics like "customer service and quality" to cultural aspects such as "employee voice" and "leadership."
To develop this categorisation model, we manually tagged a large dataset of anonymised comments with the appropriate categories. The model was trained using this labelled data, and its performance was tested using a validation dataset to ensure accuracy. We measured its effectiveness through the F1 score, a metric that balances precision (the accuracy of the predictions) and recall (how well the model identifies relevant cases). This iterative approach enabled us to continually fine-tune the model for optimal performance, ensuring that it can reliably assign comments to the correct categories.
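For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. A short worked example with hypothetical validation counts:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)  # how many predicted labels were correct
    recall = tp / (tp + fn)     # how many true labels were found
    return 2 * precision * recall / (precision + recall)

# e.g. 90 correctly categorised, 10 wrongly assigned, 30 missed:
print(round(f1_score(tp=90, fp=10, fn=30), 3))
# → 0.818
```

Because F1 is a harmonic mean, it penalises a model that scores well on only one of the two components, which is why it suits an iterative tuning process.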
The model is now able to efficiently categorise comments into the following categories: agility and innovation, autonomy and empowerment, career progression, change management, cross function communication, customer service and quality, don't know or unsure, employee voice, environmental, social, and governance (ESG), equality, diversity, and inclusion (EDI), flexible and hybrid working, general communication, health and safety, job security, leadership, learning and development, line manager effectiveness, meetings, new joiners onboarding and induction, and no comment. These predefined categories enable deep analysis of qualitative feedback, providing users with actionable insights into key organisational themes.
If you have any further questions, please don't hesitate to contact our support team.