
Introduction

Aftercare evaluates the quality of each response both on its own and in comparison with all other responses to the survey. Not all quality checks are built the same, so we’ve built a few different metrics to give you a holistic view of your responses.

The qualityScore field is deprecated and will be removed in a future release. Please use demeritScore instead. The demeritScore provides the same information on a more intuitive scale: higher scores indicate lower-quality responses.

Components

Aftercare AI breaks down the quality of your responses into a few different metrics:

Respondent level metrics

  • Inconsistency: how inconsistent a respondent is across multiple responses. Higher scores indicate more contradictions and inconsistencies in the responses.

Response level metrics

  • Nonsense: whether the response itself is coherent and logical (flags gibberish, troll responses, etc.).
  • Relevance: how pertinent the response is to the question.
  • Low-effort: how much effort was put into the response (length, specific details, etc.).
  • LLM-generated: how likely the response is to be AI or LLM generated.
  • Self-duplication: how similar a particular answer is to previous answers in the same survey response.
  • Shared-duplication: how similar the response is to other responses for a particular question across respondents. Duplicates can be found in the data quality console in the platform.
    Shared duplicate scores will only be calculated if a response identifier is provided. See Shared duplicate detection for more information.
  • Keystroke-issues: a higher score indicates suspicious non-human keystroke patterns. See Keystroke event collection for more information.
  • Honeypot: a higher score indicates that the respondent has likely fallen for the honeypot phrase. See Honeypot detection for more information.

Aftercare will generate a confidence score for each of these metrics along with an overall demerit score from 0-100 you can use to compare the quality of your responses. The higher the demerit score, the lower the quality of the data. Aftercare will let you know which (if any) of these quality metrics were violated and how many violations were found in total.
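As a minimal sketch of how the demerit score might be used downstream, the snippet below partitions results by a cutoff. Only the demeritScore field and its 0-100 range come from this page; the `id` field, the threshold of 50, and the shape of each result dictionary are assumptions made for illustration.

```python
from typing import Any

# Hypothetical cutoff; pick a value that suits your own data.
DEMERIT_THRESHOLD = 50


def split_by_demerit(
    results: list[dict[str, Any]], threshold: int = DEMERIT_THRESHOLD
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition results into (keep, review) using demeritScore.

    demeritScore runs from 0 to 100 and higher means lower quality,
    so anything at or above the threshold is set aside for review.
    """
    keep, review = [], []
    for result in results:
        score = result["demeritScore"]
        if not 0 <= score <= 100:
            raise ValueError(f"demeritScore out of range: {score}")
        (review if score >= threshold else keep).append(result)
    return keep, review


# Illustrative data only; the "id" key is an assumed field name.
results = [
    {"id": "r1", "demeritScore": 12},
    {"id": "r2", "demeritScore": 78},
    {"id": "r3", "demeritScore": 50},
]
keep, review = split_by_demerit(results)
```

With the sample data, r1 stays in `keep` while r2 and r3 land in `review`.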

Quality Detection Modes

In addition to checking for specific quality issues, Aftercare provides three detection modes that group related issues together for common use cases:

Responsiveness

Focuses on identifying poor-quality responses by checking for issues like nonsensical content, irrelevant answers, and low-effort responses.

Use this mode when you want to quickly identify respondents who aren’t engaging meaningfully with your survey.

Authenticity

Concentrates on detecting potentially fraudulent responses by checking for issues like LLM-generated content and duplicated answers across respondents.

Use this mode when validating that responses are genuine and not artificially generated.

Composite

Performs a comprehensive evaluation by checking for all available quality issues. This is the most thorough analysis and is the default mode.

Use this mode when you want a comprehensive check of all quality factors across your dataset.

If you specify both a detection mode and a list of specific quality issues in your request, Aftercare will prioritize the detection mode.
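The precedence rule can be pictured with a small sketch. The issue identifiers, the exact membership of each mode, and the `resolve_issues` helper are all assumptions for illustration: this page only says each mode checks for issues "like" the ones named, so the real groupings may include more.

```python
from typing import Iterable, Optional

# Assumed lowercase identifiers, based on the metric names on this page.
ALL_ISSUES = frozenset({
    "inconsistency", "nonsense", "relevance", "low-effort",
    "llm-generated", "self-duplication", "shared-duplication",
    "keystroke-issues", "honeypot",
})

# Assumed groupings; the actual membership of each mode may differ.
MODE_ISSUES = {
    "responsiveness": frozenset({"nonsense", "relevance", "low-effort"}),
    "authenticity": frozenset({"llm-generated", "shared-duplication"}),
    "composite": ALL_ISSUES,
}


def resolve_issues(
    mode: Optional[str] = None, issues: Optional[Iterable[str]] = None
) -> frozenset:
    """Return the set of issues to check, letting the mode take priority."""
    if mode is not None:
        key = mode.lower()
        if key not in MODE_ISSUES:
            raise ValueError(f"unknown detection mode: {mode}")
        return MODE_ISSUES[key]  # an explicit issue list is ignored here
    if issues:
        requested = frozenset(issues)
        unknown = requested - ALL_ISSUES
        if unknown:
            raise ValueError(f"unknown issues: {sorted(unknown)}")
        return requested
    return MODE_ISSUES["composite"]  # composite is the default mode
```

For example, `resolve_issues("authenticity", ["nonsense"])` returns only the authenticity group, because the mode wins over the explicit list.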

Keystroke event collection

Collect real-time keystroke events from a respondent’s browser and use them to detect bot activity. Aftercare will compute the following metrics:

  • total keystrokes: the total number of keystrokes in the response
  • average hold time: the average time a key is held down for
  • hold time standard deviation: the standard deviation of the hold time of each key
  • average digraph delay: the average time between digraphs (two-character combinations)
  • digraph delay standard deviation: the standard deviation of the time between digraphs
  • digraph delay p90: the 90th percentile of the time between digraphs
  • pause ratio: the ratio of pause to keystroke events

From these metrics, Aftercare will compute a keystroke-issues score from 0-100. See these metrics in the data quality console in the platform.
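To make the metrics above concrete, here is a rough sketch of how they could be derived from raw key events. It is not Aftercare's implementation. The `Keystroke` event shape, the press-to-press definition of digraph delay, the 1000 ms pause threshold, and the choice of population standard deviation and nearest-rank percentile are all assumptions.

```python
import math
import statistics
from dataclasses import dataclass

# Assumed: a gap longer than this between key presses counts as a pause.
PAUSE_THRESHOLD_MS = 1000.0


@dataclass(frozen=True)
class Keystroke:
    """Hypothetical event shape: one key with press and release times."""
    key: str
    down_ms: float
    up_ms: float


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def keystroke_metrics(events: list) -> dict:
    """Compute the summary metrics from a list of Keystroke events."""
    if len(events) < 2:
        raise ValueError("need at least two keystrokes to form a digraph")
    events = sorted(events, key=lambda e: e.down_ms)
    holds = [e.up_ms - e.down_ms for e in events]
    # Assumed definition: delay between consecutive key presses.
    delays = [b.down_ms - a.down_ms for a, b in zip(events, events[1:])]
    pauses = sum(1 for d in delays if d > PAUSE_THRESHOLD_MS)
    return {
        "total_keystrokes": len(events),
        "average_hold_time": statistics.fmean(holds),
        "hold_time_std": statistics.pstdev(holds),
        "average_digraph_delay": statistics.fmean(delays),
        "digraph_delay_std": statistics.pstdev(delays),
        "digraph_delay_p90": percentile(delays, 90),
        "pause_ratio": pauses / len(events),
    }


# Illustrative events for someone typing "hell" with one long pause.
sample = [
    Keystroke("h", 0, 80),
    Keystroke("e", 150, 240),
    Keystroke("l", 300, 370),
    Keystroke("l", 1500, 1590),
]
metrics = keystroke_metrics(sample)
```

Very uniform hold times and delays (a standard deviation near zero) are the kind of pattern a human typist rarely produces, which is why the spread is tracked alongside the averages.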

To get started, see the Events API documentation.

Shared duplicate detection

Shared duplicate detection compares responses across respondents to detect copy-pasted responses. Longer responses are weighted more heavily than shorter responses.

Shared duplicates will only compare across responses that have a response identifier. If your respondents are anonymous, you can generate a random UUID for each response and pass it in the responseIdentifier field. It helps to prepend random_ to the response identifier to make it easier to identify.
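A minimal sketch of generating such an identifier for an anonymous respondent follows. The `make_response_identifier` helper is a hypothetical name; only the responseIdentifier field and the random_ prefix come from this page.

```python
import uuid


def make_response_identifier() -> str:
    """Return a random identifier with the suggested random_ prefix."""
    # uuid4() produces a random (version 4) UUID.
    return f"random_{uuid.uuid4()}"


# Fragment of a request body; other fields are omitted here.
payload_fragment = {"responseIdentifier": make_response_identifier()}
```

Generate the identifier once per survey response and reuse it for every answer in that response, so that answers from the same respondent stay grouped.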

Already generated responses and want to retroactively add response identifiers? Reach out to support@getaftercare.com and we’ll be happy to help. We’re adding a feature to allow you to add response identifiers to existing responses in the platform soon.

Honeypot detection

With the increasing sophistication of LLM-powered bots, it is becoming more difficult to detect them using simple heuristics.

A honeypot is a hidden phrase that an LLM reading the page will see but a human will not. We can use a honeypot phrase to poison the LLM’s context and detect if a respondent has fallen for the honeypot phrase.

Pass a honeypot phrase to the data quality API and Aftercare will detect if a respondent has fallen for the honeypot phrase. See the Data Quality API documentation for more information.
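To illustrate the idea only, the sketch below assumes the hidden phrase instructs a model to mention a marker term, then checks whether an answer contains it. This is a naive local check, not how Aftercare scores honeypots. The `mentions_marker` helper and the marker term are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and ignore case so trivial variations match."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def mentions_marker(response_text: str, marker: str) -> bool:
    """Return True if the answer contains the hidden marker term."""
    marker_norm = normalize(marker)
    if not marker_norm:
        raise ValueError("marker must not be empty")
    return marker_norm in normalize(response_text)


# Hypothetical marker that the hidden phrase asks a model to mention.
MARKER = "blue giraffe"
suspicious = mentions_marker("I love my  Blue Giraffe mug.", MARKER)
genuine = mentions_marker("The checkout page was slow.", MARKER)
```

Choose a marker that a human would be very unlikely to write by chance for that question, otherwise genuine answers can trip the check.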

Inconsistency detection

Inconsistency scores are computed by comparing a respondent’s multiple responses to each other. The API response will include the inconsistency score and whether it is flagged or not.

To see specific responses that are flagged as inconsistent and the specific rationales, see the Data Quality Console in the platform.

Tips for improving your response quality

While you can use the data quality API to evaluate each response individually, providing a survey ID, question ID, and response ID will help tie together responses across respondents. Doing so will provide a more accurate assessment of the quality of your responses.
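One way to make sure every request carries all three identifiers is to validate them before sending. In this sketch only responseIdentifier is a field name taken from this page; `surveyId`, `questionId`, `text`, and the `ResponseRecord` class are assumed names, so check the Data Quality API documentation for the real ones.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ResponseRecord:
    """Hypothetical request shape; field names other than
    responseIdentifier are assumptions."""
    surveyId: str
    questionId: str
    responseIdentifier: str
    text: str

    def __post_init__(self) -> None:
        # Reject blank identifiers so responses can always be tied together.
        for name in ("surveyId", "questionId", "responseIdentifier"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be a non-empty string")


record = ResponseRecord(
    surveyId="survey_1",
    questionId="q1",
    responseIdentifier="random_123",
    text="The onboarding flow was clear and quick.",
)
payload = asdict(record)  # plain dict, ready to serialize as JSON
```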