Publication

Developing Annotation Guidelines for CSAM Prevention Interventions: Psychosocial Risk and Protective Factors Grounded in Research and Clinical Practice

Vera Czehmann; Christine Hovhannisyan; Lena Hoffmann; Paula Busch; Ibrahim Baroud; Sebastian Möller; Roland Roller; Hannes Gieseler; Lisa Raithel

In: Dimitrios Kokkinakis; Charalambos Themistocleous; Gaël Dias; Kathleen C. Fraser; Sebastião Pais; Fredrik Öhman (eds.). The Sixth Workshop on Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments (RaPID@MENTAL.ai) @ LREC 2026 - Workshop Proceedings. Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments (RaPID-6), located at LREC-2026, May 12, Palma, Mallorca, Spain, Pages 126-145, ISBN 978-2-493814-59-3, European Language Resources Association (ELRA), 2026.

Abstract

This work discusses sexual offending, specifically child sexual abuse material (CSAM), in the context of prevention. We introduce a domain-specific, span-level annotation scheme and guidelines to identify psychosocial risk and protective factors in therapist-led, anonymous chat interventions with voluntarily help-seeking individuals concerned about their pedophilic interests and the risk of CSAM use. The scheme is grounded in previous research and clinical experience, and intended for within-intervention guidance and longitudinal tracking, rather than actuarial risk scoring. Annotating a pilot subset (8 clients, 31 sessions), inter-annotator agreement was moderate but improved after calibration, which is consistent with the linguistic and clinical ambivalence present in the data. We track a session-wise Protective Ratio, i.e., the share of protective factors among all coded factors, and examine its behaviour over time during the intervention and around self-reported relapse within clients. In exploratory automation, LLM-based span extraction outperforms BERT baselines but overall performance remains limited by small data and mixed-evidence spans. While complete anonymisation of the corpus is in progress, we release the label scheme, guidelines, and non-sensitive artefacts of our analyses.
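The session-wise Protective Ratio described in the abstract, i.e. the share of protective factors among all coded factors, can be illustrated with a minimal sketch. The label names "protective" and "risk", the example sessions, and the choice to return None for a session without coded factors are assumptions made for illustration only; the published label scheme is finer-grained and the paper's actual computation may differ.

```python
from collections import Counter
from typing import Iterable, Optional


def protective_ratio(labels: Iterable[str]) -> Optional[float]:
    """Share of protective factors among all coded factors in one session.

    Assumption: each coded span has been collapsed to either "protective"
    or "risk" (hypothetical label names, not the paper's scheme).
    """
    counts = Counter(labels)
    protective = counts["protective"]
    total = protective + counts["risk"]
    if total == 0:
        # Assumption: the ratio is undefined when nothing was coded.
        return None
    return protective / total


# Hypothetical sessions for one client, keyed by session number.
sessions = {
    1: ["risk", "risk", "protective"],
    2: ["protective", "risk", "protective", "protective"],
    3: [],
}

# One ratio per session, which could then be followed over time.
ratios = {number: protective_ratio(labels) for number, labels in sessions.items()}
```

Keeping the ratio undefined for empty sessions, rather than forcing it to 0 or 1, avoids suggesting a change in the balance of factors where no factors were coded.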

Projects

VERANDA - Vertrauenswürdige Anonymisierung sensibler Patientdaten für Fernkonsultationen

Further Links

http://lrec-conf.org/proceedings/lrec2026/workshops/rapid6mentalai/2026.rapid6mentalai-1.0.pdf

2026.rapid6mentalai-1.0-CSAM_guidelines.pdf (pdf, 1 MB)