In the news-polygraph project, we are developing digital methods for the media industry that make it easier and faster to check multimedia content for manipulated, de-contextualized, or misleading material using hybrid AI technology.
Motivation and project goals
The goal of the news-polygraph alliance is to provide the media industry with suitable technology for the early detection of fake content and false information. For this purpose, an interactive platform is being developed that supports journalists in their various verification tasks. To this end, AI tools are being developed that check texts, images, audio files, and videos for characteristics indicative of disinformation, thereby simplifying the work of journalists.
In addition, transparency criteria, the explainability of the AI tools, and the prevention of bias play an important role in making the AI tools reliable, comprehensible, and trustworthy. Processes will also be developed to integrate crowdworkers as "naïve", "trained", and "expert" collaborators alongside journalistic experts in the false-news detection process, in order to bridge quality gaps in existing AI models and to facilitate and accelerate the journalistic process of fake-news detection. Eventually, an AI-based platform will be developed that helps journalists and media professionals work intuitively, transparently and, above all, in a time-saving way.
Deliberately placed false information has long been a well-known and widespread problem in social media. Not only is the number of manipulated media items increasing rapidly, but so is the quality of the fake content. So-called deepfakes, i.e. image, audio, or video files manipulated by AI, appear increasingly deceptively real, so that fakes can currently only be identified with great effort by fact-checking experts.
The news-polygraph alliance consists of a total of ten partners from industry and science. The entrepreneurial orientation of the alliance is represented by five companies: Crowdee, delphai, neurocat, transfermedia, and Ubermetrics. With the regional partners from the application area, Deutsche Welle (DW) and Rundfunk Berlin-Brandenburg (RBB), the alliance has practice partners where the AI tools and platform can be tested and integrated into everyday journalism. All AI-based and scientific research and development is led by DFKI, TU Berlin, and Fraunhofer IDMT as academic partners.
Within the framework of the project, two pilot applications are being evaluated: (1) "Fast Fact-Checking" with the practice partner Rundfunk Berlin-Brandenburg (RBB), which focuses on the processes of daily (and timely) news evaluation; and (2) "Deep Fact-Checking" with our partner Deutsche Welle (DW), which focuses on in-depth investigative processes that may require more time.
The Speech and Language Technology (SLT) department contributes significant competences in the areas of:
- Fact-checking, i.e. the extraction (claim extraction) and modelling of relevant statements from a text and their comparison against knowledge databases
- Factuality and Checkworthiness, i.e. the evaluation of content statements with regard to relevance and verifiability
- Content verification, i.e. the verification of (multimedia) content
- Provenance analysis, i.e. the analysis of the origin/source and dissemination of fake news in (social) networks, including analysis of source trustworthiness
- Fake-news detection in audio data, i.e. authentication / forensic search for manipulation artefacts as well as deepfake detection of synthetically generated data, e.g. produced via Voice Conversion (VC), Text-to-Speech (TTS), voice cloning, or zero-shot learning methods (VALL-E)
- Fake-news detection in image and video data, i.e. authentication / forensic search for manipulation artefacts as well as deepfake detection of synthetically generated data, e.g. produced by image or video manipulation software, or generated synthetically by AI, e.g. via image-to-image translation or diffusion-, GAN-, or VAE-based models
- Fake-news detection in text data, i.e. authentication / forensic search for manipulation artefacts as well as detection of synthetically generated data, e.g. produced by ChatGPT, GPT-x, PaLM, LLaMA, LaMDA, Bard, mT5, Gopher, Ernie, OPT-IML, Megatron, etc.
- Speech recognition, e.g. Automatic Speech Recognition (ASR), Multi-Lingual Speech Recognition
- Speaker recognition, e.g. Automatic Speaker Recognition and Verification (ASV), Multi-Lingual Speaker Recognition
- Recognition of emotions and characteristics from speech, text, video/images, and multimodal data, e.g. transformer-based models; acoustic, linguistic (language models), and visual models (face recognition, facial expression, landmarks)
- Multi-modal fake-news detection, e.g. combinations of text- and image-based information in social media posts, or speech- and video-based information from multimedia data
- Crowd-based AI support, e.g. automated online crowd- and expert-sourcing, hybrid (AI + human) process automation for obtaining high-quality AI training data
- Crowd-based support for journalists and media professionals, e.g. automated online crowd- and expert-sourcing to fact-checking communities, and their combination with AI for integration into journalistic processes
- Explainability, e.g. factors for gaining trust through analysis of AI transparency, bias, fairness, and robustness, including visualisation and feedback generation (natural-language feedback) for participating crowds and journalists
- AI in the field of LLMs, Transfer Learning, Cross-Lingual Learning, Continual Learning, Frugal AI, and RLHF.
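To illustrate the check-worthiness task mentioned above (deciding which statements are worth fact-checking at all), the following is a minimal, purely heuristic sketch in Python. It is a hypothetical illustration, not the models developed in news-polygraph: all cue lists and score weights are invented for demonstration, and real systems would use trained (e.g. transformer-based) classifiers instead.

```python
import re

# Hypothetical heuristic: sentences containing numbers, statistics, or
# attribution cues are more likely to be verifiable factual claims,
# while questions and hedged opinions are not check-worthy.
# Cue lists and weights below are illustrative assumptions only.
FACTUAL_CUES = re.compile(
    r"\b(percent|million|billion|increase|decrease|more than|less than|"
    r"according to|study|report)\b",
    re.IGNORECASE,
)
NUMBER = re.compile(r"\d")
OPINION_CUES = re.compile(
    r"\b(I think|I believe|in my opinion|probably|maybe)\b", re.IGNORECASE
)

def checkworthiness(sentence: str) -> float:
    """Return a score in [0, 1]; higher means more worth fact-checking."""
    score = 0.0
    if NUMBER.search(sentence):
        score += 0.4   # concrete figures suggest a verifiable claim
    if FACTUAL_CUES.search(sentence):
        score += 0.4   # statistical or attribution vocabulary
    if sentence.strip().endswith("?"):
        score -= 0.3   # questions assert nothing checkable
    if OPINION_CUES.search(sentence):
        score -= 0.3   # hedged opinions are not factual claims
    return max(0.0, min(1.0, score))

claims = [
    "Unemployment rose by 12 percent last year, according to the report.",
    "I think the weather was nicer in my childhood.",
    "Is the city planning a new airport?",
]
# Rank candidate sentences so fact-checkers see the most check-worthy first.
ranked = sorted(claims, key=checkworthiness, reverse=True)
```

In a production pipeline, such a scorer would sit between claim extraction and verification, filtering the claim candidates that are passed on to knowledge-base comparison or to the crowd.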
Lead: Dr. Tim Polzehl
Dr. Tim Polzehl leads the AI-based developments in the area of speech-based applications in the Speech and Language Technology department at DFKI. In addition, he leads the area of "Next Generation Crowdsourcing and Open Data" and is an active member of the "Speech Technology" group of the Quality and Usability Labs (QU-Labs) at the Technical University of Berlin, where he is engaged in the evaluation, development, and improvement of large language models (LLMs).
Profile DFKI: https://www-live.dfki.de/web/ueber-uns/mitarbeiter/person/tipo02
Profile QU-Labs TU-Berlin: https://www.tu.berlin/index.php?id=29499/
Contact: tim.polzehl@dfki.de