Publication

Deep Learning-based Text Mining for Technology Monitoring in the Automotive Domain

Jan-Tilman Seipp; David Reuschenberg; Eren Özveren; Matthias Meyer; Felix Köhler; Leonhard Hennig; David Harbecke; Phuc Tran Truong
In: Proceedings of GTM 2024. Global TechMining Conference, September 16-17, Berlin, Germany, Global TechMining Conference, 2024.

Abstract

The "Text2Tech" research project aims to automate the extraction of technology-related data and its associations from unstructured text sources, such as patents, research papers, and industry news, using Natural Language Processing (NLP). Specifically, the project focuses on two fundamental NLP tasks: Named Entity Recognition (NER), to identify entities, and Relation Extraction (RE), to identify the relations between the extracted entities. This study assesses the efficacy of Large Language Models (LLMs) for these tasks within the automotive manufacturing context, which is typically characterized by limited training data. We explored two main approaches: a prompt-based method using zero-shot and few-shot prompting strategies with models like GPT-3.5 and BART, and a fine-tuning approach that iteratively adjusts models based on semi-automatically labeled data. Our initial results indicate that while prompt-based methods are useful, fine-tuning significantly enhances model performance by tailoring the model to specific domain needs. These preliminary results also demonstrate the challenges of adapting LLMs to domain-specific NER and RE, highlighting differences in performance between models and the nuanced nature of relation extraction. For instance, BART-large trained with the REBEL approach shows promising early results compared to other models. Despite these advancements, accurately capturing and categorizing relations in automotive manufacturing documents remains highly complex, as reflected by low inter-annotator agreement in manual labeling. Future work will include refining models and expanding the dataset to improve the accuracy of entity linking and relation extraction.
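
To illustrate the difference between the zero-shot and few-shot prompting strategies mentioned in the abstract, the following minimal sketch builds an NER prompt with and without labeled demonstrations. It is not the project's actual prompt: the label set, the example sentence, the prompt wording, and the names `Example`, `LABELS`, and `build_ner_prompt` are all assumptions made for illustration, and no model is called.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One labeled demonstration: a sentence and its (span, label) pairs."""
    text: str
    entities: list  # list of (surface form, label) tuples


# Hypothetical label set, chosen only for illustration.
LABELS = ["TECHNOLOGY", "ORGANIZATION", "MATERIAL"]


def format_entities(entities):
    """Render entity pairs as 'span -> LABEL; ...', or 'none' if empty."""
    if not entities:
        return "none"
    return "; ".join(f"{span} -> {label}" for span, label in entities)


def build_ner_prompt(query, examples=()):
    """Build a prompt string.

    With no examples this is a zero-shot prompt; with one or more
    labeled examples prepended it becomes a few-shot prompt.
    """
    lines = [
        "Extract the named entities from the sentence.",
        "Allowed labels: " + ", ".join(LABELS) + ".",
        "",
    ]
    for ex in examples:
        lines.append(f"Sentence: {ex.text}")
        lines.append(f"Entities: {format_entities(ex.entities)}")
        lines.append("")
    # The query is left open so the model completes the entity list.
    lines.append(f"Sentence: {query}")
    lines.append("Entities:")
    return "\n".join(lines)


# Invented demonstration data, not taken from the project's dataset.
demo = [
    Example(
        "Acme Motors tests solid-state batteries.",
        [("Acme Motors", "ORGANIZATION"),
         ("solid-state batteries", "TECHNOLOGY")],
    )
]

query = "The supplier welds aluminium frames with lasers."
zero_shot = build_ner_prompt(query)
few_shot = build_ner_prompt(query, demo)
```

The only difference between the two prompts is the block of labeled demonstrations placed before the query. The fine-tuning approach described in the abstract would instead update model weights on labeled data rather than change the prompt.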