The explosive growth of content in volume, velocity and variety on the Web demands new approaches to content analytics. Such approaches are a prerequisite for addressing issues in large-scale analysis and interpretation of heterogeneous data sets originating in different media, human languages or jurisdictions.
Recently, language- and media-independent data analysis and representation methods, such as those provided by Linked Data and Semantic Web technologies, have been introduced to provide innovative content analytics solutions for heterogeneous, multilingual and multimedia content. However, a key missing element is the understanding and interpretation of language in content, both as unstructured textual content and as linguistic content present in the context of different media streams. The representation of language- and media-specific linguistic information on a semantic level is needed for accurate analytics operating across the increasing variety of media and human languages used on the Web today.
LIDER will study how language resources (corpora, dictionaries, lexical and syntactic metadata, etc.) and media resources (image, video, etc.) can serve as an enabling technology for enterprise content analytics on the Multilingual Web, including multilingual content delivered in multiple media.
LIDER will enable consumers and providers of multilingual and multimedia content, multinational enterprises, European public bodies and SMEs, language service providers and other language research and industry stakeholders to develop a shared understanding about the representation of language- and media-specific information on a semantic level.
The long-term roadmap will receive input from a significant number of companies and research organizations. In this way, LIDER will lay the groundwork for reducing the costs of adapting an existing analytics solution to multiple languages and across media boundaries.
LIDER will create:
1. A strong community around the topic of LOD-based multimedia and multilingual content analytics.
2. A set of guidelines and best practices for the construction and exploitation of LOD-based resources in multimedia and multilingual content analytics as well as for the development of NLP services on top of Linguistic Linked Data.
3. A Linked Data reference architecture built on top of existing and future platforms and freely available resources.
4. A long-term roadmap for the use of Linked Data for multilingual and multimedia content analytics in enterprises.
The project LIDER is coordinated by Prof. Dr. Asunción Gómez-Pérez from Universidad Politécnica de Madrid (UPM). The MultilingualWeb community led by the World Wide Web Consortium (W3C) serves as the umbrella for LIDER public activities. In this way, LIDER also enlarges the MultilingualWeb community stakeholders, including the stakeholders around the Web of data.
DFKI plays a crucial role in the project in two ways. First, experts from the language technology lab will provide key input for describing application scenarios of content analytics. Second, the close relation of DFKI to W3C and the related industrial community in Germany and beyond will help to generate interest in content analytics applications from a wide range of industry stakeholders. In this way, DFKI will ensure that LIDER provides industry-relevant input to upcoming research opportunities, e.g. in the Horizon 2020 funding scheme.
The consortium itself will only be a starting point for LIDER. Contributions from the community will be crucial for the success of the project. There are various ways to provide input or to be informed about LIDER:
- A dedicated W3C community group: Linked Data for Language Technology. The group will consult in public with current and potential users of linguistic data to assemble use cases and requirements for linguistic applications of Linked Data. The results will guide future interoperability, research and development activities, spanning both the domain of language technology and Linked Data.
- A dedicated mailing list: lider-community@listas.fi.upm.es. The mailing list will gather anonymous feedback about LIDER, but will also keep the community updated about new developments and events. For dissemination purposes, this list will also include all "Linked Data for Language Technology" community group participants.
- Existing W3C community groups OntoLex and BP-MLOD. These groups provide information on key topics in ontology-lexicon models (OntoLex) and best practices for Multilingual Linked Data (BP-MLOD).
- The MultilingualWeb Twitter feed. The feed provides news about general developments around the MultilingualWeb. News related to LIDER is tagged with the hashtag #lider-project.
Press Contact
Prof. Dr. Felix Sasaki
Phone: +49 30 23895 1807
fsasaki@w3.org