BM R+D / Voice and Language




We conduct research on natural language processing, both written and spoken, in order to develop tools for the automated processing of linguistic content in multilingual environments or in settings where human language is the preferred form of interaction. The technologies we develop enable:

  • Large-scale text analysis to extract opinions, sentiment and data from collections of texts, in order to build user-profiling and hybrid recommendation systems and to group and categorise textual content.
  • Proofreading and style checking, both for native speakers and for second-language learners.
  • Text standardisation systems, content filtering and moderation, and the automatic generation of content and summaries.
  • Machine translation between two languages and cross-language information retrieval.
  • Synthesis of a bilingual Catalan-Spanish voice with natural levels of expressiveness, based on the Cereproc© synthesis engine.
  • Sign language processing and the development of applications with integrated signing avatars.


Natural language processing

We research, develop and innovate in robust, portable technologies for natural language processing.

These technologies study, model and characterise texts using both linguistic and statistical approaches. The former rely on an understanding of language encoded in rules, dictionaries, ontologies and the like, i.e. knowledge of the dependencies and relationships between words. The latter, in contrast, infer knowledge by learning from examples. A hybrid approach combines the advantages of both, enabling us to “understand”, automatically or semi-automatically, what a set of texts is saying, who is saying it and how. In other words, structured information can be extracted from unstructured text.
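The hybrid approach described above can be illustrated with a deliberately small sketch. Assuming a sentiment task (one of the opinion-analysis uses mentioned earlier), a hand-written polarity lexicon plays the linguistic, rule-based side and a Naive Bayes classifier trained on labelled examples plays the statistical side; the two scores are blended with a weight. The lexicon, training sentences, function names and the 0.5 weight are all hypothetical and do not describe the team's actual systems.

```python
import math
from collections import Counter

# Hypothetical hand-built polarity lexicon: the rule/dictionary ("linguistic") side.
LEXICON = {"excellent": 1.0, "good": 0.5, "bad": -0.5, "awful": -1.0}

# Hypothetical labelled examples: the data the statistical side learns from.
TRAIN = [
    ("the service was excellent", "pos"),
    ("really good food", "pos"),
    ("awful slow service", "neg"),
    ("bad food and rude staff", "neg"),
]


def tokenize(text):
    return text.lower().split()


def lexicon_score(tokens):
    """Rule-based polarity in [-1, 1]: mean lexicon score of matched words, 0.0 if none match."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


class NaiveBayes:
    """Statistical side: multinomial Naive Bayes with add-one smoothing over 'pos'/'neg' labels."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.labels}
        self.counts = {c: Counter() for c in self.labels}
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.vocab = set().union(*self.counts.values())
        return self

    def polarity(self, tokens):
        """Return P(pos | tokens) - P(neg | tokens), a score in [-1, 1]."""
        log_p = {}
        for c in self.labels:
            total = sum(self.counts[c].values())
            log_p[c] = self.prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (total + len(self.vocab)))
                for t in tokens
                if t in self.vocab  # words never seen in training are ignored
            )
        top = max(log_p.values())
        weights = {c: math.exp(v - top) for c, v in log_p.items()}  # numerically stable softmax
        return (weights["pos"] - weights["neg"]) / sum(weights.values())


def classify(text, model, weight=0.5):
    """Hybrid decision: weighted blend of the rule-based and the learned polarity scores."""
    tokens = tokenize(text)
    score = weight * lexicon_score(tokens) + (1 - weight) * model.polarity(tokens)
    return "pos" if score > 0 else "neg" if score < 0 else "neutral"


model = NaiveBayes().fit([tokenize(t) for t, _ in TRAIN], [label for _, label in TRAIN])
print(classify("Excellent food", model))  # → pos (lexicon and model agree)
print(classify("rude staff", model))      # → neg (no lexicon match; the learned model decides)
```

The point of the blend is that each side covers the other's gaps: in the second call no word appears in the lexicon, so the learned model alone decides, while curated lexicon entries can steady a model trained on too few examples.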

Specifically, our research into natural language processing is mainly focused on:

  • Semantic annotation – Named entity recognition and classification (NERC)
  • Language modelling
  • Semantic analysis
  • Grouping and classification
  • Factuality analysis
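Language modelling, listed above, means assigning probabilities to word sequences. As a minimal, hedged illustration (not the team's own models), the following sketch estimates a bigram model with add-one (Laplace) smoothing from a hypothetical toy corpus and uses it to compare word orders.

```python
import math
from collections import Counter


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    BOS, EOS = "<s>", "</s>"  # sentence boundary markers

    def __init__(self, sentences):
        self.context = Counter()  # how often each token precedes another token
        self.bigrams = Counter()  # counts of (previous token, next token) pairs
        for sentence in sentences:
            toks = self._pad(sentence)
            self.context.update(toks[:-1])
            self.bigrams.update(zip(toks, toks[1:]))
        # Possible next tokens: every training word plus the end marker.
        self.vocab = {w for s in sentences for w in s.lower().split()} | {self.EOS}

    def _pad(self, sentence):
        return [self.BOS] + sentence.lower().split() + [self.EOS]

    def prob(self, prev, word):
        """Smoothed P(word | prev) = (c(prev, word) + 1) / (c(prev) + |V|)."""
        return (self.bigrams[(prev, word)] + 1) / (self.context[prev] + len(self.vocab))

    def log_prob(self, sentence):
        """Log-probability of a whole sentence, boundary markers included."""
        toks = self._pad(sentence)
        return sum(math.log(self.prob(p, w)) for p, w in zip(toks, toks[1:]))


corpus = ["the cat sat", "the dog sat", "a cat ran"]  # hypothetical toy corpus
lm = BigramLM(corpus)
print(lm.log_prob("the cat sat") > lm.log_prob("sat cat the"))  # → True
```

Words outside the training vocabulary still receive a small probability here, but the distribution is only properly normalised over training words; practical systems use far larger corpora and stronger smoothing such as Kneser-Ney.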

Linguistic technologies are highly dependent on the language and the type of text. The research team currently works with Catalan, Spanish and English, and with formal writing (news articles or blogs), user-generated content (reviews and short texts such as Facebook posts and tweets) and automatic transcriptions. The team is also studying how information is handled across languages.

Prosody for voice synthesis

We are working on automating the voice-creation process and adapting these technologies to specific domains. Our research therefore concentrates on phonetic and prosodic language models, models that improve the naturalness of synthetic voices and models that generate synthetic voices with emotion, as well as on rule-based linguistic processing and the generation of dictionaries and vocabularies.
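Rule-based prosody generation can be sketched as assigning a fundamental-frequency (F0) target to each syllable. The example below combines three classic ingredients (a linear declination baseline, pitch accents on stressed syllables, and a final boundary tone that falls for statements and rises for questions) and scales them with simple emotion presets. It is an illustrative assumption, not a description of the team's models or of the Cereproc engine; every parameter value, preset and function name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProsodyParams:
    # Hypothetical values for illustration only, not tuned for any real voice.
    top_hz: float = 220.0     # F0 baseline at the start of the utterance
    bottom_hz: float = 180.0  # F0 baseline at the end (declination)
    accent_hz: float = 30.0   # extra F0 on lexically stressed syllables
    final_hz: float = 40.0    # size of the final fall (statement) or rise (question)


# Hypothetical emotion presets: (pitch-level multiplier, pitch-range multiplier).
EMOTIONS = {"neutral": (1.0, 1.0), "happy": (1.1, 1.5), "sad": (0.9, 0.6)}


def f0_targets(stress, question=False, emotion="neutral", params=None):
    """Return one F0 target in Hz per syllable.

    stress: list of booleans, True where the syllable carries lexical stress.
    """
    if not stress:
        return []
    p = params or ProsodyParams()
    level, span = EMOTIONS[emotion]
    n = len(stress)
    targets = []
    for i, stressed in enumerate(stress):
        frac = i / (n - 1) if n > 1 else 0.0
        base = p.top_hz + (p.bottom_hz - p.top_hz) * frac  # linear declination
        targets.append(base + (span * p.accent_hz if stressed else 0.0))
    # Boundary tone on the last syllable: fall for statements, rise for questions.
    targets[-1] += span * (p.final_hz if question else -p.final_hz)
    return [round(level * t, 1) for t in targets]


# Spanish "la casa es bonita" as la-CA-sa-es-bo-NI-ta (stressed syllables in capitals).
stress = [False, True, False, False, False, True, False]
print(f0_targets(stress))                 # statement: last target is 140.0 Hz
print(f0_targets(stress, question=True))  # question: last target is 220.0 Hz
```

In a real synthesiser such targets would be interpolated into a continuous contour and combined with duration and intensity models; the two multipliers stand in, very crudely, for the emotion models mentioned above.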

The Team

Barcelona Media’s Voice and Language section is made up of a team of researchers who collectively cover all the different areas of specialisation in this field of R&D.

Director

Toni Badia

Technical and Marketing Manager

David Comas

Team Members

Joan Codina
Judith Domingo
David García Narbona
Bernat Grau
Jens Grivolla
Patrick Lambert
Maria Teresa Melero
Guillem Massó
Carlos Rodríguez
Marta Ruiz
Roser Sauri
Teresa Suñol

Collaborators

Martí Quixal

Projects

  • Social Media

    The objective in this area is to exploit the most recent social phenomenon brought about by the Internet: online users publishing information and opinions and participating increasingly in social networks.

  • T4ME

    A strategic alliance to create the technologies and applications needed to ensure the sustainability of linguistic diversity and multiculturalism in European societies. These technologies include machine learning, social IT, cognitive systems, knowledge technologies and multimedia content.

  • ICE3

    This project promotes computer-aided language learning in schools, using a pedagogical approach that incorporates language-processing tools to generate immediate feedback.

  • Emaps

    This project aims to provide advice on the risks and opportunities of using the Internet and social media as meaningful information tools, and to develop participatory communication between scientists and different audiences.

  • Opinion analysis in client communication

    Designing Customer Interaction Analytics technologies to develop a new commercial services platform.

  • i3media

    An industrial-research project dedicated to the development of technologies for the automated creation and management of intelligent audiovisual content.

 
