MultiBooked Corpus accepted at LREC 2018

MultiBooked Corpora

The lack of annotated data in most languages is both the main motivation for cross-lingual sentiment analysis and its main difficulty. Even for languages with millions of speakers, like Catalan, there are very few resources available. So we decided to create a corpus of hotel reviews in Basque and Catalan (since these are the two under-resourced languages I am most familiar with) and annotate them for aspect-level sentiment analysis.

We have published the results, as well as benchmarks, in our paper MultiBooked: A Corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification, which has been accepted at LREC 2018.