Project summary
SentiCorr is an interactive, extensible system for automated sentiment analysis of multilingual user-generated content from social media (currently Twitter, Facebook and Hyves) and e-mail (via an MS Outlook plug-in). One of its main goals is to make people aware of how much positive and negative content they read and write. The output is stored in a database that supports basic OLAP-style exploration across dimensions such as time, source, correspondent and read/write direction, and allows zooming in to individual messages or e-mails, with positive and negative sentences highlighted. Although the system can be used as a stand-alone application or as an online service, we also consider various integration options, e.g. with the stress analytics system developed within the Stress-at-work project.
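As a rough illustration of the OLAP-style exploration described above, the sketch below stores one fact record per classified sentence, rolls polarity counts up along any chosen dimensions, and drills down to the sentences of a single message. The `SentenceLabel` schema and the `roll_up`/`drill_down` helpers are hypothetical and chosen for illustration only; the actual SentiCorr database schema is not described on this page.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SentenceLabel:
    # Hypothetical fact record: one classified sentence with its dimensions.
    day: date
    source: str         # e.g. "twitter", "outlook"
    correspondent: str
    direction: str      # "read" or "write"
    message_id: str
    polarity: str       # "positive", "negative" or "neutral"


def roll_up(facts, dims):
    """Count polarity labels grouped by the given dimension attribute names."""
    counts = Counter()
    for f in facts:
        key = tuple(getattr(f, d) for d in dims) + (f.polarity,)
        counts[key] += 1
    return counts


def drill_down(facts, **filters):
    """Zoom in: return only the facts matching every given dimension value."""
    return [f for f in facts
            if all(getattr(f, k) == v for k, v in filters.items())]


# Usage with made-up example records (not real SentiCorr data).
facts = [
    SentenceLabel(date(2011, 5, 1), "twitter", "alice", "read", "m1", "positive"),
    SentenceLabel(date(2011, 5, 1), "twitter", "alice", "read", "m1", "negative"),
    SentenceLabel(date(2011, 5, 2), "outlook", "bob", "write", "m2", "positive"),
]
by_source = roll_up(facts, ["source"])
message_m1 = drill_down(facts, message_id="m1")
```

Grouping on a tuple of dimension values keeps the sketch generic: any combination of attributes (for example `["direction", "day"]`) can serve as a roll-up level without changing the code.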
The sentiment analysis follows a four-step approach: language identification, part-of-speech tagging, subjectivity detection and polarity detection. Details of the approach and the system implementation can be found in the publications listed below.
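The following toy sketch only shows how the four steps chain together; it is not the SentiCorr method. It assumes tiny hand-made word lists (`LANG_PROFILES`, `POS_LEXICON`, `POLARITY_LEXICON` and `NEGATORS`, all hypothetical) and uses deliberately simple stand-ins for each step, e.g. word-overlap language identification instead of the graph-based n-gram approach from the Benelearn 2011 paper, and dictionary lookup instead of a trained tagger.

```python
# Hypothetical toy resources; a real system would use trained models per language.
LANG_PROFILES = {
    "en": {"the", "and", "is", "not", "very", "good", "bad"},
    "nl": {"de", "het", "en", "is", "niet", "erg", "goed", "slecht"},
}
POS_LEXICON = {
    "en": {"good": "ADJ", "bad": "ADJ", "very": "ADV", "not": "ADV", "is": "VERB"},
    "nl": {"goed": "ADJ", "slecht": "ADJ", "erg": "ADV", "niet": "ADV", "is": "VERB"},
}
POLARITY_LEXICON = {"en": {"good": 1, "bad": -1}, "nl": {"goed": 1, "slecht": -1}}
NEGATORS = {"en": {"not"}, "nl": {"niet"}}


def identify_language(tokens):
    # Step 1 (stand-in): language whose word list overlaps most with the tokens.
    return max(LANG_PROFILES, key=lambda lang: len(LANG_PROFILES[lang] & set(tokens)))


def pos_tag(tokens, lang):
    # Step 2 (stand-in): dictionary lookup, defaulting unknown words to NOUN.
    return [(t, POS_LEXICON[lang].get(t, "NOUN")) for t in tokens]


def is_subjective(tagged, lang):
    # Step 3 (stand-in): subjective if some adjective carries lexicon polarity.
    return any(pos == "ADJ" and tok in POLARITY_LEXICON[lang] for tok, pos in tagged)


def polarity(tagged, lang):
    # Step 4 (stand-in): sum lexicon scores, flipping the sign after a negator.
    score, negate = 0, False
    for tok, _ in tagged:
        if tok in NEGATORS[lang]:
            negate = True
        elif tok in POLARITY_LEXICON[lang]:
            value = POLARITY_LEXICON[lang][tok]
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyse(sentence):
    # Run the four steps in order; objective sentences skip polarity detection.
    tokens = [t.strip(".,!?") for t in sentence.lower().split()]
    lang = identify_language(tokens)
    tagged = pos_tag(tokens, lang)
    if not is_subjective(tagged, lang):
        return lang, "objective"
    return lang, polarity(tagged, lang)
```

For example, `analyse("De film is niet goed")` is identified as Dutch, tagged, judged subjective and finally classified as negative because of the negator. Filtering out objective sentences before polarity detection mirrors the ordering of the four steps named above.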
Publications, Talks, Poster and Demo Presentations
- Erik Tromp and Mykola Pechenizkiy (2011). SentiCorr: Multilingual Sentiment Analysis of Personal Correspondence (demo paper). In: Proc. of ICDM 2011 Workshops, IEEE Press, preprint.
- Posters presenting the SentiCorr system and the sentiment classification approach behind it.
- Recorded demo.
- Erik Tromp and Mykola Pechenizkiy (2011). Graph-Based N-gram Language Identification on Short Texts. In: Proc. of the Twentieth Belgian Dutch Conference on Machine Learning (Benelearn 2011), pp. 27-34.
- Erik Tromp (2011). Sentiment Analysis on Social Media for Online Market Research. MSc thesis. The thesis won two awards: the Best IT-thesis of the Netherlands 2011, granted by De Koninklijke Hollandsche Maatschappij der Wetenschappen, and the Berenschot thesis award.
Code & Datasets
We aim to make the software, source code and datasets created and used within this project available to the research community, as long as there are no NDA, IP, ethical or proprietary concerns. Please check this section later.