Machine sentiment analysis is cheaper than machine+human sentiment analysis, but you get what you pay for. I have been using the mantra that 70% of sentiment analysis is fairly straightforward; the other 30% is the hard bit.
No problem when things are 'great' or 'awful', but when people start being sarcastic, machines struggle to cope. When Michael Jackson decided 'Bad' was 'Good', machine analysis would (in all likelihood) have ruled his lyrics to be expressing negative sentiment, whereas he was in fact portraying himself positively.
For a human brain, the comprehension of sarcasm is an interesting process. The right parahippocampal gyrus is the area of the brain that deals with interpreting sarcasm. The left-hand side of the brain deals with the understanding of words and sentences, but the right-hand side is needed to understand humour and language that is not literal, for example puns, jokes and sarcasm. The mental process involves an element of social cognition: the ability to put yourself in someone else's shoes and take in the context as well as the words. (This also explains why people with dementia can struggle with sarcasm.)
Machine analysis of social data alone struggles to deal with things like sarcasm for these very reasons. Computer programs can follow logical rules and interpret the words, but without human help they struggle to understand context.
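To make the point concrete, here is a minimal sketch of a purely word-based scorer. It is not how any particular listening tool works; the tiny `LEXICON`, the `lexicon_score` and `label` functions and the sample sentences are all made up for illustration. It simply adds up word polarities, which is roughly what "following logical rules and interpreting the words" amounts to when there is no context.

```python
import re

# Hypothetical toy lexicon: word -> polarity. Real tools use far larger lists.
LEXICON = {"great": 1, "good": 1, "love": 1, "awful": -1, "bad": -1, "hate": -1}


def lexicon_score(text: str) -> int:
    """Sum the polarity of every known word; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def label(text: str) -> str:
    """Turn the raw score into a positive / negative / neutral label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    samples = [
        "This phone is great",                                  # literal praise
        "The service was awful",                                # literal complaint
        "Who's bad?",                                           # 'bad' used as praise
        "Oh great, my flight is delayed again. I love waiting.",  # sarcasm
    ]
    for sentence in samples:
        print(f"{label(sentence):>8}  {sentence}")
```

The first two sentences come out as you would expect. The third is labelled negative and the fourth positive, because the scorer sees only the words and none of the context a human reader brings.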
If we now add youth speak and slang to the mix it gets even more complicated. The Daily Telegraph reported this morning on the work of Lisa Whittaker from the University of Stirling, who studied the language used by teenagers aged 16-18 on Bebo and Facebook in Scotland. She found that:
- "Young people often distort the languages they use by making the pages difficult for those unfamiliar with the distortions and colloquialisms," she said.
- "The language used on Bebo seems to go beyond abbreviations that are commonly used in text messaging, such as removing all the vowels.
- "This is not just bad spelling, which would suggest literacy issues, but a deliberate attempt to creatively misspell words.
- "The creation and use of their own social language may be a deliberate attempt to keep adults from understanding what is written on the page.
- "By doing this they are able to communicate with their in-group and conceal the content from the out-group. This further adds to their online identity."
There's only one thing worse than having no information, and that's having the wrong information. As the number of online social listening tools continues to increase, it's worth looking at more than just price: the approach to data and the quality of the data provided are the key elements. Buying a cheap system and then acting on bad data will be more expensive than paying for a machine+human solution now. With human involvement, machines can learn over time; without human sense-checking and help, machine-only sentiment analysis may never get to the point where it is truly reliable and useful.
No point 'getting MWI' if your solution can't do the job properly.