StockPulse is a pioneer in sentiment analysis. Our specialty is data mining in niche online communities that discuss financial markets. We also tap well-known sources such as Twitter and StockTwits, as well as classic sources such as financial news pages. Our crawlers monitor communication in these sources and produce quantitative data, which in turn can be used to build trading models and feed other financial applications. Here is how it works in detail:
1. Message Collection
Web crawlers collect messages from social media that relate to financial instruments and symbols or to other finance-relevant topics. Twitter can be accessed via its publicly available Streaming API. Other sources, such as specific trading message boards, need to be accessed by individually adapted scraping programs that extract the relevant information, e.g. user, timestamp, and text content.
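To illustrate the kind of record such a scraper might hand to the rest of the pipeline, here is a minimal sketch. The `Message` class, the `to_message` helper, and the raw field names (`user`, `ts`, `text`) are hypothetical and chosen only for this example; the sketch also assumes timestamps arrive as Unix epoch seconds.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Message:
    source: str          # e.g. "twitter" or the name of a message board
    user: str
    timestamp: datetime
    text: str


def to_message(source: str, raw: dict) -> Message:
    """Map one scraped record onto a common shape (hypothetical field names)."""
    return Message(
        source=source,
        user=str(raw["user"]).strip(),
        # Assumption: the scraper delivers Unix epoch seconds.
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        # Collapse runs of whitespace left over from HTML extraction.
        text=" ".join(str(raw["text"]).split()),
    )


msg = to_message("board", {"user": " trader42 ", "ts": 0, "text": "Going  long\n$AAPL"})
print(msg.user, msg.timestamp.isoformat(), msg.text)
```

Normalizing every source to one record type keeps the later filtering and analysis steps independent of where a message came from.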
2. Message Filtering
A spam filter scans all messages for insulting words or phrases (“rants” or “flames”). Most of these posts can be identified by their scurrilous or abusive language. A more sophisticated approach analyzes the relations between Twitter users (derived from their friend and follower networks) and calculates a reputation rank, or “author score”, for every user. The impact of every message is based on this internal author score. In addition to regular social media users, there are manually selected and verified users and financial experts who carry a higher author score (five times as much) by default.
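The two ideas above, a word-based filter and a per-author weight, can be sketched as follows. This is only an illustration: the blocklist entries, the baseline score of 1.0, and the function names are assumptions, and the network-based reputation calculation is not reproduced here. Only the factor of five for verified experts comes from the description above.

```python
import re

# Hypothetical blocklist; a real filter would use a much larger curated list.
BLOCKLIST = {"idiot", "moron", "scammer"}

DEFAULT_SCORE = 1.0       # assumed baseline score for a regular user
EXPERT_MULTIPLIER = 5.0   # verified experts carry five times as much by default


def is_abusive(text: str) -> bool:
    """Return True if any blocklisted word appears in the message."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return not BLOCKLIST.isdisjoint(words)


def default_author_score(verified_expert: bool) -> float:
    """Starting author score before any network-based adjustment."""
    return DEFAULT_SCORE * (EXPERT_MULTIPLIER if verified_expert else 1.0)


messages = ["You IDIOT, this stock is dead!", "Buying more shares today"]
kept = [m for m in messages if not is_abusive(m)]
print(kept, default_author_score(True))
```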
3. Natural Language Processing
The language of incoming messages is detected automatically. Based on the result, the appropriate Natural Language Processing is applied to determine the tonality (sentiment) of each message. Among other methods, we apply Naïve Bayes classification and word vectors. The automatic text analysis is specifically adapted and optimized for the domain of financial markets. For example, the word “long” has a distinct (positive) meaning in the context of financial markets, whereas it might mean something negative in a different context, e.g. if a user review on Amazon mentions the “long” focusing time of a digital camera.
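As a rough illustration of Naïve Bayes classification on finance-specific text, here is a small multinomial Naïve Bayes with add-one smoothing. It is a generic textbook sketch, not our production model; the four training sentences and the function names are invented for the example.

```python
import math
from collections import Counter

# Tiny invented training set in which "long" signals a positive stance.
TRAIN = [
    ("going long on this stock", "positive"),
    ("strong earnings bullish outlook", "positive"),
    ("going short weak earnings", "negative"),
    ("bearish outlook selling now", "negative"),
]


def train(samples):
    """Count words per class and documents per class."""
    word_counts, class_counts, vocab = {}, Counter(), set()
    for text, label in samples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, class_counts, vocab


def classify(text, model):
    """Pick the class with the highest log-posterior under add-one smoothing."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(classify("long bullish", model), classify("short bearish", model))
```

Because the model is trained on domain text, "long" contributes to the positive class here, which is the kind of domain adaptation described above.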
4. Entity Recognition
The NLP process identifies stock symbols, company names, abbreviations of names, and relevant financial events in every message. In this way each message can be assigned to one or more financial titles, which can be securities, indices, currency pairs, or commodities like gold or oil. The noise level is drastically reduced by using curated “title-handle maps”, which provide information about source-title relationships. About 70 percent of the messages are filtered out, which improves data quality.
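A simplified sketch of symbol and name matching is shown below. The alias table, the `$TICKER` pattern, and in particular the way the title-handle map is used (restricting a known handle to the titles it is curated for) are assumptions made for illustration; the actual maps and matching rules are not described here in enough detail to reproduce.

```python
import re

# Hypothetical alias table: lower-case names and abbreviations -> title.
ALIASES = {"apple": "AAPL", "aapl": "AAPL", "gold": "XAU", "eur/usd": "EURUSD"}

# Hypothetical title-handle map: titles a given source handle is known to cover.
TITLE_HANDLE_MAP = {"@apple_watcher": {"AAPL"}}

CASHTAG = re.compile(r"\$([A-Za-z]{1,6})\b")


def find_titles(text: str) -> set:
    """Collect titles mentioned via $TICKER tags or known aliases."""
    titles = {tag.upper() for tag in CASHTAG.findall(text)}
    lowered = text.lower()
    for alias, title in ALIASES.items():
        # Match the alias only as a whole token, not inside a longer word.
        if re.search(r"(?<![\w$])" + re.escape(alias) + r"(?!\w)", lowered):
            titles.add(title)
    return titles


def assign_titles(handle: str, text: str) -> set:
    """Assumed use of the map: keep only titles curated for a known handle."""
    found = find_titles(text)
    allowed = TITLE_HANDLE_MAP.get(handle)
    return found if allowed is None else found & allowed


text = "Apple beats, $MSFT lags, gold up"
print(sorted(assign_titles("@someone", text)), sorted(assign_titles("@apple_watcher", text)))
```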
5. Aggregating Reports
After a message has survived filtering and has been assigned to one or more titles, it receives a numerical value that represents its semantic orientation and thereby classifies the message as positive, negative, or neutral. This value depends on the frequency of the negative and positive words and phrases identified in the message. The raw numerical values can theoretically range between minus infinity and plus infinity. In practice, they fluctuate above and below zero. Because they are relative measures, they are also smoothed over time. For output purposes, we also provide the option to normalize values to a scale of -100 to +100, where -100 means maximum negative and +100 maximum positive sentiment. Values close to zero can be considered neutral sentiment.
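The scoring and normalization described above can be sketched roughly as follows. The word lists, the weighted sum, and the use of `tanh` with a `scale` parameter to squash raw values into the range -100 to +100 are all assumptions for the sake of the example; the actual formula is not specified here.

```python
import math

# Hypothetical word lists; real lexicons would be far larger and phrase-aware.
POSITIVE = {"long", "bullish", "buy"}
NEGATIVE = {"short", "bearish", "sell"}


def raw_score(text: str) -> int:
    """Positive-word count minus negative-word count for one message."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def aggregate(messages) -> float:
    """Sum raw scores over (text, author_weight) pairs; unbounded in principle."""
    return sum(raw_score(text) * weight for text, weight in messages)


def to_scale(raw: float, scale: float = 10.0) -> float:
    """Assumed normalization: squash any raw value into -100..+100."""
    return 100.0 * math.tanh(raw / scale)


# One regular user (weight 1.0) and one verified expert (weight 5.0).
total = aggregate([("long and bullish", 1.0), ("sell", 5.0)])
print(total, round(to_scale(total), 1))
```

A squashing function like this keeps values near zero roughly proportional to the raw score while bounding extreme values, which matches the idea that values close to zero indicate neutral sentiment.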
Reports are available through our Services, e.g. Data Feeds and Software Tools. You can find more information on how to apply our data in your financial application under Research. If you would like to get in touch, please click the button below.