gemmawright
Hello!
I recently read an article about someone who built a system that fairly accurately predicts the Dow Jones by analysing the general 'mood' of the Twitter population, looking for key words in statements such as "I feel..." and "I am feeling...". I decided to try something similar with the FTSE.
Rather than analysing Twitter, I've written a web crawler that reads about 10,000 pages an hour, starting from 5 roots that I picked more or less at random, e.g. FT.com ("World business, finance, and political news from the Financial Times"), BBC News - Business etc. (I'll come back to this in a minute.)
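For anyone unfamiliar with the idea, a crawl from a handful of roots can be sketched roughly as below. This is only an illustration, not the actual crawler: `LinkParser`, `crawl`, the `fetch` callback and the `max_pages` limit are all made-up names, and the page download is passed in as a function so the sketch runs without touching the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(roots, fetch, max_pages=100):
    """Breadth-first crawl from `roots`; returns {url: html} for visited pages.

    `fetch(url)` is an assumed callback that returns the page HTML as a
    string, or None if the page could not be retrieved.
    """
    queue = deque(roots)
    seen = set(roots)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

In a real run `fetch` would wrap something like `urllib.request.urlopen` with a timeout, and the crawler would also want to respect robots.txt and rate-limit its requests per site.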
I picked the top 100 words that came up from the crawls (strong, exit, credit, latest, best, buy and first being the top few), and then set up a server to collect the stats on these words every 30 minutes during trading hours (each crawl takes about 20 minutes).
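As a rough sketch of the word-picking step, assuming the page text has already been stripped of HTML: the function names and the tiny `STOP_WORDS` set here are hypothetical, and a real stop-word list would be much longer.

```python
import re
from collections import Counter

# Hypothetical stop-word list, just enough for the example.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def count_words(text):
    """Count lower-cased words in one page of plain text, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def top_words(texts, n=100):
    """Return the n most frequent words across all crawled pages."""
    total = Counter()
    for text in texts:
        total.update(count_words(text))
    return [word for word, _ in total.most_common(n)]
```

Once the top words are fixed, each 30-minute crawl only needs `count_words` on the new pages, restricted to that list.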
I've only had the system running for a few days, but if I assign, say, +1 to 'good' words and -1 to 'bad' words and have the system spit out a 'score' every 30 minutes, I can already see the score fluctuating along with the FTSE (although it's far too early to confirm anything!).
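The +1/-1 scoring can be sketched in a few lines. The good/bad split below is purely a guess for illustration ('weak' and 'sell' are not even in the word list above), and `mood_score` is a made-up name.

```python
from collections import Counter

# Hypothetical labelling of words; the real lists would be chosen by hand.
GOOD_WORDS = {"strong", "best", "buy"}
BAD_WORDS = {"exit", "weak", "sell"}


def mood_score(counts, good=GOOD_WORDS, bad=BAD_WORDS):
    """Sum +1 per occurrence of a good word and -1 per occurrence of a bad word."""
    score = 0
    for word, n in counts.items():
        if word in good:
            score += n
        elif word in bad:
            score -= n
    return score


# One 30-minute snapshot of word counts (made-up numbers).
snapshot = Counter({"strong": 3, "exit": 1, "latest": 5})
print(mood_score(snapshot))  # 3 good - 1 bad = 2; 'latest' is neutral
```

Because the number of pages crawled varies between runs, dividing the score by the total word count might make snapshots easier to compare.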
My aim is to use the stats for each word to train a neural net against what the FTSE does over the next 1 hour, 1 day, 3 days (which was the optimum for the Twitter system) and 1 week.
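Before any neural net comes into it, the training data has to pair each snapshot of word stats with what the FTSE did afterwards. A minimal sketch of that pairing, assuming one feature row and one index value per 30-minute sample (so a 1-hour horizon is 2 steps); `make_examples` and its arguments are hypothetical names:

```python
def make_examples(features, prices, horizon):
    """Pair each feature row with the fractional FTSE change `horizon` samples later.

    features: list of per-sample word stats (one row per snapshot)
    prices:   list of index values, same length and order as `features`
    horizon:  how many samples ahead to look (e.g. 2 for 1 hour at 30-min sampling)
    """
    if len(features) != len(prices):
        raise ValueError("features and prices must have the same length")
    examples = []
    # The last `horizon` snapshots have no future price yet, so they are dropped.
    for t in range(len(prices) - horizon):
        change = (prices[t + horizon] - prices[t]) / prices[t]
        examples.append((features[t], change))
    return examples
```

The target for snapshot t uses only prices at or after t, so the inputs never see the future. For the same reason it is probably safest to split training and test data chronologically rather than at random.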
OK, so my question! At the moment I'm using the following 'roots' for my crawl, which I picked purely from Googling things like 'FTSE news':
"http://www.telegraph.co.uk/finance/markets",
"http://www.ft.com",
"http://www.bbc.co.uk/news/business/",
"http://www.guardian.co.uk/business/marketforceslive",
"http://www.londonstockexchange.com/home/homepage.htm"
What I want to know is: which roots should I be using for optimal data? Which sites do you think really affect the FTSE, i.e. ones that reflect or, more importantly, influence the mood of investors and traders?
If anyone is working on a similar system or wants updates, please let me know.
Thanks