Best place to start my crawlers

gemmawright · May 25, 2012

Hello!

I read an article about someone who built a system that fairly accurately predicts the Dow Jones by analysing the general 'mood' of the Twitter population, looking for key words in statements such as "I feel..." and "I am feeling...". I decided to try something similar with the FTSE.

Rather than analysing Twitter, I've decided to write a web crawler that reads about 10,000 pages an hour, starting from 5 roots that I picked at random, e.g. World business, finance, and political news from the Financial Times (FT.com), BBC News - Business etc. (I'll come back to this in a minute.)
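For anyone wanting to try the same thing, a crawl from a handful of roots can be sketched as a breadth-first walk with a page limit. This is only a minimal sketch, not the poster's actual crawler: `LinkParser`, `fetch_html` and `crawl` are hypothetical names, and it deliberately leaves out robots.txt handling, rate limiting and same-site filtering, which a real crawler would need.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url, timeout=10):
    """Download a page and decode it leniently (hypothetical default fetcher)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(roots, max_pages=100, fetch=fetch_html):
    """Breadth-first crawl from the roots; returns {url: html} for fetched pages."""
    queue = deque(roots)
    seen = set(roots)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link = urldefrag(urljoin(url, href)).url
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Because the walk is breadth-first, pages closest to the roots are read first, so the choice of roots dominates what ends up in the sample when `max_pages` is small.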

I picked the top 100 words that came up from the crawls (strong, exit, credit, latest, best, buy and first being the top few), and then set up a server to collect the stats on these words every 30 minutes (it takes 20 mins to crawl) during trading hours.
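Picking the most frequent words from crawled text could look something like the sketch below. The tokenising regex and the `stopwords` parameter are assumptions for illustration; the post does not say how words were split or whether common words like "the" were filtered out.

```python
import re
from collections import Counter


def top_words(texts, n=100, stopwords=frozenset()):
    """Return the n most common lowercase words across all texts."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts.most_common(n)


# Example with two tiny "pages":
print(top_words(["Buy buy strong", "strong buy"], n=2))
```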

I've only had the system running for a few days, but if I assign e.g. +1 to 'Good' words and -1 to 'Bad' words and have the system spit out a 'score' every 30 mins, I can already see my score fluctuating along with the FTSE (although it's very much too early to confirm!)
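The +1/-1 scoring described above can be sketched in a few lines. The `GOOD` and `BAD` sets here are made-up labels for illustration only; the post does not say which words were tagged which way.

```python
import re

# Hypothetical labels, not the poster's actual word lists.
GOOD = {"strong", "best", "buy"}
BAD = {"exit", "weak", "sell"}


def mood_score(text, good=GOOD, bad=BAD):
    """Sum +1 for every 'good' word and -1 for every 'bad' word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in good) - (w in bad) for w in words)


print(mood_score("Strong buy, then exit"))  # two good words, one bad word
```

A raw sum grows with the amount of text crawled, so dividing by the total word count may make scores from different half-hour crawls easier to compare.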

My aim is to use the stats for each word to train a Neural Net against what the FTSE does in the next 1 hr, 1 day, 3 days (where Twitter was optimum) and 1 week.
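Before any network is trained, each 30-minute word-stats sample needs pairing with what the index did afterwards. A minimal sketch of that labelling step, assuming `features` and `prices` are lists sampled at the same times and `steps` is the horizon in samples (e.g. 2 for one hour at 30-minute intervals); the function names are hypothetical:

```python
def forward_returns(prices, steps):
    """Fractional change from each sample to the one `steps` later.

    Entries with no future sample available are None.
    """
    out = []
    for i, price in enumerate(prices):
        j = i + steps
        out.append((prices[j] - price) / price if j < len(prices) else None)
    return out


def make_dataset(features, prices, steps):
    """Pair each feature vector with its forward return, dropping the tail."""
    targets = forward_returns(prices, steps)
    return [(x, y) for x, y in zip(features, targets) if y is not None]
```

Only the feature row at time t is paired with the price at t + steps, so nothing from the future leaks into the inputs.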

OK so my question!! At the moment I am using the following 'roots' for my crawl that I picked purely from Googling things like 'FTSE news':

"http://www.telegraph.co.uk/finance/markets",
"http://www.ft.com",
"http://www.bbc.co.uk/news/business/",
"http://www.guardian.co.uk/business/marketforceslive",
"http://www.londonstockexchange.com/home/homepage.htm"

What I want to know is what roots should I be using for optimal data? What sites do you think really affect the FTSE, ones that you think really reflect the mood or more importantly influence the mood of investors and traders?

If anyone is working on similar systems or wants updates please let me know.

Thanks

TheBramble · May 26, 2012

Forget those sites. Trawl the financial bulletin board sites like this one; I think there are one or two others too…

Identify ‘trends’ (no really, no pun, OK, well, maybe just a little one) and then FADE THAT MUTHA’…

Ross Spur · May 26, 2012

Very interesting. Could you incorporate data from share tipsters and financial channel gurus (high inverse correlation, I should imagine)?

Lightning McQueen · May 26, 2012

I'd maybe suggest at least adding the following to your news crawl sources.

marketwatch.com

moneyam.com

iii.co.uk

advfn.com

stollybollybull · May 31, 2012

Surely the challenge here is trying to get the information ahead of the impact on the FTSE - the media is reporting what just happened, as opposed to what is about to happen. As such the correlation isn't surprising but I can imagine it's happening after the event.
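One way to check whether the score leads or merely follows the index is to correlate it against returns shifted by different lags. A rough sketch, assuming `score` and `returns` are equal-interval lists covering the same periods (both names are hypothetical):

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def lagged_correlation(score, returns, lag):
    """Correlate score[t] with returns[t + lag].

    lag > 0 tests whether the score leads the market;
    lag < 0 tests whether it trails it.
    """
    if lag >= 0:
        xs, ys = score[:len(score) - lag], returns[lag:]
    else:
        xs, ys = score[-lag:], returns[:len(returns) + lag]
    n = min(len(xs), len(ys))
    return pearson(xs[:n], ys[:n])
```

If the correlation peaks at a negative lag, the score is reporting what already happened; only a peak at a positive lag would suggest any predictive value.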

Derwent Capital ran a hedge fund based on Twitter sentiment - they quietly shut that after a month after it blew itself up....

NVP · May 31, 2012

stollybollybull said:
Surely the challenge here is trying to get the information ahead of the impact on the FTSE - the media is reporting what just happened, as opposed to what is about to happen. As such the correlation isn't surprising but I can imagine it's happening after the event.

Derwent Capital ran a hedge fund based on Twitter sentiment - they quietly shut that after a month after it blew itself up....

or wait until you have so much of the same message out there in overload mode............you go contrarian to the message

N

stollybollybull · May 31, 2012

NVP said:
or wait until you have so much of the same message out there in overload mode............you go contrarian to the message

N

I think that's pretty much what Ross Spur was pointing at with his talking heads line!

the hare · May 31, 2012

I don't really understand the motivation for people to post trade calls on social media (other than pump and dump schemes, or vendors running the usual con tricks that these platforms facilitate).

However, if there is evidence to suggest that there are people who are posting legitimate calls, then why not create your own platform, something along the lines of StockTwits or one of those trade audit type sites? The basic idea would be to identify the best and worst performing subscribers in some category, follow the best performers and fade the worst. The big problem with this approach, however, is differentiating those with a genuine edge from those who are currently on a streak of good luck, or those with systems in phase with current market conditions. This particular problem is hard enough as a system developer with all of the relevant information available. I honestly can't see how you'd handle it as someone who only sees the calls and their final results, without understanding the mechanics of those trades.
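The luck-versus-edge problem can at least be put in numbers with a simple binomial calculation: how likely is a given win record if every call were a coin flip? This is only a sketch under that coin-flip assumption (`p=0.5`), and `p_value_at_least` is a hypothetical helper name.

```python
from math import comb


def p_value_at_least(wins, calls, p=0.5):
    """Probability of at least `wins` successes in `calls` independent tries."""
    return sum(
        comb(calls, k) * p**k * (1 - p) ** (calls - k)
        for k in range(wins, calls + 1)
    )


# 8 winning calls out of 10 by pure chance:
print(p_value_at_least(8, 10))  # 56/1024 = 0.0546875
```

With 1,000 subscribers each making 10 coin-flip calls, around 55 would be expected to show 8 or more winners, which is why ranking by recent record alone says little about a genuine edge.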
