
Cool! Anything I want! Let's make a search engine for programmers! We'll only index useful sites about programming. Here is the algorithm:
- Create a set of keywords that encompass all possible useful programming topics.
- Use the keywords on sites like Delicious to get a set of useful URLs.
- Create a Google Custom Search (GCS) with all the URLs.
That is foolproof. NOT! Google limits you to searching only 5000 sites with GCS. Poop! But wait, you say, 5000 seems like a lot. Let's see: a quick first attempt at the above algorithm gave me 65 keywords. Ten pages of Delicious search results for each keyword gave me 16,000-ish URLs. Yipe! I tried running some searches with just 2000 URLs and the results were kind of sucky.
GCS Fail.
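Here is a rough sketch of the collection step and the limit check, just to make the arithmetic concrete. The `fake_search_page` function is a made-up stand-in for a real bookmark search (I'm assuming 25 results per page, which is not a number from my actual run), and the keyword names are placeholders:

```python
from typing import Callable, Iterable

GCS_SITE_LIMIT = 5000  # the site cap described above


def collect_urls(keywords: Iterable[str],
                 search_page: Callable[[str, int], list[str]],
                 pages_per_keyword: int = 10) -> set[str]:
    """Gather a de-duplicated set of URLs across all keywords."""
    urls: set[str] = set()
    for keyword in keywords:
        for page in range(1, pages_per_keyword + 1):
            urls.update(search_page(keyword, page))
    return urls


def fits_in_one_engine(urls: set[str], limit: int = GCS_SITE_LIMIT) -> bool:
    """True if the whole URL set fits under the site cap."""
    return len(urls) <= limit


def fake_search_page(keyword: str, page: int) -> list[str]:
    # Hypothetical stand-in: returns 25 invented URLs per results page.
    return [f"http://example.com/{keyword}/{page}/{i}" for i in range(25)]


keywords = [f"topic{i}" for i in range(65)]  # placeholder keywords
urls = collect_urls(keywords, fake_search_page)
print(len(urls), fits_in_one_engine(urls))  # prints "16250 False"
```

Using a set takes care of URLs that show up under more than one keyword, so the count is of distinct sites.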
What's the workaround? The only thing I can think of is that maybe 5000 URLs is enough for a decent search on one specific topic. Instead of creating one general programming search, I could just create a million specific ones on demand. That makes a plain old Google search sound easy by comparison. GCS is a nice idea, but they've crippled it so that it can't possibly compete with the standard Google search unless you are working against an extremely focused collection of websites.
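If I did go the topic-specific route, the selection step might look something like this. It is only a sketch: it assumes I kept track of which keyword each URL came from (the `urls_by_keyword` mapping is hypothetical), and it does nothing about actually creating the search engine itself.

```python
def urls_for_topic(urls_by_keyword: dict[str, list[str]],
                   topic_keywords: list[str],
                   limit: int = 5000) -> list[str]:
    """Pick distinct URLs for one topic, stopping at the site cap."""
    seen: set[str] = set()
    selected: list[str] = []
    for keyword in topic_keywords:
        for url in urls_by_keyword.get(keyword, []):
            if len(selected) >= limit:
                return selected
            if url not in seen:
                seen.add(url)
                selected.append(url)
    return selected


# Tiny invented example; real data would come from the bookmark searches.
urls_by_keyword = {
    "python": ["http://a.example", "http://b.example"],
    "django": ["http://b.example", "http://c.example"],
    "haskell": ["http://d.example"],
}
print(urls_for_topic(urls_by_keyword, ["python", "django"]))
# prints ['http://a.example', 'http://b.example', 'http://c.example']
```

Keywords listed first get priority when the cap is hit, so the order of `topic_keywords` matters.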