News sites that list the same story under many categories


I read http://www.cbc.ca/news

Like most news Web sites, they show categories such as Headlines, Most Viewed, World, Canada, Analysis, Politics, Health, Arts and Entertainment, Sports, etc.

I can't stand seeing the same link under multiple categories. Running into the same link again and again makes their main page take longer to read.

Could anyone point me to a solution, or to a forum or chat room where someone who has already solved this problem might help?

Should I scrape the news page and remove the duplicates myself? I'm a programmer, so I could write some code, but I have no idea where to start or whether existing software already does part of the job.
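If you do go the scraping route, the deduplication step itself is small. Below is a minimal Python sketch, assuming you have already split the page into one HTML fragment per category (how to find those sections depends on the site's markup, which isn't covered here). The function names `links_in`, `normalize`, and `dedupe_categories` are hypothetical, as are the story URLs; the sketch treats two links as the same story when they differ only by a `#fragment` or a trailing slash, which is an assumption you may need to tighten or loosen for a given site.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links like "/news/x" against the page URL.
                self.links.append(urljoin(self.base_url, href))


def links_in(html_fragment, base_url):
    """Return absolute link URLs found in one category's HTML fragment."""
    collector = _LinkCollector(base_url)
    collector.feed(html_fragment)
    collector.close()
    return collector.links


def normalize(url):
    # Assumption: links that differ only by #fragment or a trailing
    # slash point to the same story.
    url, _fragment = urldefrag(url)
    return url.rstrip("/")


def dedupe_categories(categories):
    """Keep each story only in the first category it appears in.

    `categories` maps a category name to its story URLs in page order;
    dict insertion order decides which category "wins" a duplicate.
    """
    seen = set()
    result = {}
    for name, urls in categories.items():
        kept = []
        for url in urls:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                kept.append(url)
        result[name] = kept
    return result


if __name__ == "__main__":
    # Hypothetical example data standing in for a scraped front page.
    page = {
        "Headlines": ["https://www.cbc.ca/news/story-a",
                      "https://www.cbc.ca/news/story-b"],
        "World": ["https://www.cbc.ca/news/story-a/",
                  "https://www.cbc.ca/news/story-c"],
        "Most Viewed": ["https://www.cbc.ca/news/story-b#comments"],
    }
    for name, urls in dedupe_categories(page).items():
        print(name, urls)
```

With the sample data, "World" keeps only story-c and "Most Viewed" ends up empty, because both of their other links already appeared under Headlines. Keeping the first occurrence means the order of the categories on the page decides where each story stays.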

Also, I've tried Googling for an answer, but on every query I tried I completely failed to filter out the 50 million irrelevant results.

Thank you.
