The International World Wide Web Conference Committee (IW3C2) hosts an annual conference to discuss the state of the Web, share the latest research findings, and plan for the future of the internet. This year's conference, WWW2010, was held in Raleigh, North Carolina, and Yahoo! was proud to be a major sponsor.
The meat and potatoes of the WWW conferences are the research-paper presentations. However, there's also a vibrant stream of conversations within the halls. It's an opportunity for engineers to discuss the state of the art with colleagues from other companies and universities.
The conversations this year often centered around the breakout topics:
Many of the presentations also dealt with real-time discussions on Twitter and how they can be used to influence search relevance. Yahoo! Research Labs presented "Time is of the Essence: Improving Recency Ranking Using Twitter Data" by Anlei Dong, Ruiqiang Zhang, Pranam Kolari, Bai Jing, Yi Chang, Fernando Diaz, Zhaohui Zheng, and Hongyuan Zha. Here's how they describe their paper:
This paper proposes a ranking system for web search which utilizes Twitter data to improve ranking results, especially to improve the freshness of ranking results. We treat the URLs that were ever referred by Twitter users (called as Twitter URLs) differently compared with regular URLs. A challenging problem for Twitter URLs is that they lack click information and anchor-text information due to their freshness, which restrict them from being promoted appropriately in ranking results.
We analyze the unique characteristics within the twitter microcosm, such as Twitter users following relationship and the texts of tweets, and we use them as new evidences for ranking Twitter URLs appropriately in web search. We then use a compositional modeling algorithm to fully use the available data and different categories of rank features. This approach solves the dilemma in recency ranking that fresh documents cannot be promoted appropriately due to the lack of favorable rank features that need to be aggregated over time.
To evaluate ranking results, we not only incorporate recency demotion into discounted cumulative grade (DCG) for stale documents, but also use discounted cumulative freshness (DCF) to evaluate the most fresh documents in ranking results. The efficacy of this approach is illustrated by the experiments on real data.
Source: "Time is of the Essence: Improving Recency Ranking Using Twitter Data", WWW2010 conference
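For developers curious about what the evaluation described in the abstract above might look like in practice, here is a minimal Python sketch. It is not the authors' implementation. It assumes a common linear-gain form of the DCG-style metric (grade divided by log2 of rank plus one), a hypothetical one-grade penalty for stale documents, and a DCF variant that simply counts a result as 1 when it is fresh and 0 otherwise. The function names and the `penalty` parameter are invented for illustration.

```python
import math


def dcg(gains, k=None):
    """DCG-style score for a ranked list of per-result gains.

    Uses the linear-gain form: gain / log2(rank + 1), with ranks starting at 1.
    """
    top = gains if k is None else gains[:k]
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(top, start=1))


def demoted_grades(grades, stale_flags, penalty=1):
    """Lower the grade of each stale result by `penalty`, never below zero.

    The one-grade penalty is an assumption made for this sketch.
    """
    return [
        max(grade - penalty, 0) if stale else grade
        for grade, stale in zip(grades, stale_flags)
    ]


def dcf(fresh_flags, k=None):
    """Discounted cumulative freshness: the same discount applied to 0/1 freshness labels."""
    return dcg([1.0 if fresh else 0.0 for fresh in fresh_flags], k)


if __name__ == "__main__":
    grades = [3, 2, 0]            # editorial relevance grades, best result first
    stale = [False, True, True]   # which results are out of date
    print("DCG with recency demotion:", dcg(demoted_grades(grades, stale)))
    print("DCF:", dcf([not s for s in stale]))
```

Scoring the same ranking both ways shows the trade-off the abstract points at: a stale but highly relevant page loses some credit under demotion, while the freshness score rewards only how many fresh pages appear near the top.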
While many of the topics discussed at the conference dealt with technical details appropriate for large entities such as Yahoo!, Google, or Facebook, there were numerous discussions that would affect the professional web developer. This is a turbulent period in programming: we have distributed computing in the cloud, a new HTML standard, rapidly changing privacy rules and expectations, and an explosion of available data and APIs for developers to build upon.
Here are some of the topic highlights that address these issues: