The use of computers, like any technological activity, is not content-neutral. Users of computers constantly interact with assumptions regarding worthwhile activity which are embedded in any computing system. Directly questioning these assumptions in the context of computing allows us to develop an understanding of responsible computing.

Full Paper: http://doi.acm.org/10.1145/199544.199585
Full Proceedings: Proceedings of the Conference on Ethics in the Computer Age

Effective Page Refresh Policies for Web Crawlers

If you need to maintain local copies of remote data sources for better performance or availability, then you may well have problems with the "freshness" of your data. This paper discusses, in particular, the problem of Web crawlers that maintain local copies of remote Web pages for Web search engines. Although it's a very technical paper with lots of formulas and abstractions, I quite like their formalisation of "freshness" and "age" of data.
In this article, we study how we can maintain local copies of remote data sources "fresh," when the source data is updated autonomously and independently. In particular, we study the problem of Web crawlers that maintain local copies of remote Web pages for Web search engines. In this context, remote data sources (Websites) do not notify the copies (Web crawlers) of new changes, so we need to periodically poll the sources to maintain the copies up-to-date. Since polling the sources takes significant time and resources, it is very difficult to keep the copies completely up-to-date.

This article proposes various refresh policies and studies their effectiveness. We first formalize the notion of "freshness" of copied data by defining two freshness metrics, and we propose a Poisson process as the change model of data sources. Based on this framework, we examine the effectiveness of the proposed refresh policies analytically and experimentally. We show that a Poisson process is a good model to describe the changes of Web pages and we also show that our proposed refresh policies improve the "freshness" of data very significantly. In certain cases, we got orders of magnitude improvement from existing policies.

Full Paper: http://doi.acm.org/10.1145/958942.958945
Full Proceedings: ACM Transactions on Database Systems (TODS)
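To get a feel for how "freshness" behaves under a Poisson change model, here is a small Python sketch of my own. It is not code or notation from the paper. It assumes a single page that changes as a Poisson process with a constant rate, a copy that is refreshed at a fixed interval, and a freshness value of 1 while the copy matches the source and 0 once the source has changed. The function names `expected_freshness` and `simulate_freshness` are hypothetical.

```python
import math
import random


def expected_freshness(change_rate, refresh_interval):
    """Time-averaged probability that the copy is up to date.

    Assumption: the page changes as a Poisson process with rate
    `change_rate`, and the copy is refreshed every `refresh_interval`
    time units. The copy is fresh at time t after a refresh with
    probability exp(-rate * t); averaging that over one interval
    gives (1 - exp(-x)) / x with x = rate * interval.
    """
    x = change_rate * refresh_interval
    if x == 0:
        return 1.0  # a page that never changes is always fresh
    return (1.0 - math.exp(-x)) / x


def simulate_freshness(change_rate, refresh_interval, cycles, seed=0):
    """Estimate the same quantity by simulating refresh cycles.

    In each cycle the copy stays fresh until the first change arrives
    (an exponentially distributed waiting time) or until the next
    refresh, whichever comes first.
    """
    rng = random.Random(seed)
    fresh_time = 0.0
    for _ in range(cycles):
        first_change = rng.expovariate(change_rate)
        fresh_time += min(first_change, refresh_interval)
    return fresh_time / (cycles * refresh_interval)


if __name__ == "__main__":
    # Hypothetical page that changes about once a day.
    for interval in (0.25, 1.0, 4.0):
        analytic = expected_freshness(1.0, interval)
        simulated = simulate_freshness(1.0, interval, cycles=100_000)
        print(f"interval={interval}: analytic={analytic:.3f}, simulated={simulated:.3f}")
```

Under these assumptions, a page that changes on average once a day and is re-crawled once a day is fresh only about 63% of the time (1 - 1/e), which illustrates why a limited crawling budget makes the choice of refresh policy matter.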
Labels: dbms, ethics, technicist