Title | Comet: Batched stream processing for data intensive distributed computing |
Authors | He, Bingsheng; Yang, Mao; Guo, Zhenyu; Chen, Rishan; Su, Bing; Lin, Wei; Zhou, Lidong |
Affiliation | Microsoft Research Asia, China; Peking University, China; Microsoft, China |
Issue Date | 2010 |
Citation | 1st ACM Symposium on Cloud Computing, SoCC '10. Indianapolis, IN, United States. |
Abstract | Batched stream processing is a new distributed data processing paradigm that models recurring batch computations on incrementally bulk-appended data streams. The model is inspired by our empirical study on a trace from a large-scale production data-processing cluster; it allows a set of effective query optimizations that are not possible in a traditional batch processing model. We have developed a query processing system called Comet that embraces batched stream processing and integrates with DryadLINQ. We used two complementary methods to evaluate the effectiveness of optimizations that Comet enables. First, a prototype system deployed on a 40-node cluster shows an I/O reduction of over 40% using our benchmark. Second, when applied to a real production trace covering over 19 million machine-hours, our simulator shows an estimated I/O saving of over 50%. Copyright 2010 ACM. |
URI | http://hdl.handle.net/20.500.11897/330159 |
DOI | 10.1145/1807128.1807139 |
Indexed | EI |
Appears in Collections: | Unclaimed |