Title Comet: Batched stream processing for data intensive distributed computing
Authors He, Bingsheng
Yang, Mao
Guo, Zhenyu
Chen, Rishan
Su, Bing
Lin, Wei
Zhou, Lidong
Affiliation Microsoft Research Asia, China
Peking University, China
Microsoft, China
Issue Date 2010
Citation 1st ACM Symposium on Cloud Computing, SoCC '10. Indianapolis, IN, United States.
Abstract Batched stream processing is a new distributed data processing paradigm that models recurring batch computations on incrementally bulk-appended data streams. The model is inspired by our empirical study on a trace from a large-scale production data-processing cluster; it allows a set of effective query optimizations that are not possible in a traditional batch processing model. We have developed a query processing system called Comet that embraces batched stream processing and integrates with DryadLINQ. We used two complementary methods to evaluate the effectiveness of optimizations that Comet enables. First, a prototype system deployed on a 40-node cluster shows an I/O reduction of over 40% using our benchmark. Second, when applied to a real production trace covering over 19 million machine-hours, our simulator shows an estimated I/O saving of over 50%. Copyright 2010 ACM.
URI http://hdl.handle.net/20.500.11897/330159
DOI 10.1145/1807128.1807139
Indexed EI
Appears in Collections: Unclaimed

License: See PKU IR operational policies.