Title: Comet: Batched stream processing for data intensive distributed computing
Authors: He, Bingsheng
Yang, Mao
Guo, Zhenyu
Chen, Rishan
Su, Bing
Lin, Wei
Zhou, Lidong
Affiliation: Microsoft Research Asia, China
Peking University, China
Microsoft, China
Issue Date: 2010
Citation: 1st ACM Symposium on Cloud Computing, SoCC '10. Indianapolis, IN, United States.
Abstract: Batched stream processing is a new distributed data processing paradigm that models recurring batch computations on incrementally bulk-appended data streams. The model is inspired by our empirical study on a trace from a large-scale production data-processing cluster; it allows a set of effective query optimizations that are not possible in a traditional batch processing model. We have developed a query processing system called Comet that embraces batched stream processing and integrates with DryadLINQ. We used two complementary methods to evaluate the effectiveness of optimizations that Comet enables. First, a prototype system deployed on a 40-node cluster shows an I/O reduction of over 40% using our benchmark. Second, when applied to a real production trace covering over 19 million machine-hours, our simulator shows an estimated I/O saving of over 50%. Copyright 2010 ACM.
URI: http://hdl.handle.net/20.500.11897/330159
DOI: 10.1145/1807128.1807139
Indexed: EI
Appears in Collections: Unclaimed

Files in This Work
There are no files associated with this item.

License: See PKU IR operational policies.