MongoDB: The Definitive Guide (5) - Aggregation
jopen
13 years ago
<div id="article_content" class="article_content"> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">In addition to basic queries, MongoDB provides aggregation tools that range from simple counting to complex data analysis with MapReduce.<br /> </p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;"><strong>1.count</strong></p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">The simplest aggregation tool is count, which returns the number of documents in a collection:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.foo.count()<br /> </span> <span style="line-height:1.5;">0</span> <span style="line-height:1.5;"><br /> </span> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.foo.insert({</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">x</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">1</span> <span style="line-height:1.5;">})<br /> </span> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.foo.count()<br /> </span> <span style="line-height:1.5;">1</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">You can also pass a query, in which case count returns the number of documents that match it:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px 
solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.foo.insert({</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">x</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">2</span> <span style="line-height:1.5;">})<br /> </span> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.foo.count()<br /> </span> <span style="line-height:1.5;">2</span> <span style="line-height:1.5;"><br /> </span> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.foo.count({</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">x</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">1</span> <span style="line-height:1.5;">})<br /> </span> <span style="line-height:1.5;">1</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;"><strong><br /> 2.distinct</strong></p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">The distinct command returns all the distinct values of a given key. You must specify both a collection and a key.</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.runCommand({</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">distinct</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span 
style="line-height:1.5;">people</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">key</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">age</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">})</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Suppose the collection contains the following documents:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">{</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">name</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">Ada</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">age</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">20</span> <span style="line-height:1.5;">}<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">name</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">Fred</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">age</span> 
<span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">35</span> <span style="line-height:1.5;">}<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">name</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">Susan</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">age</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">60</span> <span style="line-height:1.5;">}<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">name</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">Andy</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">age</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">35</span> <span style="line-height:1.5;">}</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Running distinct on the age key then returns each value once:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.runCommand({</span> <span style="line-height:1.5;">"</span> <span 
style="line-height:1.5;">distinct</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">people</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">key</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">age</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">})<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">values</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : [</span> <span style="line-height:1.5;">20</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">35</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">60</span> <span style="line-height:1.5;">], </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">ok</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">1</span> <span style="line-height:1.5;">}</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;"><strong><br /> 3.group</strong></p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">group offers more complex aggregation and is similar to GROUP BY in SQL. You specify the key to group by, MongoDB splits the collection into groups according to the values of that key, and aggregating each group produces one result document per group.</p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Suppose we run a site that tracks stock prices. Every few minutes from 10 a.m. to 4 p.m., the latest price is stored in the database. As part of a reporting application, we want to find the closing price for each of the past 30 days, which is easy to do with group.</p> <p style="line-height:20px;margin:5px 
auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">The stock price collection contains thousands of documents in the following format:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">{</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">day</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">2010/10/03</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">time</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">10/3/2010 03:57:01 GMT-400</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">price</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">4.23</span> <span style="line-height:1.5;">}<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">day</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">2010/10/04</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">time</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span 
style="line-height:1.5;">10/4/2010 11:28:39 GMT-400</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">price</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">4.27</span> <span style="line-height:1.5;">}<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">day</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">2010/10/03</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">time</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">10/3/2010 05:00:23 GMT-400</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">price</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">4.10</span> <span style="line-height:1.5;">}<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">day</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">2010/10/06</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">time</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">10/6/2010 05:27:58 GMT-400</span> <span style="line-height:1.5;">"</span> <span 
style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">price</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">4.30</span> <span style="line-height:1.5;">}<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">day</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">2010/10/04</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">time</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">10/4/2010 08:34:50 GMT-400</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">price</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">4.01</span> <span style="line-height:1.5;">}</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">What we want is the last trade price of each day, so the result should look like this:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">[<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">time</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> 
<span style="line-height:1.5;">10/3/2010 05:00:23 GMT-400</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">price</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">4.10</span> <span style="line-height:1.5;">},<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">time</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">10/4/2010 11:28:39 GMT-400</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">price</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">4.27</span> <span style="line-height:1.5;">},<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">time</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">10/6/2010 05:27:58 GMT-400</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">price</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">4.30</span> <span style="line-height:1.5;">}<br /> ]</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">To get this, we group by day, find the document with the latest timestamp in each group, and add it to the result set:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px 
solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.runCommand({</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">group</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">ns</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">stocks</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">,<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">key</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">day</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">,<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">initial</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">time</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">0</span> <span style="line-height:1.5;">},<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">$reduce</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;color:#0000ff;">function</span> <span style="line-height:1.5;">(doc, prev) {<br /> ... 
</span> <span style="line-height:1.5;color:#0000ff;">if</span> <span style="line-height:1.5;"> (doc.time </span> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> prev.time) {<br /> ... prev.price </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> doc.price;<br /> ... prev.time </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> doc.time;<br /> ... }<br /> ... }}})</span> </div> </div> <ul style="line-height:20px;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;margin-left:45px;font-size:13px;"> <li>"ns" : "stocks"<br /> Specifies which collection to run the group command on.</li> <li>"key" : "day"<br /> Specifies the key to group by.</li> <li>"initial" : {"time" : 0}<br /> The initial value of the accumulator, passed to the reduce function the first time it is called for each group. Within a group the same accumulator is used throughout, so changes made to it persist from one call to the next.</li> <li>"$reduce" : function(doc, prev) { ... }<br /> The reduce function is called once for each document in the collection and receives two arguments: the current document and the accumulator document, which holds the result computed so far for the group. (Side note: I am not sure why it is named prev; something like total or accumulation would be easier to understand. At first glance I took it for the previous document.) In this example, reduce compares the time of the current document with that of the accumulator document and, if the current document is later, replaces the accumulator's values with those of the current document. Because each group has its own accumulator, documents from different days cannot affect one another.</li> </ul> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Earlier we said that we wanted prices for only the last 30 days. We can add a condition so that only matching documents are processed:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.runCommand({</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">group</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {<br /> ... 
</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">ns</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">stocks</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">,<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">key</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">day</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">,<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">initial</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">time</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">0</span> <span style="line-height:1.5;">},<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">$reduce</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;color:#0000ff;">function</span> <span style="line-height:1.5;">(doc, prev) {<br /> ... </span> <span style="line-height:1.5;color:#0000ff;"> if</span> <span style="line-height:1.5;"> (doc.time </span> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> prev.time) {<br /> ... prev.price </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> doc.price;<br /> ... prev.time </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> doc.time;<br /> ... }},<br /> ... 
</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">condition</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">day</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">$gt</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">2010/09/30</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">}}<br /> ... }})</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">If some documents have no day key, they are placed in a day : null group. You can exclude that group by adding "day" : {"$exists" : true} to the condition.</p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;"><em>Using a Finalizer</em></p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">A finalizer minimizes the amount of data transferred from the database to the user. Consider a blog where each post has several tags, and we want to find the most popular tag for each day. We group by day and keep a count for each tag:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.posts.group({<br /> ... 
</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">key</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">day</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;color:#0000ff;">true</span> <span style="line-height:1.5;">},<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">initial</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">tags</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {}},<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">$reduce</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;color:#0000ff;">function</span> <span style="line-height:1.5;">(doc, prev) {<br /> ... </span> <span style="line-height:1.5;color:#0000ff;">for</span> <span style="line-height:1.5;"> (i </span> <span style="line-height:1.5;color:#0000ff;">in</span> <span style="line-height:1.5;"> doc.tags) {<br /> ... </span> <span style="line-height:1.5;color:#0000ff;"> if</span> <span style="line-height:1.5;"> (doc.tags[i] </span> <span style="line-height:1.5;color:#0000ff;">in</span> <span style="line-height:1.5;"> prev.tags) {<br /> ... prev.tags[doc.tags[i]]</span> <span style="line-height:1.5;">++</span> <span style="line-height:1.5;">;<br /> ... } </span> <span style="line-height:1.5;color:#0000ff;">else</span> <span style="line-height:1.5;"> {<br /> ... prev.tags[doc.tags[i]] </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> </span> <span style="line-height:1.5;">1</span> <span style="line-height:1.5;">;<br /> ... }<br /> ... }<br /> ... 
}})</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">The result looks like this:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">[<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">day</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">2010/01/12</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">tags</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">nosql</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">4</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">winter</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">10</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">sledding</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">2</span> <span style="line-height:1.5;">}},<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">day</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span 
style="line-height:1.5;">"</span> <span style="line-height:1.5;">2010/01/13</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">tags</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">soda</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">5</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">php</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">2</span> <span style="line-height:1.5;">}},<br /> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">day</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">2010/01/14</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">tags</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">python</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">6</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">winter</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">4</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">nosql</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">: </span> <span 
style="line-height:1.5;">15</span> <span style="line-height:1.5;">}}<br /> ]</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">All we actually need is the tag with the highest count; there is no need to send the entire tags document back to the client. This is what the optional "finalize" key of the group command is for. finalize specifies a function that runs once for each group before the result is returned to the client. We use it to strip out the parts we do not need.</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.runCommand({</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">group</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">ns</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">posts</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">,<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">key</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">day</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;color:#0000ff;">true</span> <span style="line-height:1.5;">},<br /> ... 
</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">initial</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">tags</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {}},<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">$reduce</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;color:#0000ff;">function</span> <span style="line-height:1.5;">(doc, prev) {<br /> ... </span> <span style="line-height:1.5;color:#0000ff;">for</span> <span style="line-height:1.5;"> (</span> <span style="line-height:1.5;color:#0000ff;">var</span> <span style="line-height:1.5;"> i </span> <span style="line-height:1.5;color:#0000ff;">in</span> <span style="line-height:1.5;"> doc.tags) {<br /> ... </span> <span style="line-height:1.5;color:#0000ff;"> if</span> <span style="line-height:1.5;"> (doc.tags[i] </span> <span style="line-height:1.5;color:#0000ff;">in</span> <span style="line-height:1.5;"> prev.tags) {<br /> ... prev.tags[doc.tags[i]]</span> <span style="line-height:1.5;">++</span> <span style="line-height:1.5;">;<br /> ... } </span> <span style="line-height:1.5;color:#0000ff;">else</span> <span style="line-height:1.5;"> {<br /> ... prev.tags[doc.tags[i]] </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> </span> <span style="line-height:1.5;">1</span> <span style="line-height:1.5;">;<br /> ... }<br /> ... }<br /> ... },<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">finalize</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;color:#0000ff;">function</span> <span style="line-height:1.5;">(prev) {<br /> ... 
</span> <span style="line-height:1.5;color:#0000ff;"> var</span> <span style="line-height:1.5;"> mostPopular </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> </span> <span style="line-height:1.5;">0</span> <span style="line-height:1.5;">;<br /> ... </span> <span style="line-height:1.5;color:#0000ff;">for</span> <span style="line-height:1.5;"> (i </span> <span style="line-height:1.5;color:#0000ff;">in</span> <span style="line-height:1.5;"> prev.tags) {<br /> ... </span> <span style="line-height:1.5;color:#0000ff;">if</span> <span style="line-height:1.5;"> (prev.tags[i] </span> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> mostPopular) {<br /> ... prev.tag </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> i;<br /> ... mostPopular </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> prev.tags[i];<br /> ... }<br /> ... }<br /> ... </span> <span style="line-height:1.5;color:#0000ff;"> delete</span> <span style="line-height:1.5;"> prev.tags<br /> ... 
}}})</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;"><em>Using a function as the grouping key</em></p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Sometimes you need a grouping rule more complex than a single key. In that case you can use "$keyf" to define a function that computes the key for each document; below, it folds the category to lower case so that categories differing only in case fall into the same group.</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.posts.group({</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">ns</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">posts</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">,<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">$keyf</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;color:#0000ff;">function</span> <span style="line-height:1.5;">(x) { </span> <span style="line-height:1.5;color:#0000ff;">return</span> <span style="line-height:1.5;"> x.category.toLowerCase(); },<br /> ... </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">initial</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : ... 
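})</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;"><strong><br /> 4. MapReduce</strong></p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">MapReduce is the heavy artillery of the aggregation tools: anything the other tools can do it can do, and it can also handle what they cannot. MapReduce is an aggregation method that can run in parallel across multiple servers. It splits a problem into chunks, sends the chunks to different machines, lets each machine solve its own part, and merges all the results once every machine has finished. (ps: this seems to describe the original MapReduce concept and does not have much to do with what follows.)</p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">MapReduce runs in two steps. The first is map, which projects the keys and values of each document onto another set of key/value pairs. The second is reduce, which merges the emitted pairs by key so that each key ends up with a single value. (ps: this is my own reading; the book's wording is rather convoluted.)</p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">The price of MapReduce is speed: group is not particularly fast, and MapReduce is slower still. It is therefore usually run as a background job, and its result collection is queried after the job finishes.</p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Example 1: finding all the keys in a collection</p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Using MapReduce for this is certainly overkill; the point is to see how MapReduce works. MongoDB is schemaless, so it does not keep track of which keys each document contains. In this example we count how many times each key appears in the collection, not including the keys of embedded documents.</p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">In the map step, the map function uses a special function, emit, to return the values that will be processed later. emit gives MapReduce a key and a value. In this example we emit, for every key in the document, a count of its occurrences, {count : 1}. Because we want a separate count for each key, we call emit once for each key in the document.</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span 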
style="line-height:1.5;"> map </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> </span> <span style="line-height:1.5;color:#0000ff;">function</span> <span style="line-height:1.5;">() {<br /> ... </span> <span style="line-height:1.5;color:#0000ff;">for</span> <span style="line-height:1.5;"> (</span> <span style="line-height:1.5;color:#0000ff;">var</span> <span style="line-height:1.5;"> key </span> <span style="line-height:1.5;color:#0000ff;">in</span> <span style="line-height:1.5;"> </span> <span style="line-height:1.5;color:#0000ff;">this</span> <span style="line-height:1.5;">) {<br /> ... emit(key, {count : </span> <span style="line-height:1.5;">1</span> <span style="line-height:1.5;">});<br /> ... }};</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Now we have a pile of {count : 1} documents, each associated with a key from the collection. The {count : 1} documents that share a key are collected into an array and passed to the reduce function. reduce takes two arguments: the first is the key, which is the first argument that was passed to emit; the second is an array holding every {count : 1} document emitted for that key.</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> reduce </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> </span> <span style="line-height:1.5;color:#0000ff;">function</span> <span style="line-height:1.5;">(key, emits) {<br /> ... </span> <span style="line-height:1.5;color:#0000ff;">var</span> <span style="line-height:1.5;"> total </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> </span> <span style="line-height:1.5;">0</span> <span style="line-height:1.5;">;<br /> ... 
</span> <span style="line-height:1.5;color:#0000ff;">for</span> <span style="line-height:1.5;"> (</span> <span style="line-height:1.5;color:#0000ff;">var</span> <span style="line-height:1.5;"> i </span> <span style="line-height:1.5;color:#0000ff;">in</span> <span style="line-height:1.5;"> emits) {<br /> ... total </span> <span style="line-height:1.5;">+=</span> <span style="line-height:1.5;"> emits[i].count;<br /> ... }<br /> ... </span> <span style="line-height:1.5;color:#0000ff;">return</span> <span style="line-height:1.5;"> {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">count</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : total};<br /> ... }</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">reduce must be able to run repeatedly, on results from either the map phase or an earlier reduce phase. The document that reduce returns must therefore be usable as an element of the array passed as reduce's second argument. That is why reduce returns {"count" : total}, the same shape that map emits.</p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Running the MapReduce command gives the following result:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> mr </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> db.runCommand({</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">mapreduce</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">foo</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span 
style="line-height:1.5;">map</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : map, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">reduce</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : reduce})<br /> {<br /> </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">result</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">tmp.mr.mapreduce_1266787811_1</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">,<br /> </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">timeMillis</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">12</span> <span style="line-height:1.5;">,<br /> </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">counts</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {<br /> </span> <span style="line-height:1.5;"> "</span> <span style="line-height:1.5;">input</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">6</span> <span style="line-height:1.5;">,<br /> </span> <span style="line-height:1.5;"> "</span> <span style="line-height:1.5;">emit</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">14</span> <span style="line-height:1.5;">,<br /> </span> <span style="line-height:1.5;"> "</span> <span style="line-height:1.5;">output</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">5</span> <span style="line-height:1.5;"><br /> },<br /> </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">ok</span> <span style="line-height:1.5;">"</span> 
<span style="line-height:1.5;"> : </span> <span style="line-height:1.5;color:#0000ff;">true</span> <span style="line-height:1.5;"><br /> }</span> </div> </div> <ul style="line-height:20px;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;margin-left:45px;font-size:13px;"> <li>"result" : "tmp.mr.mapreduce_1266787811_1"<br /> The name of the collection that stores the MapReduce results. It is a temporary collection that is deleted when the connection is closed. We can give it a nicer name and keep the collection permanently, as discussed later.</li> <li>"timeMillis" : 12<br /> How long the operation took, in milliseconds.</li> <li>"counts" : { ... }<br /> "input" : 6 is the number of documents passed to the map function<br /> "emit" : 14 is the number of times emit was called in the map function<br /> "output" : 5 is the number of documents in the result collection</li> </ul> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Querying the result collection shows every key and how many times it appears:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db[mr.result].find()<br /> { </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">_id</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">_id</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">value</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : { </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">count</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">6</span> <span style="line-height:1.5;"> } }<br /> { 
</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">_id</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">a</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">value</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : { </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">count</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">4</span> <span style="line-height:1.5;"> } }<br /> { </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">_id</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">b</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">value</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : { </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">count</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">2</span> <span style="line-height:1.5;"> } }<br /> { </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">_id</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">x</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">value</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : { 
</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">count</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">1</span> <span style="line-height:1.5;"> } }<br /> { </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">_id</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">y</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">value</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : { </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">count</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">1</span> <span style="line-height:1.5;"> } }</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;"><br /> Example 2: categorizing web pages</p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Suppose we have a site where users submit links to other pages and can tag each link to show that it relates to a particular topic, such as "politics", "geek", or "icanhascheezburger". (ps: icanhascheezburger is a website of funny cat pictures with captions.) We can use MapReduce to find out which topics have been popular recently.</p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">First, we need a map function that emits, for each tag on a page, a value based on the page's popularity and recency.</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">map </span> <span 
style="line-height:1.5;">=</span> <span style="line-height:1.5;"> </span> <span style="line-height:1.5;color:#0000ff;">function</span> <span style="line-height:1.5;">() {<br /> </span> <span style="line-height:1.5;color:#0000ff;"> for</span> <span style="line-height:1.5;"> (</span> <span style="line-height:1.5;color:#0000ff;">var</span> <span style="line-height:1.5;"> i </span> <span style="line-height:1.5;color:#0000ff;">in</span> <span style="line-height:1.5;"> </span> <span style="line-height:1.5;color:#0000ff;">this</span> <span style="line-height:1.5;">.tags) {<br /> </span> <span style="line-height:1.5;color:#0000ff;"> var</span> <span style="line-height:1.5;"> recency </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> </span> <span style="line-height:1.5;">1</span> <span style="line-height:1.5;">/</span> <span style="line-height:1.5;">(new Date() - this.date);</span> <span style="line-height:1.5;"><br /> </span> <span style="line-height:1.5;color:#0000ff;"> var</span> <span style="line-height:1.5;"> score </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> recency </span> <span style="line-height:1.5;">*</span> <span style="line-height:1.5;"> </span> <span style="line-height:1.5;color:#0000ff;">this</span> <span style="line-height:1.5;">.score;</span> </div> <div> <span style="line-height:1.5;"><br /> emit(</span> <span style="line-height:1.5;color:#0000ff;">this</span> <span style="line-height:1.5;">.tags[i], {</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">urls</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : [</span> <span style="line-height:1.5;color:#0000ff;">this</span> <span style="line-height:1.5;">.url], </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">score</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : score});<br /> }<br /> };</span> </div> </div> <p 
style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Then we reduce all the values emitted for a tag into a single value:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">reduce </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> </span> <span style="line-height:1.5;color:#0000ff;">function</span> <span style="line-height:1.5;">(key, emits) {<br /> </span> <span style="line-height:1.5;color:#0000ff;"> var</span> <span style="line-height:1.5;"> total </span> <span style="line-height:1.5;">=</span> <span style="line-height:1.5;"> {urls : [], score : </span> <span style="line-height:1.5;">0</span> <span style="line-height:1.5;">};<br /> </span> <span style="line-height:1.5;color:#0000ff;"> for</span> <span style="line-height:1.5;"> (</span> <span style="line-height:1.5;color:#0000ff;">var</span> <span style="line-height:1.5;"> i </span> <span style="line-height:1.5;color:#0000ff;">in</span> <span style="line-height:1.5;"> emits) {<br /> emits[i].urls.forEach(</span> <span style="line-height:1.5;color:#0000ff;">function</span> <span style="line-height:1.5;">(url) {<br /> total.urls.push(url);<br /> });</span> </div> <div> <span style="line-height:1.5;"><br /> total.score </span> <span style="line-height:1.5;">+=</span> <span style="line-height:1.5;"> emits[i].score;<br /> }<br /> </span> <span style="line-height:1.5;color:#0000ff;"> return</span> <span style="line-height:1.5;"> total;<br /> };</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">The result set now contains, for each tag, a list of URLs and a total score indicating how popular the tag is.</p> <p 
style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;"><span style="line-height:18px;font-family:仿宋;font-size:12px;">ps:</span></p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;"><span style="line-height:18px;font-size:12px;">Comparing with a relational database makes the key point easier to see, and the key point is the first argument of emit. SQL groups with "GROUP BY column", where each distinct value of the column forms a group, whereas MapReduce uses emit to create, for each distinct value, a key and an array of values. In short, SQL's GROUP BY works with a column name, while emit works with the column's value.</span><span style="line-height:18px;font-size:12px;"> MapReduce is clearly the more flexible and powerful of the two.</span></p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;"><em><br /> MongoDB and MapReduce</em></p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Besides the three required keys mapreduce, map, and reduce, the MapReduce command accepts a number of optional keys.</p> <ul style="line-height:20px;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;margin-left:45px;font-size:13px;"> <li>"finalize" : function<br /> A finalizer function that receives the output of reduce.</li> <li>"keeptemp" : boolean<br /> Whether to keep the temporary result collection after the connection is closed.</li> <li>"out" : string<br /> The name of the output collection. Using this option implies that keeptemp is true.</li> <li>"query" : document<br /> A query that filters the documents before they are passed to the map function.</li> <li>"sort" : document<br /> Sorts the documents before they are sent to the map function; often used together with limit.</li> <li>"limit" : integer<br /> The maximum number of documents sent to the map function.</li> <li>"scope" : document<br /> Variables that can be used in the JavaScript code.</li> <li>"verbose" : boolean<br /> Whether to write more detailed output to the server log.</li> </ul> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;"><em>Using scope</em></p> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">To use a client-side value inside MapReduce, you must use the scope option. Pass scope a document of the form <strong>variable name : value</strong>, and the value becomes available in the map, reduce, and finalize functions. The value is read-only inside those functions.</p> <p style="line-height:20px;margin:5px 
auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">For example, the second example computed a page's recency as 1/(new Date() - this.date). To pass the current date in rather than calling new Date(), we can define a variable called now:</p> <div style="border-bottom:#cccccc 1px solid;border-left:#cccccc 1px solid;padding-bottom:5px;line-height:20px;overflow-x:auto;overflow-y:auto;background-color:#f5f5f5;padding-left:5px;padding-right:5px;font-family:'Courier New';word-break:break-all;border-top:#cccccc 1px solid;border-right:#cccccc 1px solid;padding-top:5px;" class="cnblogs_code"> <div> <span style="line-height:1.5;">></span> <span style="line-height:1.5;"> db.runCommand({</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">mapreduce</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">webpages</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">map</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : map, </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">reduce</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : reduce,<br /> </span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;">scope</span> <span style="line-height:1.5;">"</span> <span style="line-height:1.5;"> : {now : </span> <span style="line-height:1.5;color:#0000ff;">new</span> <span style="line-height:1.5;"> Date()}})</span> </div> </div> <p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">The map function can then use 1/(now - this.date).</p> </div>
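<p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">To try the scope idea without a server, here is a minimal plain-JavaScript sketch. The map and reduce functions follow example 2, with map reading now instead of calling new Date(). The helper simulateMapReduce, the sample documents, and the example.com URLs are all assumptions made up for illustration; the helper is not a MongoDB API and only imitates, in memory, how emit and the scope variables become visible inside map.</p>

```javascript
// map from example 2, rewritten to read `now` from scope.
// `now` and `emit` are free variables here; the helper below supplies them.
var map = function () {
  for (var i in this.tags) {
    var recency = 1 / (now - this.date);
    var score = recency * this.score;
    emit(this.tags[i], { urls: [this.url], score: score });
  }
};

// reduce from example 2: returns the same shape that map emits,
// so its output can be fed back into reduce.
var reduce = function (key, emits) {
  var total = { urls: [], score: 0 };
  for (var i in emits) {
    emits[i].urls.forEach(function (url) {
      total.urls.push(url);
    });
    total.score += emits[i].score;
  }
  return total;
};

// Hypothetical helper (NOT part of MongoDB): runs map and reduce in memory.
function simulateMapReduce(docs, map, reduce, scope) {
  var groups = {};
  var emit = function (key, value) {
    (groups[key] = groups[key] || []).push(value);
  };
  var names = Object.keys(scope);
  var values = names.map(function (n) { return scope[n]; });
  // Rebuild map from its source so that `emit` and every scope variable
  // resolve as parameters of an enclosing function.
  var factory = new Function("emit", ...names, "return (" + map.toString() + ");");
  var boundMap = factory(emit, ...values);
  docs.forEach(function (doc) { boundMap.call(doc); }); // `this` is the document
  var out = {};
  Object.keys(groups).forEach(function (key) {
    // In this sketch a key with a single emitted value skips reduce.
    out[key] = groups[key].length === 1 ? groups[key][0] : reduce(key, groups[key]);
  });
  return out;
}

// Made-up sample data; dates are millisecond timestamps chosen for easy arithmetic.
var docs = [
  { url: "http://example.com/a", tags: ["geek", "politics"], score: 10, date: new Date(1000) },
  { url: "http://example.com/b", tags: ["geek"], score: 20, date: new Date(2000) }
];
var result = simulateMapReduce(docs, map, reduce, { now: new Date(3000) });
console.log(JSON.stringify(result));
```

<p style="line-height:20px;margin:5px auto;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:13px;">Passing a fixed now through scope means every document is scored against the same instant, which also makes a run reproducible. On a real server the equivalent is the "scope" : {now : new Date()} option shown above.</p>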