周末的时间,我们在GitHub用什么语言编程?
jopen 8年前
<p style="text-align: center;"><a href="/misc/goto?guid=4958986207227852489" title="github"><img alt="周末的时间,我们在GitHub用什么语言编程?" src="https://simg.open-open.com/show/6a8dd9147ee3d646e9ccfef8f071d6d9.png" /></a></p> <p>村上春树(日本后现代主义作家)有一本书,《当我跑步时,我谈些什么》,Felipe 这篇文章的统计数据和这本书有点类似,周末的时间我们会使用什么语言进行编程,看了他的文章,我们会知道在办公室里使用的语言,不过是谋生工具而已,私底下大家并不喜欢它们。</p> <p>下面这张表格的内容是 2016 年度排名前 20 位的周末编程语言,根据 <a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwj5ofjF8fHSAhVFVWMKHSmUCMwQFggnMAA&url=https://cloud.google.com/bigquery/&usg=AFQjCNHgAoo0xZ0h_6rmYL5C4N1NKI-ahQ">GitHub</a>(全球领先的软件开发平台)、<a href="/misc/goto?guid=4959005038555701331">GHTorrent</a>(监控着 GitHub 上的每一次更改,并将每一次更改的更新内容,以 JSON 字符串形式存储在 MongoDB 数据库)、<a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwj5ofjF8fHSAhVFVWMKHSmUCMwQFggnMAA&url=https://cloud.google.com/bigquery/&usg=AFQjCNHgAoo0xZ0h_6rmYL5C4N1NKI-ahQ">BigQuery</a>(谷歌的数据仓库)等三个平台数据作为分析数据源。</p> <p style="text-align:center"><img alt="周末的时间,我们在GitHub用什么语言编程?" src="https://simg.open-open.com/show/3b08384f73d450232098ab679153646a.jpg" /></p> <p>从上面这张表格我们可以看到,排名前二十位的分别是:</p> <ol> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwi284Sa8vHSAhUW22MKHWtdDMMQFggmMAE&url=https://github.com/rust-lang/rust&usg=AFQjCNEvOLBGOlhzhk_8uthN_wRu_1foPA">Rust</a></li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwivuoCr8vHSAhUO1WMKHRxMAOgQFggnMAE&url=https://en.wikipedia.org/wiki/OpenGL_Shading_Language&usg=AFQjCNEKO3BTMSJeMXzXDv3d6cB4IUFjMQ">Glsl</a>(以C语言为基础的高阶着色语言,它是由 OpenGL ARB 建立)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjumJzw8vHSAhVDy2MKHX61Cp0QFggdMAA&url=https://dlang.org/&usg=AFQjCNGPVNBOLhv6wkWFnv34K4oYT_FF2Q">D</a>(类似于C语法的静态类型编程语言)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjztqyJ8_HSAhXDKWMKHWzGDIgQFggaMAA&url=https://www.haskell.org/&usg=AFQjCNHxFF7x0nAVz6pJhSKEkZAp1lYS1A">Haskell</a>(函数型编程语言)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwii-pe08_HSAhUOw2MKHbSrCwMQFggdMAA&url=https://common-lisp.net/&usg=AFQjCNFDIW-ykWvKar2Wa5U4ScoiQqecew">Common Lisp</a>(Lisp 语言的分支)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=8&cad=rja&uact=8&ved=0ahUKEwjxlfbV8_HSAhUJzmMKHdfGAEwQFggxMAc&url=https://en.wikipedia.org/wiki/KiCad&usg=AFQjCNE2OA3f-C25AlWPOFHcHm6xIJrkmw">Kicad</a>(开源的电路自动设计语言)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwiZ7bPz8_HSAhUL6mMKHbliD0YQFggdMAA&url=https://www.gnu.org/s/emacs/manual/html_node/elisp/&usg=AFQjCNHgwbYb4CTG0dzzgfMIL6dfeyswFQ">Emacs Lisp</a>(Lisp 语言的分支)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwi105Sf9PHSAhUBDGMKHYNLCW4QFggaMAA&url=https://www.lua.org/&usg=AFQjCNErx6dOqg5an4-PTCIVbBw_Vow0kA">Lua</a>(轻量级的可扩展脚本语言)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&cad=rja&uact=8&ved=0ahUKEwjmt8a89PHSAhUN8mMKHcdRBs8QFggiMAI&url=http://www.schemers.org/&usg=AFQjCNF-Ewv1QNzqeQpfzTEotBr2C4ui9Q">Scheme</a>(函数式编程语言,是 Lisp 的两种主要方言之一,另一种为 Common Lisp)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjXyZXP9PHSAhUS22MKHYChDTcQFggaMAA&url=http://julialang.org/&usg=AFQjCNF47iiW0ylaSeEf0dOl27c5kZ22MA">Julia</a>(针对数值计算的高性能动态编程语言)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwiQ77Lu9PHSAhUM6mMKHcPZADkQFggaMAA&url=http://elm-lang.org/&usg=AFQjCNEbyTAo2XqjYDmvePaR8MdvZ-p4vg">Elm</a>(编译为 JavaScript 的函数型语言)</li> <li>Eagle</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&cad=rja&uact=8&ved=0ahUKEwjw-fml9fHSAhUMzmMKHVp2ABkQFggoMAM&url=https://racket-lang.org/&usg=AFQjCNEOFk-19iNJNGjZzqTWT140R5ULiQ">Racket</a>(LISP 家族中的一员)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwjPnpS89fHSAhVT0WMKHcO2AGAQFggqMAE&url=https://www.dartlang.org/&usg=AFQjCNEUhMcqYeQ9t6cjfcx6KKdImFKrgw">Dart</a>(分布式编程语言,谷歌的很多应用程序都依赖于这个语言)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjwgpfY9fHSAhVP72MKHS7WBg0QFggaMAA&url=http://nsis.sourceforge.net/&usg=AFQjCNEyuekHtWw7SRndFuNlzufn-DOMEA">Nsis</a>(全称 Nullsoft Scriptable Install System)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwit1PPw9fHSAhUUUGMKHW3dCRQQFggaMAA&url=https://clojure.org/&usg=AFQjCNH970FbG65WKqxMZnynYvARENGeiA">Clojure</a>(Lisp 语言的一种,可以运行在 Java 虚拟机上)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwiEssmT9vHSAhUW8GMKHWIQD48QFggdMAA&url=https://kotlinlang.org/&usg=AFQjCNGXKT03tgxJWb_HYfMeXhS5u0243A">Kotlin</a>(针对 JVM、安卓和浏览器设计的一种静态类型语言)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwja7pur9vHSAhVK8GMKHQkjBIoQFggdMAA&url=http://elixir-lang.org/&usg=AFQjCNFuxjfBTKvqhu_2giTJDplT6k_8Jw">Elixir</a>(运行在 Erlang VM,低延时语言)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjEtYTX9vHSAhUW1GMKHeWjCmoQFggdMAA&url=http://www.ocaml.org/&usg=AFQjCNH2SPsXFQGEw3nnvz1lhEA9Jj9KxA">F#</a>(微软发展的为 .NET 语言提供运行环境的程序设计语言,属于函数型语言)</li> <li><a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjEtYTX9vHSAhUW1GMKHeWjCmoQFggdMAA&url=http://www.ocaml.org/&usg=AFQjCNH2SPsXFQGEw3nnvz1lhEA9Jj9KxA">Ocaml</a></li> </ol> <p>这个排名很清晰地告诉我们,2016 年属于 Functional Languages(函数型语言),以及 3D 世界的脚本语言。另外学习 Java 的朋友可以发现,Java 没有入选,但是很多函数型语言可以基于 JVM 运行,这给了我们一个提示,即未来 JVM 将会包容更多的语言,而不仅仅是 Java 语言本身。</p> <p>再来看看后 20 位语言,你会发现很多我们日常谋生的语言都在这里呢,包括:</p> <ul> <li>Nginx(高性能的 HTTP 和反向代理服务器,也是 IMAP/POP3/SMTP 服务器)</li> <li>Matlab(美国 MathWorks 公司出品的商业数学软件,用于算法开发、数据可视化、数据分析以及数值计算的高级技术计算语言和交互式环境)</li> <li>Processing(具有革命前瞻性的新兴计算机语言,是 Java 语言的延伸,并支持许多现有的 Java 语言架构,不过在语法(syntax)上简易许多,并具有许多贴心及人性化的设计)</li> <li>Vue(前端开发库,通过简洁的 API 提供高效的数据绑定和灵活的组件系统)</li> <li>Fortran(世界上最早出现的计算机高级程序设计语言,广泛应用于科学和工程计算领域)</li> <li>Visual Basic(这是我上大学时候最常用的语言,一种高级语言,具备可操作的 GUI 界面)</li> <li>Objective-C++(主要使用于 Mac OS X 和 GNUstep 这两个使用 OpenStep 标准的系统)</li> <li>Plsql(是一种程序语言,叫做过程化 SQL 语言(Procedural Language/SQL))</li> <li>Plpgsql</li> <li>Web Ontology Language(W3C 推荐的语义互联网中本体描述语言的标准)</li> <li>Smarty(使用 PHP 写出来的模板引擎,是目前业界最著名的 PHP 模板引擎之一)</li> <li>Groovy(基于 JVM(Java 虚拟机)的敏捷开发语言,它结合了 Python、Ruby 和 Smalltalk 的许多强大的特性,Groovy 代码能够与 Java 代码很好地结合,也能用于扩展现有代码)</li> <li>Batchfile</li> <li>Objective-C</li> <li>Powershell(命令行外壳程序和脚本环境,使命令行用户和脚本编写者可以利用 .Net Framework 的强大功能)</li> <li>Xslt(扩展样式表转换语言的外语缩写,这是一种对 XML(标准通用标记语言的子集)文档进行转化的语言,XSLT 中的T代表英语中的“转换”(Transformation))</li> <li>Cucumber(能够理解用普通语言描述的测试用例的支持行为驱动开发(BDD)的自动化测试工具,用 Ruby 编写,支持 Java 和·Net 等多种开发语言)</li> <li>Hcl</li> <li>Puppet(Linux、Unix、Windows 平台的集中配置管理系统,使用自有的 Puppet 描述语言,可管理配置文件、用户、cron 任务、软件包、系统服务等)</li> <li>Gcc Machine Description(GNU 编译器套件(GNU Compiler Collection)包括C、C++、Objective-C、Fortran、Java、Ada 和 Go 语言的前端,也包括了这些语言的库(如 libstdc++、libgcj 等等))。</li> </ul> <p style="text-align:center"><img alt="周末的时间,我们在GitHub用什么语言编程?" src="https://simg.open-open.com/show/8b2249b700fefe15c0d2be33117e8f34.jpg" /></p> <p>让我们看看这些 2016 年流行的周末“Happy Hour”语言的变化趋势,如图所示:</p> <p style="text-align:center"><img alt="周末的时间,我们在GitHub用什么语言编程?" src="https://simg.open-open.com/show/fb1f45a6e879071c720a748c0212a62e.jpg" /></p> <p>我们解释一下上面这张图所表达的含义:</p> <ul> <li>Rust 语言过去只在工作日使用,最近 6 年呈现爆发性增长,成为一门“周末”语言;</li> <li>越是流行,越是会成为一门“周末”语言;</li> <li>Puppet 语言周末大家用得很少,一直都很少;</li> <li>Ruby 正在逐渐和日常工作脱离,转而进入编程爱好者的周末乐园;</li> <li>R语言看得出来前两年(2014、2015)很火,但是 2016 年又回归到了中等热度;</li> <li>Haskell、Clojure,谁让人工智能这么火呢,这两门语言自然在周末也得多练习;</li> <li>Arduino 语言正在逐渐火起来;</li> <li>Python 和C,这几年还是有很多开发人员在使用,保持着持续流行状态。</li> </ul> <p>那么其他语言是如何按照流行程度排序的?请看下面这张图:</p> <p style="text-align:center"><img alt="周末的时间,我们在GitHub用什么语言编程?" src="https://simg.open-open.com/show/03b1ca6608f729f989e08f026828e927.jpg" /></p> <p>作者列举了一些来自 reddit 和 HN 的问题,并作出回复。</p> <p><a href="/misc/goto?guid=4959005040358688924">/u/techmidrop</a> 说:“我妈妈说了,我只能在星期天的下午 4 点到 6 点使用 Rust”。</p> <blockquote> <p>很多人和你一样!Rust 这几年来都位居周末语言榜首。但是回到 2010 年,它也是工作日语言之一。</p> </blockquote> <p><a href="/misc/goto?guid=4959005040446505718">/u/mooglinux 说</a>:“我现在需要 GitHub 出一份报告,报告中需要说明哪些语言分别在周末、工作日被提交的次数最多”。</p> <blockquote> <p>放心,会有的。</p> </blockquote> <p><a href="/misc/goto?guid=4959005040535926014">/u/MasterRaceLordGaben 说</a>:“谁在周末玩 Assembly?”</p> <blockquote> <p>也许是学生做家庭作业?Assembly 这门语言在这里其实不是很知名,也许是有人在 Stack<br /> Overflow 里搜索问题答案,其实并不是真的想写代码。</p> </blockquote> <p>作者使用 BigQuery 快速地分析了这些数据集,只用了 5 分钟时间。</p> <p>对 2016 年流行语言进行排名,代码如下:</p> <pre> <span style="color:#000000">#standardSQL </span><span style="color:#0000ff">SELECT</span><span style="color:#000000"> lang , </span><span style="color:#ff00ff">ROUND</span>(weekend<span style="color:#808080">/</span>weekday,<strong>2</strong><span style="color:#000000">) ratio , weekday, weekend , repos</span><span style="color:#ff0000">[</span><span style="color:#ff0000">OFFSET (0)</span><span style="color:#ff0000">]</span>.value sample_repo, repos<span style="color:#ff0000">[</span><span style="color:#ff0000">OFFSET (1)</span><span style="color:#ff0000">]</span><span style="color:#000000">.value sample_repo_2 </span><span style="color:#0000ff">FROM</span><span style="color:#000000"> ( </span><span style="color:#0000ff">SELECT</span> lang, <span style="color:#ff00ff">month</span><span style="color:#000000"> , </span><span style="color:#ff00ff">MAX</span>(<span style="color:#0000ff">IF</span>(weekday,c,<span style="color:#0000ff">null</span>)) weekday, <span style="color:#ff00ff">MAX</span>(<span style="color:#0000ff">IF</span>(<span style="color:#808080">NOT</span> weekday,c,<span style="color:#0000ff">null</span><span style="color:#000000">)) weekend , ANY_VALUE (repos) repos </span><span style="color:#0000ff">FROM</span><span style="color:#000000"> ( </span><span style="color:#0000ff">SELECT</span> language lang, TIMESTAMP_TRUNC (created_at, <span style="color:#ff00ff">YEAR</span>) <span style="color:#ff00ff">month</span><span style="color:#000000"> , EXTRACT (DAYOFWEEK </span><span style="color:#0000ff">FROM</span> a.created_at) <span style="color:#808080">BETWEEN</span> <strong>2</strong> <span style="color:#808080">AND</span> <strong>6</strong><span style="color:#000000"> weekday , </span><span style="color:#ff00ff">COUNT</span>(<span style="color:#0000ff">DISTINCT</span><span style="color:#000000"> committer_id) c , APPROX_TOP_COUNT (repo, </span><strong>3</strong><span style="color:#000000">) repos </span><span style="color:#0000ff">FROM</span> `ghtorrent<span style="color:#808080">-</span><span style="color:#000000">bq.ght_2017_01_19.commits` a </span><span style="color:#808080">JOIN</span> `fh<span style="color:#808080">-</span><span style="color:#000000">bigquery.github_extracts.ght_project_languages` b </span><span style="color:#0000ff">ON</span> a.project_id<span style="color:#808080">=</span><span style="color:#000000">b.project_id </span><span style="color:#0000ff">WHERE</span> b.<span style="color:#0000ff">percent</span><span style="color:#808080">></span><strong>0.25</strong> <span style="color:#808080">AND</span> EXTRACT (<span style="color:#ff00ff">YEAR</span> <span style="color:#0000ff">FROM</span> a.created_at) <span style="color:#808080">BETWEEN</span> <strong>2016</strong> <span style="color:#808080">AND</span> <strong>2016</strong> <span style="color:#0000ff">GROUP</span> <span style="color:#0000ff">BY</span> <strong>1</strong>,<strong>2</strong>,<strong>3</strong> <span style="color:#0000ff">HAVING</span> c<span style="color:#808080">></span><strong>100</strong><span style="color:#000000"> ) </span><span style="color:#0000ff">GROUP</span> <span style="color:#0000ff">BY</span> <strong>1</strong>,<strong>2</strong><span style="color:#000000"> ) </span><span style="color:#0000ff">WHERE</span> (weekend<span style="color:#808080">+</span>weekday)<span style="color:#808080">></span><strong>1450</strong> <span style="color:#0000ff">ORDER</span> <span style="color:#0000ff">BY</span> ratio <span style="color:#0000ff">DESC</span></pre> <p>对 2010-2016 年的数据画图,代码如下:</p> <pre> <span style="color:#000000">#standardSQL </span><span style="color:#0000ff">SELECT</span> <span style="color:#808080">*</span> <span style="color:#0000ff">FROM</span><span style="color:#000000"> ( </span><span style="color:#0000ff">SELECT</span> <span style="color:#808080">*</span>, <strong>40</strong><span style="color:#808080">-</span>rn inv_rank, ROW_NUMBER () <span style="color:#0000ff">OVER</span>(PARTITION <span style="color:#0000ff">BY</span> <span style="color:#ff00ff">month</span> <span style="color:#0000ff">ORDER</span> <span style="color:#0000ff">BY</span><span style="color:#000000"> ratio) weekend_rank, </span><span style="color:#ff00ff">MAX</span>(<span style="color:#ff00ff">month</span>) <span style="color:#0000ff">OVER</span>(PARTITION <span style="color:#0000ff">BY</span><span style="color:#000000"> lang) max_month </span><span style="color:#0000ff">FROM</span><span style="color:#000000"> ( </span><span style="color:#0000ff">SELECT</span> lang, <span style="color:#ff00ff">month</span><span style="color:#000000"> , </span><span style="color:#ff00ff">ROUND</span>(weekend<span style="color:#808080">/</span>weekday,<strong>2</strong><span style="color:#000000">) ratio , weekday, weekend, weekday</span><span style="color:#808080">+</span><span style="color:#000000">weekend total , repos</span><span style="color:#ff0000">[</span><span style="color:#ff0000">OFFSET (0)</span><span style="color:#ff0000">]</span><span style="color:#000000">.value sample_repo , ROW_NUMBER () </span><span style="color:#0000ff">OVER</span>(PARTITION <span style="color:#0000ff">BY</span> <span style="color:#ff00ff">month</span> <span style="color:#0000ff">ORDER</span> <span style="color:#0000ff">BY</span> weekday<span style="color:#808080">+</span>weekend <span style="color:#0000ff">DESC</span><span style="color:#000000">) rn </span><span style="color:#0000ff">FROM</span><span style="color:#000000"> ( </span><span style="color:#0000ff">SELECT</span> lang, <span style="color:#ff00ff">month</span><span style="color:#000000"> , </span><span style="color:#ff00ff">MAX</span>(<span style="color:#0000ff">IF</span>(weekday,c,<span style="color:#0000ff">null</span>)) weekday, <span style="color:#ff00ff">MAX</span>(<span style="color:#0000ff">IF</span>(<span style="color:#808080">NOT</span> weekday,c,<span style="color:#0000ff">null</span><span style="color:#000000">)) weekend , ANY_VALUE (repos) repos </span><span style="color:#0000ff">FROM</span><span style="color:#000000"> ( </span><span style="color:#0000ff">SELECT</span> language lang, TIMESTAMP_TRUNC (created_at, <span style="color:#ff00ff">YEAR</span>) <span style="color:#ff00ff">month</span><span style="color:#000000"> , EXTRACT (DAYOFWEEK </span><span style="color:#0000ff">FROM</span> a.created_at) <span style="color:#808080">BETWEEN</span> <strong>2</strong> <span style="color:#808080">AND</span> <strong>6</strong><span style="color:#000000"> weekday , </span><span style="color:#ff00ff">COUNT</span>(<span style="color:#0000ff">DISTINCT</span><span style="color:#000000"> committer_id) c , APPROX_TOP_COUNT (repo, </span><strong>3</strong><span style="color:#000000">) repos </span><span style="color:#0000ff">FROM</span> `ghtorrent<span style="color:#808080">-</span><span style="color:#000000">bq.ght_2017_01_19.commits` a </span><span style="color:#808080">JOIN</span> `fh<span style="color:#808080">-</span><span style="color:#000000">bigquery.github_extracts.ght_project_languages` b </span><span style="color:#0000ff">ON</span> a.project_id<span style="color:#808080">=</span><span style="color:#000000">b.project_id </span><span style="color:#0000ff">WHERE</span> b.<span style="color:#0000ff">percent</span><span style="color:#808080">></span><strong>0.25</strong> <span style="color:#808080">AND</span> language <span style="color:#808080">IN</span> UNNEST (SPLIT (<span style="color:#ff0000">'</span><span style="color:#ff0000">rust,haskell,c,clojure,arduino,ruby,python,go,r,puppet,xml</span><span style="color:#ff0000">'</span><span style="color:#000000">)) </span><span style="color:#808080">AND</span> EXTRACT (<span style="color:#ff00ff">YEAR</span> <span style="color:#0000ff">FROM</span> a.created_at) <span style="color:#808080">BETWEEN</span> <strong>2010</strong> <span style="color:#808080">AND</span> <strong>2016</strong> <span style="color:#0000ff">GROUP</span> <span style="color:#0000ff">BY</span> <strong>1</strong>,<strong>2</strong>,<strong>3</strong><span style="color:#000000"> ) </span><span style="color:#0000ff">GROUP</span> <span style="color:#0000ff">BY</span> <strong>1</strong>,<strong>2</strong><span style="color:#000000"> ) ) </span><span style="color:#0000ff">WHERE</span> rn<span style="color:#808080"><=</span><strong>40</strong><span style="color:#000000"> ) </span><span style="color:#0000ff">ORDER</span> <span style="color:#0000ff">BY</span> <strong>2</strong>,<strong>3</strong> <span style="color:#0000ff">DESC</span></pre> <p>外部查询语句:</p> <pre> <span style="color:#0000ff">SELECT</span><span style="color:#000000"> lang , EXTRACT (DAYOFWEEK </span><span style="color:#0000ff">FROM</span> date) <span style="color:#808080">BETWEEN</span> <strong>2</strong> <span style="color:#808080">AND</span> <strong>6</strong><span style="color:#000000"> weekday , </span><span style="color:#ff00ff">COUNT</span>(<span style="color:#0000ff">DISTINCT</span><span style="color:#000000"> email) c , APPROX_TOP_COUNT (repo, </span><strong>3</strong><span style="color:#000000">) repos </span><span style="color:#0000ff">FROM</span><span style="color:#000000"> ( </span><span style="color:#0000ff">SELECT</span> author.email, <span style="color:#ff00ff">LOWER</span>(REGEXP_EXTRACT (diff.new_path, r<span style="color:#ff0000">'</span><span style="color:#ff0000">\.([^\./\(~_ \- #]*)$</span><span style="color:#ff0000">'</span>)) lang, author.date, repo_name<span style="color:#ff0000">[</span><span style="color:#ff0000">OFFSET (0)</span><span style="color:#ff0000">]</span><span style="color:#000000"> repo </span><span style="color:#0000ff">FROM</span> `bigquery<span style="color:#808080">-</span><span style="color:#0000ff">public</span><span style="color:#808080">-</span>data.github_repos.commits`, UNNEST (<span style="color:#ff00ff">difference</span><span style="color:#000000">) diff </span><span style="color:#0000ff">WHERE</span> EXTRACT (<span style="color:#ff00ff">YEAR</span> <span style="color:#0000ff">FROM</span> author.date)<span style="color:#808080">=</span><strong>2016</strong><span style="color:#000000"> ) </span><span style="color:#0000ff">WHERE</span> lang <span style="color:#0000ff">IS</span> <span style="color:#808080">NOT</span> <span style="color:#0000ff">null</span> <span style="color:#808080">AND</span> LENGTH (lang)<span style="color:#808080"><</span><strong>8</strong> <span style="color:#808080">AND</span> REGEXP_CONTAINS (lang, <span style="color:#ff0000">'</span><span style="color:#ff0000">[a-zA-Z]</span><span style="color:#ff0000">'</span><span style="color:#000000">) </span><span style="color:#0000ff">GROUP</span> <span style="color:#0000ff">BY</span> <strong>1</strong>,<strong>2</strong> <span style="color:#0000ff">HAVING</span> c<span style="color:#808080">></span><strong>100</strong> <span style="color:#0000ff">ORDER</span> <span style="color:#0000ff">BY</span> c <span style="color:#0000ff">DESC</span></pre> <p>如何衡量数据:</p> <pre> <span style="color:#0000ff">SELECT</span><span style="color:#000000"> lang , </span><span style="color:#ff00ff">MAX</span>(<span style="color:#0000ff">IF</span>(weekday,c,<span style="color:#0000ff">null</span>)) weekday, <span style="color:#ff00ff">MAX</span>(<span style="color:#0000ff">IF</span>(<span style="color:#808080">NOT</span> weekday,c,<span style="color:#0000ff">null</span><span style="color:#000000">)) weekend , ANY_VALUE (repos) repos </span><span style="color:#0000ff">FROM</span><span style="color:#000000"> ( </span><span style="color:#0000ff">SELECT</span> JSON_EXTRACT_SCALAR (payload, <span style="color:#ff0000">'</span><span style="color:#ff0000">$.pull_request.head.repo.language</span><span style="color:#ff0000">'</span><span style="color:#000000">) lang , EXTRACT (DAYOFWEEK </span><span style="color:#0000ff">FROM</span> created_at) <span style="color:#808080">BETWEEN</span> <strong>2</strong> <span style="color:#808080">AND</span> <strong>6</strong><span style="color:#000000"> weekday , APPROX_TOP_COUNT (repo.name, </span><strong>3</strong><span style="color:#000000">) repos , </span><span style="color:#ff00ff">COUNT</span>(<span style="color:#0000ff">DISTINCT</span><span style="color:#000000"> actor.id) c </span><span style="color:#0000ff">FROM</span> `githubarchive.<span style="color:#ff00ff">year</span>.<strong>2016</strong><span style="color:#000000">` </span><span style="color:#0000ff">WHERE</span> type<span style="color:#808080">=</span><span style="color:#ff0000">'</span><span style="color:#ff0000">PullRequestEvent</span><span style="color:#ff0000">'</span> <span style="color:#0000ff">GROUP</span> <span style="color:#0000ff">BY</span> <strong>1</strong>,<strong>2</strong><span style="color:#000000"> ) </span><span style="color:#0000ff">GROUP</span> <span style="color:#0000ff">BY</span> <strong>1</strong></pre> <p>备注:这个帖子中,作者主要使用了最新的 <a href="/misc/goto?guid=4959005040630075361">GHTorrent import on BigQuery</a>,也许这些资料对读者也有用,<a href="/misc/goto?guid=4959005040727910407">GitHub Archive</a>、<a href="/misc/goto?guid=4959005040820966489">GitHub repos on BigQuery</a>、<a href="/misc/goto?guid=4959005040820966489">Stack Overflow on BigQuery</a>。</p> <p>原文地址:<a href="/misc/goto?guid=4959005040925947037">The top weekend programming languages — based on GitHub’s activity</a></p> <p><a href="/misc/goto?guid=4959005040820966489">40</a>: <a href="/misc/goto?guid=4959005041030013558">https://medium.com/google-cloud/github-on-bigquery-analyze-all-the-code-b3576fd2b150#.4wpjnqty9</a><a href="/misc/goto?guid=4959005040630075361">http://ghtorrent.org/gcloud.html</a></p> <p>来自: <a href="/misc/goto?guid=4959005041125892518" id="link_source2">InfoQ</a></p>