使用 Postgres 递归公共表表达式解决旅行销售员问题

jopen 9年前

原文 http://segmentfault.com/a/1190000003739666

原文: Solving the Traveling Salesman Problem with Postgres Recursive CTEs

Many SQL implementations don't have loops, making some kinds of analysis very difficult. Postgres, SQL Server, and several others have the next best thing — recursive CTEs!

许多 SQL 实现没有将循环包括在内,使进行一些分析工作变得很困难。Postgres、SQL Server 以及一些其它的 SQL 实现则提供了下面将要提到的优良特性:递归公共表表达式(recursive CTE)。

We'll use them to solve the Traveling Salesman Problem and find the shortest round-trip route through several US cities.

我们将使用它们来解决旅行销售员问题( Traveling Salesman Problem )并且找出经过若干美国城市的最短巡回路径。

使用 Postgres 递归公共表表达式解决旅行销售员问题

使用递归

Normal CTEs are great at helping to organize large queries. They are a simple way to make temporary tables you can access later in your query.

一般的公共表表达式(CTE)主要用于帮助组织大型查询语句。你可以简便地创建临时表,并在稍后的查询语句中访问它们。

Recursive CTEs are more powerful - they reference themselves and allow you to explore hierarchical data. While that may sound complicated, the underlying concept is very similar to aforloop in other programming languages.

递归公共表表达式( Recursive CTEs )的能力更强——它们通过自引用,使你能够探索层次化的数据。虽然这听起来很复杂,但其背后的概念和其它编程语言中的for循环十分相似。

These CTEs have two parts — ananchormember and arecursivemember. Theanchormember selects the starting rows for the recursive steps.

该公共表表达式(CTE)包含两个部分——一个 anchor 部分和一个 recursive 部分。 anchor 部分提供了递归步骤的起始数据。

Therecursivemember generates more rows for the CTE by first joining against theanchorrows, and then joining against rows created in previous recursions. Therecursivemember comes after aunion allin the CTE definition.

recursive部分为此公共表表达式(CTE)生成更多的数据,方法是先连接(join) anchor 数据,然后连接(join)上次递归产生的数据。在公共表表达式(CTE)的定义语句中, recursive 部分接在union all关键字后面。

Here's a simple recursive CTE that generates the numbers 1 to 10. Theanchormember selects the value 1, and therecursivemember adds to it up to the number 10:

这是一个能产生数字1到10的简单的递归公共表表达式(CTE)。其中 anchor 部分提供了数据值1,然后 recursive 部分将其逐步累加至10。

with recursive incrementer(prev_val) as (    select 1 -- anchor member    union all    select -- recursive member      incrementer.prev_val + 1    from incrementer    where prev_val < 10 -- termination condition  )    select * from incrementer

The first time the recursive CTE runs it generates a single row1using theanchormember. In the second execution, therecursivemember joins against the1and outputs a second row,2. In the third execution therecursivestep joins against both rows1and2and adds the rows2(a duplicate) and3.

该递归公共表表达式(recursive CTE)在第一次执行的时候根据 anchor 部分生成了一行数据“1”。在第二次执行中, recursive 部分连接到数据“1”并输出了第二行“2”。在第三次执行中, recursive 部分同时连接到了数据“1”和“2”并且添加了新数据“2”(重复)和“3”。

Recursive CTEs also only return distinct rows. Even though our CTE above creates many rows with the same value, only a distinct set of rows will be returned.

同时递归公共表表达式(recursive CTE)只返回互不相同的数据。虽然我们的公共表表达式(CTE)创建了许多相同的数据行,但返回的数据集只包含互不相同的数据。

Notice how the CTE specifies its output as the named valueprev_val. This lets us refer to the output of the previous recursive step.

注意该公共表表达式(CTE)是如何将其输出命名为prev_val的。这使得我们可以引用上一次递归的输出结果。

And at the very end there is a termination condition to halt the recursion once the sum gets to 10. Without this condition, the CTE would enter an infinite loop!

并且在最后的最后有一个终止条件:一旦 sum 到达10就停止递归。如果没有这个条件,该公共表表达式(CTE)将会进入一个无限循环!

Under the hood, the database is building up a table named after this recursive CTE using unions:

这样,数据库以该递归公共表表达式(recursive CTE)为名字,基于并集建立了一个数据表:

使用 Postgres 递归公共表表达式解决旅行销售员问题

Recursive CTEs can also have many parameters. Here's one that takes the sum, double, and square of starting values of 1, 2 and 3:

递归公共表表达式(recursive CTE)还可以包含多个参数。下面的例子以1、2和3为初始值,分别计算了依次加1的和、倍增值和依次平方的值。

with recursive cruncher(inc, double, square) as (    select 1, 2.0, 3.0 -- anchor member    union all    select -- recursive member      cruncher.inc + 1,      cruncher.double * 2,      cruncher.square ^ 2    from cruncher    where inc < 10  )    select * from cruncher

With recursive CTEs we can solve the Traveling Salesman Problem.

通过使用递归公共表表达式(recursive CTE),我们能够解决旅行销售员问题。

找出最短路径

There are many algorithms for finding the shortest round-trip path through several cities. We'll use the simplest: brute force. Our recursive CTE will enumerate all possible routes and their total distances. We'll then sort to find the shortest.

有许多算法可以用于找出经过若干城市的最短巡回路径。我们将使用最简单的一种:暴力搜索。我们的递归公共表表达式(recursive CTE)将枚举所有可能的路径和它们的总距离,然后排序以找出最短的一条。

First, a list of cities with Periscope customers, along with their latitudes and longitudes:

首先是一个顾客所在城市的列表,包含它们的纬度和经度。

create table places as (    select      'Seattle' as name, 47.6097 as lat, 122.3331 as lon      union all select 'San Francisco', 37.7833, 122.4167      union all select 'Austin', 30.2500, 97.7500      union all select 'New York', 40.7127, 74.0059      union all select 'Boston', 42.3601, 71.0589      union all select 'Chicago', 41.8369, 87.6847      union all select 'Los Angeles', 34.0500, 118.2500      union all select 'Denver', 39.7392, 104.9903  )

And we'll need a distance function to compute how far two lat/lons are from each other (thanks to strkol on stackoverflow.com ):

然后我们需要一个距离函数来计算两个经纬度之间的距离( 感谢 strkol 在 stackoverflow.com 上的回答 ):

create or replace function lat_lon_distance(    lat1 float, lon1 float, lat2 float, lon2 float  ) returns float as $$  declare    x float = 69.1 * (lat2 - lat1);    y float = 69.1 * (lon2 - lon1) * cos(lat1 / 57.3);  begin    return sqrt(x * x + y * y);  end  $$ language plpgsql

Our CTE will use San Francisco as its anchor city, and then recurse from there to every other city:

我们的公共表表达式(CTE)将使用 San Francisco 作为出发城市,然后从那开始递归抵达其它城市。

with recursive travel(places_chain, last_lat, last_lon,      total_distance, num_places) as (    select -- anchor member      name, lat, lon, 0::float, 1      from places      where name = 'San Francisco'    union all    select -- recursive member      -- add to the current places_chain      travel.places_chain || ' -> ' || places.name,      places.lat,      places.lon,      -- add to the current total_distance      travel.total_distance +         lat_lon_distance(last_lat, last_lon, places.lat, places.lon),      travel.num_places + 1    from      places, travel    where      position(places.name in travel.places_chain) = 0  )

The parameters in the CTE are:

  • places_chain: The list of places visited so far, which will be different for each instance of the recursion

  • last_lat and last_lon: The latitude and longitude of the last place in theplaces_chain

  • total_distance: The distance traveled going from one place to the next in theplaces_chain

  • num_places: The number of places inplaces_chain— we'll use this to tell which routes are complete because they visited all cities

该公共表表达式(CTE)中的参数有:

  • places_chain:截至目前访问过的位置的列表,在每条递归路径中都是不同的

  • last_lat and last_lon:places_chain中最后一个位置的纬度和经度。

  • total_distance:places_chain中相邻位置的距离的总和

  • num_places:places_chain中位置的数目——我们使用该参数来分辨哪条路径已经完成,由其访问过了所有城市

In therecursivemember, thewhereclause ensures that we never repeat a place. If we've already visited Denver,position(...)will return a number greater than 0, invalidating this instance of the recursion.

在recursive部分中,where子句确保了我们不会重复访问一个地方。(比如说)如果我们已经访问过 Denver,position(...)将返回一个大于0的数字,使得该递归路径无效化。

We can see all possible routes by selecting all 8-city chains:

通过列出所有包含了8个城市的城市链,我们可以看到所有可能的路径:

select * from travel where num_places = 8

We need to add in the distance from the last city back to San Francisco to complete the round-trip. We could hard code San Francisco's lat/lon, but a join is more elegant. Once that's done we sort by distance and show the smallest:

我们需要加上从最后一个城市回到 San Francisco 的距离以完成回路。我们可以在代码中显式写入 San Francisco 的经纬度,但使用连接操作看起来更加优雅。一完成这个我们就可以根据距离进行排序并输出最短路径:

select    travel.places_chain || ' -> ' || places.name,    total_distance + lat_lon_distance(        travel.last_lat, travel.last_lon,        places.lat, places.lon) as final_dist  from travel, places  where    travel.num_places = 8    and places.name = 'San Francisco'  order by 2 -- ascending!  limit 1

Even though this query is significantly more complicated than theincrementerquery earlier, the database is doing the same things behind the scenes. The top branch is the creating the CTE's rows, the bottom branch is the final join and sort:

虽然该查询语句明显复杂于之前的incrementer查询,数据库在幕后做的事情依然一样。最顶上的分支是创建该公共表表达式(CTE)的数据,最底部的分支是最终的连接和排序。

使用 Postgres 递归公共表表达式解决旅行销售员问题

Run this query and you'll see the shortest route takes 6671 miles and visits the cities in this order:

执行该查询语句,你会看到最短路径需要 6671 英里并且按顺序经过了下列城市:

San Francisco -> Seattle -> Denver ->

Chicago -> Boston -> New York -> Austin ->

Los Angeles -> San Francisco

</div>

Thanks to recursive CTEs, we can solve the Traveling Salesman Problem in SQL!

得益于递归公共表表达式(recursive CTE),我们成功地用 SQL 解决了旅行销售员问题!