SimpleCD: a knockoff of VeryCD
fmms
13 years ago
<h3>What is SimpleCD?</h3>
<ul>
<li>A complete toolkit for building a knockoff of VeryCD, including the <strong>scraping scripts</strong>, the <strong>website code</strong>, and so on.</li>
</ul>
<h3>Who needs SimpleCD?</h3>
<ul>
<li>People who want to preserve VeryCD's link resources: stop mirroring VeryCD and use this instead.</li>
<li>People who want to study crawler scripts and Python syntax: the code is honestly pretty bad and only barely usable.</li>
<li>Hobbyists who want to study building a site with web.py + sqlite3: calling it study material flatters me, since a week ago I knew neither web.py nor SQL databases.</li>
<li>People who want to test the performance of their virtual host: yes, really. It is a 1 GB database after all, and any host that can run it, and run it smoothly, is no ordinary host.</li>
</ul>
<h3>What does SimpleCD look like?</h3>
<ul>
<li>An example of a finished SimpleCD deployment: http://www.simplecd.org</li>
</ul>
<h3>Why web.py?</h3>
<ul>
<li>The scraper is written in Python, so a Python framework integrates better.</li>
<li>I compared django and web.py and prefer web.py's style of "writing a website in Python" over django's style of "writing a website in django".</li>
</ul>
<h3>Why sqlite as the database?</h3>
<ul>
<li>At first, because it ships with Python and is simple to use.</li>
<li>Now, because in practice it performs 10 times better than mysql: http://obmem.com/?p=317</li>
<li>The weakness of sqlite is that high concurrency can lock up the database. I have found a solution, but I still need to find the time to work out how to code it.</li>
</ul>
<h3>Other</h3>
<ul>
<li>My personal site has the implementation details of the source code; drop by: http://obmem.com</li>
<li>VeryCD-related blog posts: http://obmem.com/?tag=verycd</li>
<li><strong>For a more hands-on setup guide, see the video: http://www.simplecd.org/static/tutorial.html</strong></li>
</ul>
<h1>How to use SimpleCD</h1>
<h2>1. Requirements</h2>
<p>Anywhere web.py can be deployed, for example:</p>
<ul>
<li>A VPS (Virtual Private Server) (see "Xen and OpenVZ tests (with VPS recommendations)")</li>
<li>An overseas shared host that supports web.py (for example, see the guide to deploying web.py on dreamhost)</li>
<li>A shared host in China that supports web.py (for example, see the guide to deploying web.py on stdyun.com)</li>
</ul>
<p>Recommended configuration:</p>
<ul>
<li>Xen VPS: a Linux VPS with at least 768MB of memory</li>
<li>OpenVZ VPS: a Linux VPS with at least 512MB of burstable memory; the guaranteed memory can be somewhat smaller without problems.</li>
</ul>
<p>If you have too little memory:</p>
<ul>
<li>In nginx/spawn-fcgi.sh, change "-F 2" to "-F 1" so that only one daemon process is used.</li>
<li>Write a new framework with a lower resource footprint to access sqlite3. Accessing sqlite3 directly does not take much memory.</li>
<li><strong>Do not</strong> try to replace sqlite with mysql; mysql is less efficient.</li>
</ul>
<p>This tutorial is based on Ubuntu 9.04. Anyone who plays with a VPS is no beginner, so I trust you can work out the setup on other operating systems yourself.</p>
<h2>2. Change the software sources</h2>
<p>We want newer software, so edit /etc/apt/sources.list directly and change jaunty to karmic to use the 9.10 software sources :)</p>
<p>Then update the package lists:</p>
<pre class="prettyprint">apt-get update</pre>
<p>Next, install nginx, spawn-fcgi, and mercurial:</p>
<pre class="prettyprint">apt-get install nginx
apt-get install spawn-fcgi
apt-get install mercurial</pre>
<p>After that, install easy_install, then use it to install web.py and flup:</p>
<pre class="prettyprint">apt-get install python-setuptools
easy_install web.py
easy_install flup</pre>
<h2>3. Quick setup guide</h2>
<p>Download the source code:</p>
<pre class="prettyprint">cd /var/www
hg clone https://simplecd.googlecode.com/hg simplecd
cd simplecd
hg update deployment</pre>
<p>Next, do some basic configuration:</p>
<pre class="prettyprint"># create the database
./fetchvc.py createdb
# nginx configuration files (adjust them as shown in the video)
cp nginx/nginx.conf /etc/nginx/
cp nginx/simplecd /etc/nginx/sites-available/
ln -s /etc/nginx/sites-available/simplecd /etc/nginx/sites-enabled/simplecd
# start the fcgi processes with spawn-fcgi
nginx/spawn-fcgi.sh
# start the nginx service
/etc/init.d/nginx start</pre>
<p>That's it. Visit the address of your VPS and the site should be up and running.</p>
<h2>4. Using simplecd</h2>
<h3>Ways to update the database:</h3>
<p>The database created in the previous step is still empty, so it has to be filled with data. It can be updated as follows:</p>
<pre class="prettyprint">./fetchvc.py feed          # update the database from the feed
./fetchvc.py update        # update from the first 20 pages of the home page
./fetchvc.py fetch q=海猫   # search verycd for everything about 海猫 and add it to the database
./fetchvc.py fetch TopicID # update a single topic directly by its topic ID
./fetchvc.py fetchall      # update the entire database; better not to try this
./fetchvc.py fetch 1000-1001 # update from pages 1000 to 1001 of the verycd archives</pre>
<h3>Download the full database (as of 2009.12.18)</h3>
<p>eMule link:</p>
<p>ed2k://%7Cfile%7Cverycd.sqlite3.db.lzma%7C233121378%7C0fd38cff1353e996576f9f3e9b8c65dd%7C</p>
<p>Decompress: lzma -d verycd.sqlite3.db.lzma</p>
<p>Then put the file in the simplecd directory.</p>
<h3>Set up automatic updates</h3>
<p>Want simplecd to stay in sync with VeryCD automatically?</p>
<p>Try scdd.py from the default branch:</p>
<pre class="prettyprint">hg update default
python scdd.py start</pre>
<p>Check back every 15 minutes or so; if it worked, automatic updates should already be coming in.</p>
<h3>Why does the simplecd.org home page differ from deployment?</h3>
<p>simplecd.org has some special settings, so I do not sync it directly with this source code. Instead I sync to another directory, make some adjustments, and then copy the result to the target directory.</p>
<p>To try the new interface and new features, you can try the dev branch:</p>
<pre class="prettyprint">hg update dev</pre>
<p><strong>Note</strong>: the latest dev branch uses a mysql database. For converting from sqlite to mysql, see the comments in conf.py.</p>
<p><strong>Note 2: mysql performance may be terrible. If you have more than 2G of memory, consider changing my.cnf to the default configuration file it provides for huge sites.</strong></p>
<br />
<p><strong>Project home page:</strong><a href="http://www.open-open.com/lib/view/home/1327997016389" target="_blank">http://www.open-open.com/lib/view/home/1327997016389</a></p>
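<p>On the sqlite locking concern raised in the "Why sqlite" section above: the post does not say which solution the author had in mind, so what follows is only a minimal sketch of one common approach, not SimpleCD's actual code. It assumes Python 3 and the standard sqlite3 module; the helper name run_with_retry, its parameters, and the retry policy are all hypothetical. It combines sqlite's busy timeout with a simple retry loop for writes that still fail with "database is locked".</p>

```python
import sqlite3
import time


def run_with_retry(db_path, sql, params=(), retries=5, busy_timeout=5.0):
    """Run one write statement, retrying while the database is locked.

    Hypothetical helper for illustration only; not part of SimpleCD.
    Returns the number of attempts that were needed.
    """
    for attempt in range(retries):
        # timeout= makes sqlite wait up to busy_timeout seconds for a lock
        # before raising "database is locked".
        conn = sqlite3.connect(db_path, timeout=busy_timeout)
        try:
            with conn:  # commits on success, rolls back on an exception
                conn.execute(sql, params)
            return attempt + 1
        except sqlite3.OperationalError as exc:
            # Re-raise anything that is not a lock error, or the last failure.
            if "locked" not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(0.1 * (attempt + 1))  # back off a little longer each time
        finally:
            conn.close()
```

<p>A caller would use it as, for example, run_with_retry("verycd.sqlite3.db", "UPDATE ...", params), where the statement and parameters are whatever write the site needs. Opening a short-lived connection per write keeps locks brief, at the cost of some connection overhead.</p>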