Mechanize - A Stateful, Programmable Web Browser in Python
admin, 13 years ago
Mechanize provides a stateful, programmable web browser. It supports HTML form filling, custom headers, HTTP authentication, SSL, and more. <br /> <br /> Project home: <a href="/misc/goto?guid=4958185840491855782">http://wwwsearch.sourceforge.net/mechanize/</a> <br /> <br /> <pre class="brush:python; toolbar: true; auto-links: false;">import re
import mechanize

br = mechanize.Browser()
br.open("http://www.example.com/")
# follow second link with element text matching regular expression
response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)
assert br.viewing_html()
print(br.title())
print(response1.geturl())
print(response1.info())  # headers
print(response1.read())  # body

br.select_form(name="order")
# Browser passes through unknown attributes (including methods)
# to the selected HTMLForm.
br["cheeses"] = ["mozzarella", "caerphilly"]  # (the method here is __setitem__)
# Submit current form.  Browser calls .close() on the current response on
# navigation, so this closes response1
response2 = br.submit()

# print currently selected form (don't call .submit() on this, use br.submit())
print(br.form)

response3 = br.back()  # back to cheese shop (same data as response1)
# the history mechanism returns cached response objects
# we can still use the response, even though it was .close()d
response3.get_data()  # like .seek(0) followed by .read()
response4 = br.reload()  # fetches from server

for form in br.forms():
    print(form)
# .links() optionally accepts the keyword args of .follow_/.find_link()
for link in br.links(url_regex="python.org"):
    print(link)
    br.follow_link(link)  # takes EITHER Link instance OR keyword args
    br.back()</pre>
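The comments in the example above note that the browser closes the current response on navigation, and that back() returns a cached response whose data can still be read with get_data(). The sketch below is a toy model of that behavior using only the standard library. It is not mechanize's implementation: `CachedResponse`, `ToyBrowser`, and the `pages` dictionary (which stands in for the network) are hypothetical names introduced purely for illustration.

```python
import io


class CachedResponse(object):
    """Toy response (hypothetical): the body stays readable after close()."""

    def __init__(self, url, body):
        self.url = url
        self._stream = io.BytesIO(body)  # whole body is held in memory
        self.closed = False

    def geturl(self):
        return self.url

    def read(self):
        # Reads from the current position, like a file object.
        return self._stream.read()

    def close(self):
        # Only marks the response as closed; the cached bytes are kept.
        self.closed = True

    def get_data(self):
        # Rewind and read everything: seek(0) followed by read().
        self._stream.seek(0)
        return self._stream.read()


class ToyBrowser(object):
    """Toy stateful browser (hypothetical): remembers the visited responses."""

    def __init__(self, pages):
        self._pages = pages    # url -> body bytes, stands in for the network
        self._history = []     # previously visited responses, oldest first
        self._current = None

    def open(self, url):
        if self._current is not None:
            # Navigating away closes the current response and remembers it.
            self._current.close()
            self._history.append(self._current)
        self._current = CachedResponse(url, self._pages[url])
        return self._current

    def back(self):
        if not self._history:
            raise RuntimeError("already at the start of the history")
        self._current.close()
        # Return the cached response object; nothing is fetched again.
        self._current = self._history.pop()
        return self._current


pages = {
    "http://www.example.com/": b"cheese shop",
    "http://www.example.com/order": b"order form",
}
br = ToyBrowser(pages)
response1 = br.open("http://www.example.com/")
response2 = br.open("http://www.example.com/order")  # closes response1
response3 = br.back()  # the same object as response1, taken from the history
print(response3.get_data())
```

Keeping the response objects themselves in the history is what makes going back cheap in this model: only an explicit re-fetch (reload() in the mechanize example) would need to contact the server again.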