Mechanize - a stateful programmable web browser in Python

admin, 13 years ago
     Mechanize provides a stateful, programmable web browser. It supports HTML form filling, custom headers, HTTP authentication, SSL, and more.    <br />    <br /> Project page:    <a href="/misc/goto?guid=4958185840491855782">http://wwwsearch.sourceforge.net/mechanize/</a>    <br />    <br />    <pre class="brush:python; toolbar: true; auto-links: false;">import re
import mechanize

br = mechanize.Browser()
br.open("http://www.example.com/")
# follow second link with element text matching regular expression
response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)
assert br.viewing_html()
print(br.title())
print(response1.geturl())
print(response1.info())  # headers
print(response1.read())  # body

br.select_form(name="order")
# Browser passes through unknown attributes (including methods)
# to the selected HTMLForm.
br["cheeses"] = ["mozzarella", "caerphilly"]  # (the method here is __setitem__)
# Submit current form.  Browser calls .close() on the current response on
# navigation, so this closes response1
response2 = br.submit()

# print currently selected form (don't call .submit() on this, use br.submit())
print(br.form)

response3 = br.back()  # back to cheese shop (same data as response1)
# the history mechanism returns cached response objects
# we can still use the response, even though it was .close()d
response3.get_data()  # like .seek(0) followed by .read()
response4 = br.reload()  # fetches from server

for form in br.forms():
    print(form)
# .links() optionally accepts the keyword args of .follow_/.find_link()
for link in br.links(url_regex="python.org"):
    print(link)
    br.follow_link(link)  # takes EITHER Link instance OR keyword args
    br.back()</pre>
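To make "stateful" and "custom headers" concrete, here is a rough sketch that uses only the Python 3 standard library, not mechanize itself. It shows the general idea of a client that remembers cookies between requests and sends its own User-Agent; it is not a description of how mechanize works internally. The names DemoHandler and run_demo, the cookie value session=abc123, and the agent string demo-browser/0.1 are all made up for this illustration, and the server is a throwaway one bound to 127.0.0.1.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: hands out a cookie, then echoes what it got."""

    def do_GET(self):
        cookie = self.headers.get("Cookie", "")
        agent = self.headers.get("User-Agent", "")
        body = ("cookie=%s;agent=%s" % (cookie, agent)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        if not cookie:
            # First visit: no cookie was sent, so issue a session cookie.
            self.send_header("Set-Cookie", "session=abc123")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def run_demo():
    """Fetch the same URL twice through one cookie-aware opener."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    try:
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),          # ignore any system proxy
            urllib.request.HTTPCookieProcessor(jar),  # remember cookies
        )
        # Custom header sent with every request made through this opener.
        opener.addheaders = [("User-Agent", "demo-browser/0.1")]

        with opener.open(url) as first:
            first_body = first.read().decode("utf-8")
        with opener.open(url) as second:
            second_body = second.read().decode("utf-8")
        return first_body, second_body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

The first response reports an empty cookie, and the second reports session=abc123, because the opener stored the cookie from the first reply and sent it back. A mechanize Browser gives you this kind of persistence without the manual setup, along with the form and link handling shown in the example above.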