
Critique my code

Name: Anonymous 2011-07-26 4:31

Hello /prog/, I'm a complete beginner and I just wrote the longest script I have yet written. It's not finished, but would you please critique it and tell me if it's ugly or not?


#########################################
# g.e-hentai.org Gallery Scraper v0.1   #  
# by VitamnP                            #
# ENTERPRISE QUALITY                    #
#########################################

import urllib2
import time
from math import ceil
from BeautifulSoup import BeautifulSoup

def requestPage(url):
    # Define user agent to be used in requests
    agent = {'User-Agent':'Uzbl (Webkit 1.3) (Linux i686 [i686])'}
   
    # Request page from server for parsing, identifying as a browser so we don't
    # get banned
    request = urllib2.Request(url, data=None, headers=agent)
    response = urllib2.urlopen(request)
    page = response.read()

    # Create BeautifulSoup object from page for parsing
    soup = BeautifulSoup(page)

    return soup

def getInfo(url):
    # Request page and prepare for parsing
    soup = requestPage(url)   
   
    # Get number of images and total size in MB
    image_string = soup.find(text='Images:')
    info_string = image_string.findNext(text=True)
   
    # Determine number of thumbnail pages (20 thumbnails each) and generate
    # URLs; ceil() returns a float, so convert to int before building '?p=N'
    num_images = int(info_string.split()[0])
    num_pages = int(ceil(num_images / 20.0))
    pages_list = [url + '?p=' + str(page) for page in range(num_pages)]
   
    return {'pages_list' : pages_list, 'info_string' : info_string}
   
def getImagePageLinks(pages_list):
    # Create a list to store image page URLs
    all_page_links = []
   
    # Go through each thumbnail page link in the list, extract the links to
    # the image pages, and store them in a list
    for link in pages_list:
        soup = requestPage(link)
        image_links = soup.findAll('div', {"class" : "gdtm"})
        for thumb in image_links:
            all_page_links.append(thumb.findNext('a')['href'])

    return all_page_links
   
def getAllImageLinks(all_page_links, wait_time_secs):
    # Create a list to store the image URLs
    all_image_links = []

    # For each image page URL, request the page from the server and collect
    # the URL of the full-size image on it, pausing between requests
    for link in all_page_links:
        soup = requestPage(link)
        topframe = soup.find('iframe')
        imgurl = topframe.findNext('img')['src']
        all_image_links.append(imgurl)
        time.sleep(wait_time_secs)

    return all_image_links

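def download(image_links, save_dir):
    # Make sure a non-empty directory prefix ends with a separator
    if save_dir and not save_dir.endswith('/'):
        save_dir += '/'

    # Fetch each image and write its raw bytes to a file named after the
    # last component of its URL
    for image in image_links:
        filename = save_dir + image.split('/')[-1]
        data = urllib2.urlopen(image).read()
        f = open(filename, 'wb')
        f.write(data)
        f.close()
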
def main():
    print 'g.e-hentai.org Gallery Scraper v0.1'
    print "by VitamnP"
   
    gallery_url = raw_input("Enter gallery URL: \n")
    save_dir = raw_input("Enter save directory: \n")
    wait_time_secs = float(raw_input("Seconds to wait between page loads: \n"))
   
    info = getInfo(gallery_url)
    print "Gallery contains ", info['info_string']

    all_page_links = getImagePageLinks(info['pages_list'])
    all_image_links = getAllImageLinks(all_page_links, wait_time_secs)

    download(all_image_links, save_dir)
   
if __name__ == "__main__":
    main()
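
The page-count arithmetic in getInfo is the easiest part to get subtly wrong, so here is a minimal standalone sketch of it. It assumes what the script above assumes (20 thumbnails per page and a zero-based '?p=N' query parameter); the function name thumbnail_page_urls and the example.com URL are made up for illustration.

```python
from math import ceil


def thumbnail_page_urls(gallery_url, num_images, per_page=20):
    """Return the thumbnail-page URLs for a gallery, first page first."""
    # ceil() returns a float on Python 2, so convert to int before building
    # the URLs; otherwise the query string would read '?p=2.0', not '?p=2'.
    num_pages = int(ceil(num_images / float(per_page)))
    return [gallery_url + '?p=' + str(page) for page in range(num_pages)]


# 45 images at 20 per page need three pages: p=0, p=1 and p=2.
urls = thumbnail_page_urls('http://example.com/g/1/', 45)
```

Keeping this in its own function makes it possible to check the edge cases (exactly 20 images, zero images) without touching the network.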

Name: Anonymous 2011-07-26 18:30

- Everything you write will be open source. No FASLs, DLLs or EXEs. There may be some very important instances where a business wouldn't want anybody to see the internal implementation of their modules and having strict control over levels of access is necessary. Python third-party library licensing is overly complex. Licenses like MIT allow you to create derived works as long as you maintain attribution; GNU GPL and other 'viral' licenses don't allow derived works without inheriting the same license. To inherit the benefits of an open source culture you also inherit the complexities of the licensing hell.
- Installation mentality: Python has inherited the idea that libraries should be installed, so it is in fact designed to work inside Unix package management, which carries a fair amount of baggage (library version issues) and reduces portability. Of course it must be possible to package libraries with your application, but it's not conventional and can be hard to deploy as a desktop app due to cross-platform issues, language version, etc. Open source projects generally don't care about Windows; most open source developers use Linux because "Windows sucks".
- Probably the biggest practical problem with Python is that there's no well-defined API that doesn't change. This makes life easier for Guido and tough on everybody else. That's the real cause of Python's "version hell".
- Global Interpreter Lock (GIL) is a significant barrier to concurrency. Due to signaling with a CPU-bound thread, it can cause a slowdown even on a single processor. The reason for employing the GIL in Python is to ease the integration of C/C++ libraries. Additionally, the CPython interpreter code is not thread-safe, so the only way other threads can do useful work is if they are in some C/C++ routine, which must be thread-safe.
- Python (like most other scripting languages) does not require variables to be declared, as (let (x 123) ...) in Lisp or int x = 123 in C/C++. This means that Python can't even detect a trivial typo - it will produce a program, which will continue working for hours until it reaches the typo - THEN go boom and you lost all unsaved data. Local and global scopes are unintuitive. Having variables leak after a for-loop can definitely be confusing. Worse, binding of loop indices can be very confusing; e.g. "for a in list: result.append(lambda: fcn(a))" probably won't do what you think it would. Why nonlocal/global/auto-local scope nonsense?
- Python indulges messy horizontal code (> 80 chars per line), where in Lisp one would use "let" to break computation into manageable pieces. Get used to things like self.convertId([(name, uidutil.getId(obj)) for name, obj in container.items() if IContainer.isInstance(obj)])
- Crippled support for functional programming. Python's lambda is limited to a single expression and doesn't allow statements, a side effect of Python making a distinction between expressions and statements. Assignments are not expressions. Useful higher-order functions such as reduce were removed from the builtins in Python 3.0 and have to be imported from functools. No continuations or even tail call optimization: "I don't like reading code that was written by someone trying to use tail recursion." --Guido
- Python's syntax, based on the SETL language and mathematical Set Theory, is non-uniform, hard to understand and parse, compared to simpler languages, like Lisp, Smalltalk, Nial and Factor. Instead of the usual "fold" and "map" functions, Python uses "set comprehension" syntax, which has an overwhelmingly large collection of underlying linguistic and notational conventions, each with its own variable binding semantics. To complicate things even more, Python uses the so-called "off-side" indentation rule (aka Forced Indentation of Code), also taken from the math-intensive Haskell language. This, in effect, makes Python look like an overengineered toy for math geeks.
- Quite quirky: triple-quoted strings seem like a syntax-decision from a David Lynch movie, and double-underscores, like __init__, seem appropriate in C, but not in a language that provides list comprehensions. There has to be a better way to mark certain features as internal or special than just calling it __feature__.
- Python is unintuitive and has too many confusing non-orthogonal features: references can't be used as hash keys; expressions in default arguments are calculated when the function is defined, not when it’s called. Why have both dictionaries and objects? Why have both types and duck-typing? Why is there ":" in the syntax if it almost always has a newline after it?
- CPython's garbage collection relies on reference counting, which is slow and cannot by itself reclaim circular references; those are left to a separate cycle collector, which in Python 2 will not free cycles whose objects define __del__. That means you have to expect subtle memory leaks and can't easily use arbitrary graphs as your data. In effect Python complicates even simple tasks, like keeping a directory tree with symlinks.
- Problems with arithmetic: no Lisp-style Numerical Tower in the core language (rationals are only available through the fractions module), meaning that in Python 2 1/2 produces 0 instead of 0.5, leading to subtle and dangerous errors.
- Poor UTF support; Unicode string handling is somewhat awkward.
- self everywhere can make you feel like OO was bolted on, even though it wasn't.
- No outstanding feature that defines the language, like the brevity of APL or the macros of Lisp.
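
For anyone who hasn't been bitten yet, the loop-binding and default-argument complaints in the list above can be reproduced in a few lines. This is only a minimal sketch: the names callbacks, bound, append_shared and append_fresh are made up for illustration, and the a=a and bucket=None idioms are common workarounds rather than the only ones.

```python
# Pitfall 1: a lambda looks up the loop variable when it is called,
# not when it is created, so every callback sees the final value of a.
callbacks = []
for a in [1, 2, 3]:
    callbacks.append(lambda: a * 10)
late_results = [f() for f in callbacks]    # [30, 30, 30]

# Workaround: bind the current value as a default argument.
bound = []
for a in [1, 2, 3]:
    bound.append(lambda a=a: a * 10)
bound_results = [f() for f in bound]       # [10, 20, 30]


# Pitfall 2: a default argument expression is evaluated once, when the
# function is defined, so the same list is shared between calls.
def append_shared(item, bucket=[]):
    bucket.append(item)
    return bucket


# Workaround: use None as a sentinel and create the list inside the call.
def append_fresh(item, bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared('x')
second = append_shared('y')    # the same list object as first: ['x', 'y']
```

Both behaviours are documented and deliberate, which is exactly why they surprise people coming from languages where each loop iteration gets a fresh binding.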
