Hello /prog/, I'm a complete beginner and I just wrote the longest script I have yet written. It's not finished, but would you pleasecritique it and tell me if it's ugly or not?
import urllib2
import time
from math import ceil
from BeautifulSoup import BeautifulSoup
def requestPage(url):
# Define user agent to be used in requests
agent = {'User-Agent':'Uzbl (Webkit 1.3) (Linux i686 [i686])'}
# Request page from server for parsing, identifying as a browser so we don't
# get banned
request = urllib2.Request(url, data=None, headers=agent)
response = urllib2.urlopen(request)
info = response.info()
page = response.read()
# Create BeautifulSoup object from page for parsing if the page type is HTML
soup = BeautifulSoup(page)
return soup
def getInfo(url):
# Request page and prepare for parsing
soup = requestPage(url)
# Get number of images and total size in MB
image_string = soup.find(text='Images:')
info_string = image_string.findNext(text=True)
# Determine number of thumbnail pages and generate URLs
num_images = float(info_string.split(' ')[0])
num_pages = ceil(num_images / 20.0) - 1
pages_list = []
while num_pages >= 0:
pages_list.append(url + '?p=' + str(num_pages))
num_pages -= 1
def getImagePageLinks(pages_list):
# Create a list to store image page URLs
all_page_links = []
# Go through each thumbnail page link in the list, extract the links to
# the image pages, and store them in a list
for link in pages_list:
soup = requestPage(link)
image_links = soup.findAll('div', {"class" : "gdtm"})
for link in image_links:
all_page_links.append(link.findNext('a')['href'])
return all_page_links
def getAllImageLinks(all_page_links, wait_time_secs):
# Create a list to store the image URLs
all_image_links = []
# For each URL in all_image_links, request the page from the server and save
# the image on the page to a file in the directory specified by the user
for link in all_image_links:
soup = requestPage(link)
topframe = soup.find('iframe')
imgurl = topframe.findNext('img')['src']
all_images.append(imgurl)
time.sleep(wait_time_secs)
return all_image_links
def download(image_links, save_dir):
for image in image_links:
filename = save_dir + image.split('/')[-1]
f = open(filename, 'wb')
f.write(data)
f.close()
So many comments, not even a docstring, claims to be ENTERPRISE, but there's not even a factory, nor a singleton. append is slow as fuck, but FIOC doesn't use linked lists. Too many temporary variables make the bodies of the procedures heavier than they should be.
Also, THE FORCED INDENTATION OF CODE.
THREAD OVER
Great size. Look thick. Solid. Tight. Keep us all posted on your continued progress with any new progress pics or vid clips. Show us what you got man. Wanna see how freakin' huge, solid, thick and tight you can get. Thanks for the motivation.
>>1
Learn to avoid pointless comments like
# Create BeautifulSoup object from page for parsing if the page type is HTML
soup = BeautifulSoup(page)
Though it's perfectly fine for someone who only begins to code, like a small child who constantly tells herself what she is going to do, the sooner you learn to internalize this stuff, the better.
Stuff like:
pages_list = []
while num_pages >= 0:
pages_list.append(url + '?p=' + str(num_pages))
num_pages -= 1
Is better written as:
pages_list = ['%s?p=%d' % (url, i) for i in xrange(num_pages)]
Also, the creator of BeautifulSoup himself recommends using lxml.
At requestPage:
Why assign soup? Why not just return BeautifulSoup(page) ?
Also:
def getAllImageLinks(all_page_links, wait_time_secs):
# Create a list to store the image URLs
all_image_links = []
# For each URL in all_image_links, request the page from the server and save
# the image on the page to a file in the directory specified by the user
for link in all_image_links:
soup = requestPage(link)
topframe = soup.find('iframe')
imgurl = topframe.findNext('img')['src']
all_images.append(imgurl)
time.sleep(wait_time_secs)
return all_image_links
But all_image_links is empty, thus it won't return shite.
Name:
Anonymous2011-07-26 8:59
It would really improve the performance if it was multi-threaded.
Also, ose the os.* functions: filename = os.path.join(save_dir, os.path.basename(image))
Name:
Anonymous2011-07-26 18:30
- Everything you write will be open source. No FASLs, DLLs or EXEs. There may be some very important instances where a business wouldn't want anybody to see the internal implementation of their modules and having strict control over levels of access are necessary. Python third-party library licensing is overly complex. Licenses like MIT allow you to create derived works as long as you maintain attrubution; GNU GPL, or other 'viral' licenses don't allow derived works without inheriting the same license. To inherit the benefits of an open source culture you also inherit the complexities of the licensing hell.
- Installation mentality, Python has inherited the idea that libraries should be installed, so it infact is designed to work inside unix package management, which basically contains a fair amount of baggage (library version issues) and reduced portability. Of course it must be possible to package libraries with your application, but its not conventional and can be hard to deploy as a desktop app due to cross platform issues, language version, etc. Open Source projects generally don't care about Windows, most open source developers use Linux because "Windows sucks".
- Probably the biggest practical problem with Python is that there's no well-defined API that doesn't change. This make life easier for Guido and tough on everybody else. That's the real cause of Python's "version hell".
- Global Interpreter Lock (GIL) is a significant barrier to concurrency. Due to signaling with a CPU-bound thread, it can cause a slowdown even on single processor. Reason for employing GIL in Python is to easy the integration of C/C++ libraries. Additionally, CPython interpreter code is not thread-safe, so the only way other threads can do useful work is if they are in some C/C++ routine, which must be thread-safe.
- Python (like most other scripting languages) does not require variables to be declared, as (let (x 123) ...) in Lisp or int x = 123 in C/C++. This means that Python can't even detect a trivial typo - it will produce a program, which will continue working for hours until it reaches the typo - THEN go boom and you lost all unsaved data. Local and global scopes are unintuitive. Having variables leak after a for-loop can definitely be confusing. Worse, binding of loop indices can be very confusing; e.g. "for a in list: result.append(lambda: fcn(a))" probably won't do what you think it would. Why nonlocal/global/auto-local scope nonsense?
- Python indulges messy horizontal code (> 80 chars per line), where in Lisp one would use "let" to break computaion into manageable pieces. Get used to things like self.convertId([(name, uidutil.getId(obj)) for name, obj in container.items() if IContainer.isInstance(obj)])
- Crippled support for functional programming. Python's lambda is limited to a single expression and doesn't allow conditionals, a side effect of Python making a distinction between expressions and statements. Assignments are not expressions. Most useful high-order functions were deprecated in Python 3.0 and have to be imported from functools. No continuations or even tail call optimization: "I don't like reading code that was written by someone trying to use tail recursion." --Guido
- Python's syntax, based on SETL language and mathematical Set Theory, is non-uniform, hard to understand and parse, compared to simpler languages, like Lisp, Smalltalk, Nial and Factor. Instead of usual "fold" and "map" functions, Python uses "set comprehension" syntax, which has an overhelmingly large collection of underlying linguistic and notational conventions, each with it's own variable binding semantics. To complicate things even more, Python uses the so called "off-side" indentation rule (aka Forced Indentation of Code), also taken from a math-intensive Haskell language. This, in effect, makes Python look like an overengineered toy for math geeks.
- Quite quirky: triple-quoted strings seem like a syntax-decision from a David Lynch movie, and double-underscores, like __init__, seem appropriate in C, but not in a language that provides list comprehensions. There has to be a better way to mark certain features as internal or special than just calling it __feature__.
- Python is unintuitive and has too many confusing non-orthogonal features: references can't be used as hash keys; expressions in default arguments are calculated when the function is defined, not when it’s called. Why have both dictionaries and objects? Why have both types and duck-typing? Why is there ":" in the syntax if it almost always has a newline after it?
- Python's garbage collection uses naive reference counting, which is slow and doesn't handle circular references, meaning you have to expect subtle memory leaks and can't easily use arbitrary graphs as your data. In effect Python complicates even simple tasks, like keeping directory tree with symlinks.
- Problems with arithmetic: no Numerical Tower (nor even rational/complex numbers), meaning 1/2 would produce 0, instead of 0.5, leading to subtle and dangerous errors.
- Poor UTF support and unicode string handling is somewhat awkward.
- self everywhere can make you feel like OO was bolted on, even though it wasn't.
- No outstanding feature, that makes the language, like the brevity of APL or macros of Lisp.
Name:
Anonymous2011-07-26 18:38
>>20
When I get off work I'll debunk every single fucking statement in this awful, ultimately destructive troll post.
>>15
Thanks for pointing this out, I changed some of the variable names to make it less confusing but I ended up making things worse. I think using so many temp variables must be a bad idea
>>11
The comments thing is what you thought, if I write the comment it helps me think of what code I have to write to make the effect I want. I'll just keep practising and I expect I won't have to do it after a while. About replacing the while loop with a list comprehension, I was roasted for doing that in the Python IRC channel? Which is better? They also told me not to use recursion which was disappointing because after I found out what recursion was I thought it was so cool I rewrote all my little programs to use it as much as possible.
>>20
Don't understand any of this except for not having to declare variables, which does seem like kind of a bad idea
Thanks for the advice and trolling /prog/, sorry if I made you cringe
>>23
Ignore the retard haters, use recursion, list comprehensions and higher-order functions (map, filter, etc.) whenever you feel it's clearer to express things that way (i.e. most of the time). Really. And you should learn some other languages too: C|Pascal + Lua|Javascript|Lisp + assembly. Also, you must master the concepts of lexical scoping and closures (hint: to play with these, use Lua|Javascript|Lisp). Oh, and don't forget to read up about things like the Halting problem and P vs NP.
Name:
Anonymous2011-07-26 21:01
>>26
filthy academic. None of your advice will allow anyone to become a true enterprise-level developer.
enterprise-level developers don't need to know about P vs NP? What if they encounter an NP problem while developing and spend time trying to make a polynomial algorithm for it?