
programming language(s) for bioinformatics

Name: Anonymous 2011-07-27 15:35

I am a high school sophomore and visited a professor in the molecular biology sciences who is also an expert programmer. (He builds robots, biochips, and a bunch of other cool gadgets.) While we were discussing other things, he urged me to learn Python, and now I'm very interested in computer science and progressing rapidly. I know Python is considered a powerful language because of its simple syntax. But if I want to go into bioinformatics (especially to use the tool http://www.ncbi.nlm.nih.gov/BLAST/) and computer programming in general, maybe even study cybersecurity, is it an ideal language to start with? Or is C/C++ a better choice?

Name: Anonymous 2011-07-27 15:45

- Everything you write will be open source. No FASLs, DLLs or EXEs. There may be some very important instances where a business wouldn't want anybody to see the internal implementation of their modules and having strict control over levels of access is necessary. Python third-party library licensing is overly complex. Licenses like MIT allow you to create derived works as long as you maintain attribution; the GNU GPL and other 'viral' licenses don't allow derived works without inheriting the same license. To inherit the benefits of an open source culture you also inherit the complexities of the licensing hell.
- Installation mentality: Python has inherited the idea that libraries should be installed, so it is in fact designed to work inside Unix package management, which carries a fair amount of baggage (library version issues) and reduces portability. Of course it must be possible to package libraries with your application, but it's not conventional, and it can be hard to deploy as a desktop app due to cross-platform issues, language version, etc. Open source projects generally don't care about Windows; most open source developers use Linux because "Windows sucks".
- Probably the biggest practical problem with Python is that there's no well-defined API that doesn't change. This makes life easier for Guido and tough on everybody else. That's the real cause of Python's "version hell".
- The Global Interpreter Lock (GIL) is a significant barrier to concurrency. Due to signaling with a CPU-bound thread, it can cause a slowdown even on a single processor. The reason for employing the GIL in Python is to ease the integration of C/C++ libraries. Additionally, the CPython interpreter code is not thread-safe, so the only way other threads can do useful work is if they are in some C/C++ routine, which must be thread-safe.
- Python (like most other scripting languages) does not require variables to be declared, as (let (x 123) ...) in Lisp or int x = 123 in C/C++. This means that Python can't even detect a trivial typo: it will produce a program which will keep working for hours until it reaches the typo, THEN go boom, and you lose all unsaved data. Local and global scopes are unintuitive. Having variables leak after a for-loop can definitely be confusing. Worse, binding of loop indices can be very confusing; e.g. "for a in list: result.append(lambda: fcn(a))" probably won't do what you think it would. Why the nonlocal/global/auto-local scope nonsense?
- Python indulges messy horizontal code (> 80 chars per line), where in Lisp one would use "let" to break computation into manageable pieces. Get used to things like self.convertId([(name, uidutil.getId(obj)) for name, obj in container.items() if IContainer.isInstance(obj)])
- Crippled support for functional programming. Python's lambda is limited to a single expression and doesn't allow conditionals, a side effect of Python making a distinction between expressions and statements. Assignments are not expressions. Most useful higher-order functions were deprecated in Python 3.0 and have to be imported from functools. No continuations or even tail call optimization: "I don't like reading code that was written by someone trying to use tail recursion." --Guido
- Python's syntax, based on the SETL language and mathematical set theory, is non-uniform and hard to understand and parse compared to simpler languages like Lisp, Smalltalk, Nial and Factor. Instead of the usual "fold" and "map" functions, Python uses "set comprehension" syntax, which has an overwhelmingly large collection of underlying linguistic and notational conventions, each with its own variable binding semantics. To complicate things even more, Python uses the so-called "off-side" indentation rule (aka Forced Indentation of Code), also taken from the math-intensive Haskell language. This, in effect, makes Python look like an overengineered toy for math geeks.
- Quite quirky: triple-quoted strings seem like a syntax-decision from a David Lynch movie, and double-underscores, like __init__, seem appropriate in C, but not in a language that provides list comprehensions. There has to be a better way to mark certain features as internal or special than just calling it __feature__.
- Python is unintuitive and has too many confusing non-orthogonal features: references can't be used as hash keys; expressions in default arguments are calculated when the function is defined, not when it’s called. Why have both dictionaries and objects? Why have both types and duck-typing? Why is there ":" in the syntax if it almost always has a newline after it?
- Python's garbage collection uses naive reference counting, which is slow and doesn't handle circular references, meaning you have to expect subtle memory leaks and can't easily use arbitrary graphs as your data. In effect Python complicates even simple tasks, like keeping a directory tree with symlinks.
- Problems with arithmetic: no Numerical Tower (nor even rational/complex numbers), meaning 1/2 would produce 0, instead of 0.5, leading to subtle and dangerous errors.
- Poor UTF support; Unicode string handling is somewhat awkward.
- self everywhere can make you feel like OO was bolted on, even though it wasn't.
- No outstanding feature that makes the language, like the brevity of APL or the macros of Lisp.
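
For anyone who wants to check a few of the points above rather than take them on faith, here is a minimal sketch, assuming a current Python 3 interpreter (behaviour under Python 2 differs, notably for 1/2). It covers the loop-variable closure example, default arguments being evaluated at definition time, and basic arithmetic. All function names are hypothetical and exist only for this illustration.

```python
from fractions import Fraction


def callbacks_late(items):
    """Each lambda looks up `a` when called, i.e. after the loop has ended."""
    result = []
    for a in items:
        result.append(lambda: a)
    return [f() for f in result]


def callbacks_bound(items):
    """A default argument captures the value of `a` at each iteration."""
    result = []
    for a in items:
        result.append(lambda a=a: a)
    return [f() for f in result]


def append_shared(x, bucket=[]):
    """The default list is built once, when the function is defined."""
    bucket.append(x)
    return bucket


def append_fresh(x, bucket=None):
    """A None sentinel gives every call its own new list."""
    if bucket is None:
        bucket = []
    bucket.append(x)
    return bucket


def arithmetic_samples():
    """Division, rationals and complex numbers under Python 3."""
    return {
        "true_division": 1 / 2,
        "floor_division": 1 // 2,
        "rational": Fraction(1, 2) + Fraction(1, 3),
        "complex": (1 + 2j) * (1 - 2j),
    }
```

With these definitions, callbacks_late([1, 2, 3]) gives [3, 3, 3] while callbacks_bound([1, 2, 3]) gives [1, 2, 3], and two successive calls to append_shared accumulate into the same list. Under Python 3 the arithmetic sample shows 1 / 2 as 0.5, with Fraction and complex available from the standard library and the core language respectively.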

Name: quoting myself yay 2011-07-27 15:55

Name: Anonymous 2011-07-27 16:18

Google results:

"Bioinformatics +" ...

Python = 2.780.000
Perl   = 2.990.000

Name: Anonymous 2011-07-27 16:24

Name: Anonymous 2011-07-27 16:33

>>1
C/C++ is NEVER a better choice (except for games or low-level programming)

Python or Perl will do. Any of them.

Name: Anonymous 2011-07-27 16:37

>>1
You can do anything in any Turing-complete language, but Python, C and C++ are probably not the best choices you can make. You should forget that C and C++ exist, actually.

Name: Anonymous 2011-07-27 17:40

lisp

Name: Anonymous 2011-07-27 18:20

>>8
No.

Name: not >>8 2011-07-27 18:26

>>9
There are actually enough bioinformatics-related libraries for CL. I would just tell OP to use whatever he finds comfortable. I find Lisp comfortable and thus I use it for my needs; if you find something else comfortable, use that. Of course, you do need to know your fair share of languages before making a choice. Otherwise you might pick some inappropriate language and keep on using it, wasting a lot of your time, just because it's the only thing you know (some languages are good for rapid development cycles, others are better for performance).

Name: Anonymous 2011-07-27 19:15

I know C, C++, Java, C#, Python, and Haskell, and I find that I spend most of my time coding in C#.

As >>10 said, learn a language or two and use what you're most comfortable with.

Name: Anonymous 2011-07-27 21:14

>>10
Scheme uses ((lambda (x) (* x x)) 10). It's a critical hit! CL falls to the ground as Scheme pisses then ejaculates over its lifeless body.

Name: Anonymous 2011-07-27 21:15

>>12 No

CL-USER> ((lambda (x) (* x x)) 10)
100

Name: Anonymous 2011-07-27 21:42

>>11-12
Why can't I do this?
((cdr '(+ . -)) 3 2)

Name: Anonymous 2011-07-27 21:43

>>14
I mean >>12-13.

Name: Anonymous 2011-07-27 22:08

>>14
Because you're using a broken Lisp designed by halfwits.  Try Scheme for a change.

Name: Anonymous 2011-07-27 22:15

>>16
guile> ((cdr '(+ . -)) 3 2)

Backtrace:
In standard input:
   1: 0* [- 3 2]

standard input:1:1: In expression ((cdr #) 3 2):
standard input:1:1: Wrong type to apply: -
ABORT: (misc-error)

Name: Anonymous 2011-07-27 22:59

IHBT, but it's
((cdr (cons + -)) 3 2)
The other expression turned + and - into symbols with quote, and you can't just go around applying that shit.
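
For readers coming from Python, a rough analogue of the same distinction, offered only as a sketch: a pair of names is not a pair of functions, and a name has to be looked up before it can be applied. The helper names and the `environment` table are hypothetical stand-ins, not anything from a Lisp implementation.

```python
import operator

# Roughly what the quoted pair '(+ . -) holds: two names, not two functions.
names = ("+", "-")

# Roughly what (cons + -) holds in Scheme: two function objects.
funcs = (operator.add, operator.sub)

# Hypothetical lookup table standing in for the evaluator's environment.
environment = {"+": operator.add, "-": operator.sub}


def apply_second(pair, x, y):
    """Call the second element of `pair` directly on x and y."""
    return pair[1](x, y)


def apply_second_by_name(pair, x, y):
    """Resolve the second element as a name first, then call the result."""
    return environment[pair[1]](x, y)
```

Here apply_second(funcs, 3, 2) returns 1, apply_second(names, 3, 2) raises TypeError because a string is not callable, and apply_second_by_name(names, 3, 2) returns 1 after the lookup.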

Name: Anonymous 2011-07-27 23:05

>>18
IHBT
No, I just started learning Lisp. Thanks, though.

Name: Anonymous 2011-07-27 23:05

>>18
IHBT
No, I just started learning Lisp. Thanks, though.

Name: Anonymous 2011-07-28 3:42

>>19-20
((eval (cdr '(+ . -))) 3 2)
Don't do that in real code.

Name: Anonymous 2011-07-28 5:28

Name: Anonymous 2011-07-28 7:27

>>18
[1]> ((cdr (cons + -)) 3 2)

*** - EVAL: (CDR (CONS + -)) is not a function name; try using a symbol instead

Name: Anonymous 2011-07-28 8:16

>>23
#;> ((cdr (cons + -)) 3 2)
1

Name: Anonymous 2011-07-28 15:33

>>24
Doesn't work in GNU CLISP.

Name: Anonymous 2011-07-28 16:00

>>25
Works in Chicken, Guile, Racket, Gambit, MIT/GNU Scheme and my metacircular evaluator.

Name: Anonymous 2011-07-28 16:51

I see this degenerated into Lisp-1 vs Lisp-n trolling.

Name: Anonymous 2011-07-28 21:35

>>26
Those are all Scheme implementations, not CL.  Let's face it, aside from the visual appearance of the code, Scheme and CL are entirely different creatures.  Scheme is like C, simple and pure, and CL is like C++, badly designed and bloated.

Name: Anonymous 2011-07-28 22:04

Scheme is like C, simple and pure,
begin-for-syntax syntax-e syntax->datum syntax->list syntax-property #' (void) quote-syntax datum->syntax syntax-parameter-value syntax-rule raise-syntax-error internal-definition-context? syntax-parameterize make-set!-transformer prop:set!-transformer free-identifier=? syntax-local-value/immediate syntax-local-transforming-module-provides? syntax-local-module-defined-identifiers syntax-local-module-required-identifiers make-require-transformer (require (lib "stxparam.ss" "mzlib")) syntax? (require mzlib/defmacro) define-macro syntax-local-lift-expression (require racket/stxparam-exptime) make-rename-transformer syntax-local-require-certifier make-parameter-rename-transformer syntax-local-value define-syntax-parameter make-provide-transformer syntax-local-provide-certifier syntax-source local-expand/capture-lifts local-transformer-expand/capture-lifts syntax-local-lift-values-expression syntax-local-lift-module-end-declaration syntax-local-lift-require syntax-local-lift-provide syntax-local-name syntax-local-context syntax-local-phase-level syntax-local-module-exports syntax-local-get-shadower syntax-local-certifier syntax-transforming? syntax-local-introduce make-syntax-introducer exn:fail:syntax make-syntax-delta-introducer syntax-local-make-delta-introducer syntax-case define-syntax syntax-rules with-syntax syntax-position syntax-line syntax-column ...

Name: Anonymous 2011-07-28 22:11

>>29
I don't use the Racket racket.

Name: Anonymous 2011-07-29 5:04

>>28
C is neither simple nor pure.

>>29
Go back to Symta, jew.

Name: Anonymous 2011-07-29 6:19

Bonerlang

Name: Anonymous 2011-07-29 9:41

Most large-scale bioinformatics work is done in C++ as the primary workhorse language, plus maybe a higher-level language like Java for managing your distributed computing infrastructure (i.e. sending work units out to the various nodes of your supercomputer cluster and collating the results once they've been processed, using something like the Apache Hadoop framework). Also, GPGPU programming is becoming more and more popular, as newer supercomputers are usually built with GPUs, so expect things like CUDA, OpenCL, DirectCompute and perhaps C++ AMP in the near future.
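
The "send out work units, then collate" pattern can be sketched on a single machine in a few lines. This is only a toy illustration using Python's standard thread pool, not Hadoop and not a cluster setup; the function names, the chunk size and the GC-content task are all assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(sequence, size):
    """Cut the sequence into consecutive chunks of at most `size` bases."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def count_gc(chunk):
    """Work done on one unit: count the G and C bases in the chunk."""
    return sum(1 for base in chunk if base in "GC")


def gc_fraction(sequence, size=4, workers=2):
    """Scatter chunks to workers, then collate the partial counts."""
    if not sequence:
        return 0.0
    units = split_into_work_units(sequence, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_gc, units))
    return sum(partial_counts) / len(sequence)
```

For example, gc_fraction("GATTACAGGC") splits the input into "GATT", "ACAG" and "GC", counts 1, 2 and 2 G/C bases respectively, and returns 0.5. On a real cluster the workers would be separate nodes and the collation step would have to deal with failures and data movement, which this sketch ignores.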

Things like Lisp, Scheme, Haskell, etc. aren't used much outside of your typical undergraduate course.
