Scientific insight demands crunching numbers. Small and large datasets can be managed and analyzed with point-and-click software, but such software limits what you can do. Writing your own scripts for managing and analyzing ecological data will set you apart from your colleagues and give you a technical skill that is transferable across sub-disciplines of ecology and beyond. This section provides suggestions and resources to help you learn and improve your programming skills.
ETIQUETTE IN COMPUTER PROGRAMMING
Programming etiquette is mostly about writing understandable code. Ideally, others will use and build on the code you develop, and you will most likely have to modify or use someone else’s code yourself. You will always run into messy code, but as long as its developers are self-consistent in their coding practices, it will be easier to understand. It is best to maintain certain standards for writing code, some of which are outlined below.
- Naming conventions: us yr wds
- It is tempting to use abbreviated or shortened variable names, but scrolling back and forth through tens to thousands of lines of code to remember what fd means is maddening. Use descriptive names for uncommon variables, e.g. flight_dist instead of fd. Longer names carry no penalty in computing speed.
- When writing code for mathematical equations, use the standard variables or better yet, spell out what the variables describe, e.g. instead of p, use density.
- Foo and its variants are common example variables, but better clarity can be obtained by using ‘trash_XX’, ‘temp_XX’, or ‘test_XX’. An additional benefit is that you can search, replace or delete all instances of such temporary or debugging variables rather easily.
- For variable names, use the underscore and capitalization liberally. Typing the underscore will become muscle memory, and it helps clarify variable names, e.g. primary_forest and secondary_forest, or primaryForest and secondaryForest. Many examples in R use the dot ‘.’ to extend variable names, but in many other languages the dot has a separate meaning, whereas the underscore is accepted in names by most languages and is therefore the safer practice. Take care when using capitalization (in Single or BLOCK forms) to distinguish variables, because not all computer languages are case-sensitive.
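The naming suggestions above can be illustrated with a minimal Python sketch. All names, values, and units here (flight_dist_m, plot_area_ha, and so on) are hypothetical and chosen only for illustration; the same ideas carry over to R or any other language.

```python
# Hypothetical example values; names, numbers, and units are illustrative only.

# Terse version: the reader must remember what fd, n, and a stand for.
fd = [12.5, 30.0, 7.5]
n = len(fd)
a = 2.0

# Descriptive version of the same information.
flight_dist_m = [12.5, 30.0, 7.5]   # flight distances, assumed to be in meters
plot_area_ha = 2.0                  # surveyed area, assumed to be in hectares
bird_count = len(flight_dist_m)
density_per_ha = bird_count / plot_area_ha

# Temporary debugging variables share the temp_ prefix,
# so they are easy to search for and delete later.
temp_mean_dist = sum(flight_dist_m) / bird_count

print(density_per_ha, temp_mean_dist)
```

Including the unit in the name (the _m and _ha suffixes) is one optional convention, not a requirement; the point is only that the name should tell the reader what the value describes.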
- You are writing hard code
- Even if you are writing code for a short-term project, develop the habit of writing flexible code, so that you can later change a parameter value easily and run the code again. In general, the fixed-value parameters should be defined once at the top of the code, or in a separate ‘header file’.
- You will often find that the same few lines of code are repeated throughout your program. You can simplify the program by refactoring those lines into a separate class or function, so that each repetition is replaced with a single line of code. The values that differ between repetitions become arguments that are passed into the function each time it is called. Many code editors offer refactoring commands in their menus, and the automation is useful for learning how common blocks of code are compartmentalized and simplified.
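As a rough illustration of both points, the following Python sketch defines its fixed-value parameters once at the top and moves a repeated calculation into a function. The parameter names, the values, and the density calculation itself are assumptions made for this example, not taken from any particular project.

```python
# Fixed-value parameters, defined once at the top (all values are hypothetical).
PLOT_AREA_HA = 2.0   # area of each survey plot, in hectares
MIN_COUNT = 1        # plots with fewer individuals than this are skipped


def density_per_ha(count, area_ha=PLOT_AREA_HA):
    """Return individuals per hectare for one plot."""
    return count / area_ha


def summarize_densities(counts, min_count=MIN_COUNT):
    """Return the density of every plot with at least min_count individuals."""
    return [density_per_ha(count) for count in counts if count >= min_count]


# The same function is reused for each dataset instead of repeating the loop.
primary_forest_counts = [4, 0, 10]
secondary_forest_counts = [2, 6]

primary_forest_density = summarize_densities(primary_forest_counts)
secondary_forest_density = summarize_densities(secondary_forest_counts)

print(primary_forest_density)
print(secondary_forest_density)
```

To rerun the analysis with a different plot size, only the single PLOT_AREA_HA line needs to change.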
- It’s not obvious: annotate everything
- Add a comment above every line of code that is not hit-your-head obvious. Describe briefly what the next line of code does. Do this for each instance of the code, not just the first time it occurs, because not everyone will read the code from the beginning. Ever read through a publication with 20+ acronyms?
- Write descriptive comment blocks at the beginnings of important sections of code. Describe in general terms what the code block does and what its inputs and outputs are.
- Be generous with tabs, spaces, and blank lines when organizing your code; doing so vastly improves readability.
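One possible way to apply these annotation habits is sketched below in Python. The function, its name, and the example coordinates are hypothetical; the layout of the comment block is just one reasonable style among many.

```python
import math

# ---------------------------------------------------------------
# Nearest-neighbor distances
# Purpose: for each point, find the distance to its closest neighbor.
# Input:   points, a list of (x, y) coordinate pairs in the same units
# Output:  a list of distances, one per point, in the input order
# ---------------------------------------------------------------
def nearest_neighbor_distances(points):
    distances = []

    for i, (x1, y1) in enumerate(points):
        # Start at infinity so that any real distance replaces it.
        nearest = math.inf

        for j, (x2, y2) in enumerate(points):
            # Skip the comparison of a point with itself (distance 0).
            if i == j:
                continue

            # Straight-line (Euclidean) distance between the two points.
            nearest = min(nearest, math.hypot(x2 - x1, y2 - y1))

        distances.append(nearest)

    return distances


# Hypothetical nest coordinates, used only to show the call.
nest_locations = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)]
print(nearest_neighbor_distances(nest_locations))
```

The comment block states the purpose, inputs, and outputs once, while the short comments explain only the lines whose intent is not obvious from the code itself.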
- You can cut it if you can hack it
- Even expert programmers make liberal use of copy and paste to rework code from their own repositories or from others.
- Instead of going back to a particular project to view portions of code, e.g. to see how you coded a distance analysis in a spatio-temporal array, save unique code snippets in a repository that you can easily search for important code blocks. Start by saving important or new code blocks into separate files in a scripts folder on your computer. Better still, use a resource like Git to store, share, and get feedback on improving your code.
- Versioning and Coding Communities (Git, Evernote)
- Online Compilers for 50+ programming languages
- http://www.tutorialspoint.com/codingground.htm -- fantastic resource for testing small sections of code.
- Learn to Code
- A Primer on R
- http://www.statmethods.net/ -- for the beginner, this is gold
- http://www.ats.ucla.edu/stat/r/ -- more resources
- Data Visualization
- To the best of our knowledge, by far the best stand-alone resource for R code snippets
- http://www.r-bloggers.com/ -- use the search tool
- Geospatial Data in R
- Are you still doing things in Arc? Try the R package raster.
- For spatio-temporal data analysis, use the NetCDF file format, which is extremely compact.
- Learn the processing languages
- LC -- Either of a section of code, or a short description of an ecologist programmer and link to their code (GIT?) and website