ExpandingStyle

motivation and organization

I am not particularly fond of coding style guides in general. However, as Julia matures common practices are starting to materialize, many of which I applaud, but some of which I detest. I therefore felt compelled to create this guide as a rebuttal to some of the style practices of which I disapprove.

Some of this guide specifically addresses what is currently the most commonly cited Julia style guide, BlueStyle. Despite my emphasis to the contrary, most of this style guide agrees with BlueStyle, and I am partially grateful that it has become so commonly followed, as Julia is expressive enough to allow for some much worse alternatives. However, I have some major gripes with BlueStyle: in particular some quibbles about naming, the infuriating way they write multi-line function arguments, and especially their gratuitous insistence on return statements nearly everywhere.

In this guide I am most careful to record major points of departure between my style and some of the most common practices. Therefore, if I don't mention it here, I probably agree with BlueStyle, and almost certainly agree with the Julia language docs style guide.

As I've previously stated, I'm a bit dubious of the entire concept of style guides and as such, I frown upon dogmatic adherence to any style guide, including this one.

Readability, though subjective, is the most important thing.

Typing is fast and fun. Even the best coders will spend far more time reading code than writing it.

summary

The Julia style guide appearing in the base language documentation is widely adhered to and not particularly controversial. Indeed, I follow it here: there are few, if any, points of disagreement between what I present here and this base style.

Many functions with mostly just a few lines per function.
4 spaces per indentation level, no tabs.
Roughly 100 character line length limit. Shorter is ok, longer quickly starts getting bad.
Upper camel-case for modules and types.
All lowercase without underscores for functions by default. Underscores are encouraged for clarity in long, verbose names, but such names should be avoided where possible.
Terse, succinct variable names. Single characters are ideal. Unicode is encouraged, particularly for idiomatic names.
using should group packages semantically, e.g. stdlibs together. using statements which import module contents should get their own line.
Method doc strings are strongly encouraged. Incomplete is better than missing.
Some whitespace for readability is a good thing, but don't go overboard or it becomes counter-productive.
No padding brackets with spaces.

code organization

The best code is both modular and generic. The former means that you can either take small pieces out of your code or put other things into it and still get something useful; the latter means the code has many applications.

Code which is modular and generic tends to look a certain way. For one, it cannot be emphasized enough that you should write functions, not just scripts. Functions can be taken out and used in different places for different purposes. Scripts are mostly useless except for the purpose and context for which they are written.

Usually making your code modular means that the vast majority of functions are just a few lines. If you are writing much longer functions, it is likely they can be split up into smaller components many of which may have broader applicability. This is also related to the "don't repeat yourself" principle since usually very large blocks of code contain some sort of repetition. You will likely find that code written this way is much easier to understand.

#GOOD
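short1(i) = i[1] + i[2]  # small amount of code
short2(i) = i[1] - i[2]  # small amount of code
iteration(i) = short1(i) + im*short2(i)  # some stuff involving the above
some_preparation() = (3, 4)  # stands in for real setup, here it just returns the dimensions
function long()
    n, m = some_preparation()
    A = Matrix{ComplexF64}(undef, n, m)
    foreach(i -> (A[i] = iteration(i)), CartesianIndices(A))
    A
end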


#BAD
function long()
    # lots of preparation in-line
    A = Matrix{ComplexF64}(undef, n, m)
    for idx ∈ CartesianIndices(A)
        # tons of complicated code in-line
    end
    A
end

Keeping code modular is also extremely helpful for writing good unit tests.
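
To make the testing point concrete, here is a minimal sketch using the Test standard library. The functions scale, shift and affine are hypothetical names made up for this illustration; the point is only that each small piece can be checked on its own.

```julia
using Test

# hypothetical small, single-purpose functions
scale(x::Real, a::Real) = a*x
shift(x::Real, b::Real) = x + b
affine(x::Real, a::Real, b::Real) = shift(scale(x, a), b)

# each piece can be tested in isolation, then in combination
@testset "affine pieces" begin
    @test scale(2, 3) == 6
    @test shift(2, 3) == 5
    @test affine(2, 3, 1) == 7
end
```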

naming

types and modules

Types and modules should use upper camel case. In some cases it is preferable to lowercase the non-initial characters of an acronym, as this cuts down on visual noise.

Again, this is consistent with both the base Julia style guide and BlueStyle, so I will defer to those for more detail.

#GOOD
module FiberBundles end

#BAD
module fiberbundles end

#GOOD
struct ManifoldPoint end

#BAD
struct Manifold_Point end

parameters

In cases where there is likely to be confusion between type parameters which are types and those which are literals, the parameters which are types should use a script character, for example 𝒯, to distinguish them as such. It is best to do this only for parameters which are actually types; for example, in Array{𝒯,N}, N is intended to be an Integer. Script characters are \mathcal in LaTeX and are prefixed with \scr in Julia's LaTeX-to-unicode completions (e.g. \scrT).

#GOOD (X,Y,Z,A are not types)
struct ManifoldPoint{𝒯<:Real,X,Y,Z,A} end

In more common cases in which there is no risk of confusion it is fine to use uppercase letters.

# GOOD
struct ManifoldPoint{T} end

# BAD (they look like literals)
struct ManifoldPoint{type} end
struct ManifoldPoint{TYPE} end
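
As a rough sketch of how the script convention reads in practice, the hypothetical struct below (Lattice, dim and spacingtype are all invented for this example) has one parameter which is a type and one which is an integer.

```julia
# 𝒯 is a type, N is an integer literal
struct Lattice{𝒯<:Real,N}
    a::NTuple{N,𝒯}  # lattice spacing along each of the N dimensions
end

dim(::Lattice{𝒯,N}) where {𝒯<:Real,N} = N
spacingtype(::Lattice{𝒯,N}) where {𝒯<:Real,N} = 𝒯

l = Lattice((0.5, 0.5, 1.0))
dim(l)          # 3
spacingtype(l)  # Float64
```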

idiomatic type names

In some cases a short or special type name is idiomatic. In all cases, such special names should use some sort of special character. For example, it is reasonable to define:

#GOOD
const ℝ = Real
const ℂ = Complex

#BAD
const R = Real

functions

Function names should be all lowercase and run together without underscores. Particularly long or verbose names should have underscores for clarity, but the main reason for discouraging underscores is that short function names encourage good multiple dispatch code.

For example, it is very unlikely that you want two separate functions rand_int() and rand_float(), much more likely you want rand(𝒯) where the argument specifies the method behavior and return type. Looking for short function names can encourage you to consider what methods it should have, and to combine similar functionality into functions that can be used generically.

Particularly short function names of one or two characters are not often appropriate, but may be in some idiomatic cases. For example, if you are using the gamma function, by all means please call it Γ; it would be crazy not to.

Yes, it is possible to go too far. The main point of combining methods into functions is to facilitate generic functionality and make it easier to write generic code. This is different from coding "puns" in which a symbol is overloaded with methods that cannot possibly be used in analogous situations.

#GOOD
function run(s::Server)
end
function Γ(z::Number)
end

#BAD
function run_server(s::Server)
end
function gamma_function(z::Number)
end
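
The rand(𝒯) point can be sketched as follows. Base's rand already accepts a type argument; randpair is a hypothetical helper I made up to show that generic code then only has to be written once.

```julia
# Base already follows this pattern: the type argument picks the method and return type
n = rand(Int)      # a random Int
x = rand(Float64)  # a random Float64 in [0, 1)

# hypothetical helper: works for any type that rand supports
randpair(::Type{𝒯}) where {𝒯} = (rand(𝒯), rand(𝒯))

randpair(Int8)     # a tuple of two Int8
randpair(Float32)  # a tuple of two Float32
```

Had these been separate rand_int() and rand_float() functions, randpair would have needed a branch or a copy per type.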

internal functions

It is often the case that particularly verbose names are appropriate for "internal" functions with very limited applicability. As Shannon taught us, efficient encodings use longer symbols for less common values. If a function is not expected to be used far from its initial definition, its name should be prefixed with a _. For example, an internal function might look something like _handle_bizarre_edge_case(a...).
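
A minimal sketch of this split, with hypothetical names (avg and _sum_skipping_missing are not from any library): the public function gets the short name, and the awkward part lives in a verbose, underscore-prefixed helper next to it.

```julia
# internal helper: verbose name, leading underscore, used only by avg below
_sum_skipping_missing(v) = sum(skipmissing(v))

# public function: short name, delegates the awkward part to the helper
avg(v) = _sum_skipping_missing(v)/count(!ismissing, v)

avg([1, missing, 3])  # 2.0
```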

variables

Variable names should be short and terse, preferably one character. Naming conventions should be idiomatic where possible. This is likely to be one of the more controversial aspects of this style guide and I am importing it from math and physics.

Readability is of the utmost importance. Verbose variable names reduce the signal to noise ratio and, in the worst cases, make reading complicated expressions horrendous.

Certain conventions about what variables are typically used as what should be respected unless the context demands otherwise. For example, n, m, i, j, k are probably integers. x, y, z are probably numbers in a continuous space, either real or complex. θ, ϕ, φ are likely continuous and dimensionless. κ is likely "constant" in some respect. v, w are likely not scalar. f, g may be functions, but it is often better to use script characters such as 𝒻, ℊ to specify functions to show the distinction. A, B are likely operators or matrices.

The above should not be followed dogmatically, but is flexible depending on the context.

It is usually preferable to accompany a function with at least a minimal doc string, which in many cases will explain at least argument variables. Comments explaining variables are acceptable if you feel they are needed.

Some might object to this saying that expecting comments and doc strings to do the job of "self-documenting" verbose variable names is crazy. In my experience, this argument is based on a fantasy: either the use of the variables is mostly clear even without the verbose names or additional documentation is needed anyway. There is only a very narrow space in between and it isn't worth having unwieldy and verbose naming conventions in the hope that some day you're going to be lucky enough to land there.

#GOOD
n = 0
k += 1
z = x + im*y
v[i]
v[idx]  # in some cases, such as if idx isa CartesianIndex
cfg = configure()
χ = connect()
B = A*x  # x is a vector in this case
α, β = divrem(N, m)

#BAD
number = 0
count += 1
value = real_part + im*imag_part
vector[index]
config = configure()
connection = connect()
lhs_matrix = coefficient_mat*vec

Some accommodation for fonts which may have poor unicode support is reasonable, for example, it's reasonable to hesitate to use 𝔊. It is also reasonable to avoid ν because some fonts display it as a v. However, excessive subservience to crap fonts should be avoided. If you are using a font that's so bad that you are afraid to use non-ASCII, change it.

caveats

Using terse variable names requires a little bit of conscientiousness on the part of the programmer. For example, in a function (or module) in which "counting" integers are extremely common, one might struggle to assign appropriate names for all of them.

For example

#GOOD
f(i, j, n, k) = do_stuff_1()
f(i, j, name::AbstractString, idx) = do_stuff_2()

#BAD
f(i, j, n, k) = do_stuff_1()
# in the good example we saw that the 3rd and 4th args have different semantics here,
# so it's not good to keep the names (especially k)
f(i, j, n::AbstractString, k) = do_stuff_2()

In other words, yes, it's possible to go too far using terse variable names. This is not a good justification for extremely verbose variable names, but it is a good justification for not fanatically adhering to a specific convention. If you are struggling to come up with a good name, just use a more verbose one; that doesn't mean you should make everything verbose. (See above comments about dogmatic adherence to style guides being bad.)

globals

Avoiding globals is good practice in most languages for a number of reasons, but they are particularly bad in Julia unless you know what you're doing. All globals should be const, except in very small scripts. Global names should be all uppercase; this serves to strongly distinguish them from locals and, to a lesser extent, to warn you to be careful with them.

#GOOD (sometimes at least)
const GLOBAL_MUTABLE_STATE = Dict{Int,Float64}()  # note this is fully-typed

#BAD, very bad
global_mutable_state = Dict()
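
The usual reason given for this is that the compiler cannot assume the type of a non-const, untyped global, so code reading it is hard to optimize. As a sketch of the const pattern (HITS and hit! are hypothetical names for this example), note that const fixes the binding, not the contents:

```julia
const HITS = Dict{String,Int}()  # const, fully-typed global

# the binding is const, but the contents can still be mutated
hit!(k::AbstractString) = (HITS[k] = get(HITS, k, 0) + 1)

hit!("a"); hit!("a"); hit!("b")
HITS["a"]                    # 2
isconst(@__MODULE__, :HITS)  # true
```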

operators

Use unicode operator names where appropriate, particularly the built-ins.

#GOOD
A ⊆ B
a ≠ b
α ≡ β
x ∈ S

#BAD
issubset(A, B)
a != b
α === β
x in S
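
For anyone unsure whether the unicode forms are really the same thing, the short check below compares each pair on a concrete input. In the Julia REPL these operators are typed as \subseteq, \ne, \equiv and \in followed by tab.

```julia
A, B = Set([1, 2]), Set([1, 2, 3])

(A ⊆ B) == issubset(A, B)  # true
(1 ≠ 2) == (1 != 2)        # true
(A ≡ A) == (A === A)       # true
(1 ∈ B) == (1 in B)        # true
```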

method definitions

Please, for the love of god, stop using unnecessary return statements.

An elegant and delightful convention which Julia has inherited from Lisp is that everything is an expression. To use return statements gratuitously is to deny the beauty and simplicity of this concept. It seems likely to me that many people want unnecessary return statements because they are not sufficiently comfortable with this concept. It is a good thing to get used to, as it provides many nice and expressive ways of writing code.

#GOOD
function 𝒻(x, y)
    #(this should be in one line, but I expand it for illustrative purposes)
    if y ≥ 0
        x
    else
        -x
    end
end
function 𝒽(x, y)
    z = if rand() > 1/2
        x + y
    else
        x - y
    end
    cosh(z)
end

#BAD
function 𝒻(x, y)
    if y ≥ 0
        return x
    else
        return -x
    end
end
function 𝒽(x, y)
    if rand() > 1/2
        z = x + y
    else
        z = x - y
    end
    return cosh(z)
end

It has been argued that choosing not to include return statements makes it less clear what the intention of the function is. This is of course absurd, and likely imported from non-Lisp languages. Nothing is forcing you to have a return value: if you really don't want one, return nothing; that's why it exists. Don't return something you don't intend to return.

#GOOD
function ℊ(A)
    # do some stuff to A
    # in most cases functions like this return A, but perhaps not
    nothing
end

#BAD
function ℊ(A)
    # do some stuff to A
    return nothing
end

Obviously, if you want to return a value before the end of the function block, you should use the return statement.
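
A minimal sketch of a necessary return, using a hypothetical function firstneg (Base's findfirst would do the same job, this is only for illustration):

```julia
# index of the first negative element of v, or nothing if there is none
function firstneg(v::AbstractVector{<:Real})
    for i ∈ eachindex(v)
        v[i] < 0 && return i  # early exit: return is necessary here
    end
    nothing
end

firstneg([3, -1, 2])  # 2
firstneg([3, 1, 2])   # nothing
```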

arguments

Arguments should usually have type assertions. It is sometimes falsely claimed that Julia relies on this for performance. This is not true: the compiler generally specializes a method on the concrete types of the arguments it is called with, whether or not they are annotated. Type assertions on arguments are still useful because:

They catch many errors and usually result in a much more comprehensible error message than if they were absent. It's also much easier to unintentionally allow undefined behavior when they are omitted.
They can make the intended use of a method clearer.
They are useful for multiple dispatch. Even if you do not initially intend to define other methods for your function, using reasonable type assertions early on can save you from a lot of confusion later.

You should avoid overly specific type assertions. Type assertions which are too specific inappropriately limit the functionality of a method. The following might be educational:

Most of the time, if you want an integer, you want Integer. In most contexts, any integer makes sense. Sometimes Signed or Unsigned is appropriate.
Most of the time you want Real and not Float64. A notable exception is GPUs, which typically require Float32 to work efficiently.
Most of the time you want AbstractArray rather than Array. In many cases this will be AbstractVector or AbstractMatrix.
Avoid overly specific types for container type parameters. You probably want AbstractVector{<:Real}, not Vector{Float64}.

#GOOD
function 𝒻(z::Number, x::Real, n::Integer, v::AbstractVector{<:Complex})
end

#BAD (again, usually)
function 𝒻(z, x, n::Int, v::Vector{ComplexF64})
end
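
To see what an overly specific assertion costs, compare the two hypothetical functions below (norm1 and norm1strict are names invented for this sketch). Both have the same body, but only the generic one accepts integer vectors and ranges.

```julia
# generic: accepts any vector of reals
norm1(v::AbstractVector{<:Real}) = sum(abs, v)

# overly specific: only accepts Vector{Float64}
norm1strict(v::Vector{Float64}) = sum(abs, v)

norm1([1, -2, 3])         # 6, from a Vector{Int}
norm1(-2:2)               # 6, from a UnitRange{Int}
norm1strict([1.0, -2.0])  # 3.0
# norm1strict([1, -2, 3]) would throw a MethodError
```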

It's important to know when to "give up" on type assertions. Again, if they are too specific, your code won't be as useful as it could be. Unless you are new to the language, if you find yourself struggling to figure out an appropriate type assertion, it's time to give up and just leave it off. Usually you don't want to bother with assertions with Union, though an important exception is for handling "null" values such as Union{Nothing,Int}.

Another notable class of cases in which you should leave off type assertions is iterators. Iterators can be of any type and Julia has no formal way of specifying argument interfaces. If, e.g. an AbstractArray or Tuple is appropriate, you probably shouldn't bother with a type assertion.
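
Both exceptions can be sketched briefly. The functions head and total are hypothetical, made up for this example: the first uses a Union with Nothing for an optional limit, the second leaves its iterator argument without any assertion.

```julia
# k is an optional limit, nothing means "no limit"
head(v::AbstractVector, k::Union{Nothing,Integer}=nothing) = isnothing(k) ? v : first(v, k)

head([1, 2, 3])     # [1, 2, 3]
head([1, 2, 3], 2)  # [1, 2]

# no assertion on itr: anything iterable is acceptable
total(itr) = sum(abs2, itr)

total((1, 2))           # 5, from a Tuple
total([1, 2])           # 5, from a Vector
total(x for x ∈ 1:2)    # 5, from a generator
```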

code formatting

You should adhere to the following:

Use a one-line function definition if it fits!
The first arguments should always appear on the same line as the function name.
Use more lines for clarity if needed, especially for keyword arguments.
The closing bracket should either be on the same line as the function name, or in the same column as the opening bracket, never at the 1st column.
Start keyword arguments on a new line unless your signature is very short... trust me, I'm very bad with this and it has often cost me.
No spaces between keywords, = and their arguments.

#GOOD
𝒻(x::Number, y::Number) = x + y
function ℊ(z::𝒯;
           option1::AbstractString="some_option",
           option2::AbstractString="another_option",
           switch1::Bool=true, switch2::Bool=false,
          ) where {𝒯}
end

#BAD
function 𝒻(x, y)
    x + y
end
function ℊ(
    z::𝒯; option1 = "some_option", option2 = "another_option",
    switch1::Bool = true, switch2::Bool = false,
) where {𝒯}
end

lists

Lists of items, whether arguments or an array definition, can be unrolled into multiple lines for clarity when appropriate. Be aware that line breaks are semantically meaningful in array literals: in a space-delimited literal, each line break starts a new row.

If a list is any longer than a single line, always put commas (or whatever delimiter is appropriate) after every element, including the last one. The reason for this should be obvious: if you add elements to or otherwise alter the list, you will cause a syntax error unless you remember to add the comma.

Excessive verbosity is discouraged. Feel free to make list elements more compact than similar syntax might appear in other situations. For example, pairs => should not be padded with spaces.

#GOOD
const LOOKUP_TABLE = Dict("a"=>0x01, "b"=>0x02,
                          "c"=>0x03, "d"=>0x04,
                         ) # closing on the previous line is also ok

#GOOD
const LOOKUP_TABLE = Dict("a"=>0x01,
                          "b"=>0x02,
                          "c"=>0x03,
                          "d"=>0x04,
                         )

#BAD
const LOOKUP_TABLE = Dict("a" => 0x01,
                          "b" => 0x02,
                          "c" => 0x03,
                          "d" => 0x04 #← missing comma!
)
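
On the point about line breaks in array definitions, here is a small sketch of the difference. With commas the line breaks are purely cosmetic; with spaces each line break starts a new row.

```julia
v = [1, 2,
     3, 4,
    ]      # commas: a 4-element Vector

M = [1 2
     3 4]  # spaces: a 2×2 Matrix

size(v)  # (4,)
size(M)  # (2, 2)
```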

control structures

for loops

One should always use ∈, not in and especially not = (the latter of which should, frankly, be removed from the language).

#GOOD
for x ∈ v
end

#BAD
for x in v
end

#WTF, how is this even a thing?
for x = v
end

by Expanding Man. Last modified: April 28, 2022. Website built with Franklin.jl and the Julia programming language.