HExSA Style Guide

This is an in-progress collection of hints for writing and typesetting in the HExSA research group. The source is on GitHub, so you can contribute or fork your own style guide there. It was originally forked from the excellent style guide of the Capra group at Cornell; you can find the original code here.

For more good advice, see Eddie Kohler’s LaTeX usage notes or Claire Le Goues’s things she repeats about writing.

#acro

acronyms

Acronyms and initialisms should only include letters from the beginning of words. Syntactical Unconditional Parameterized Entity Reification can be turned into SUPER, but catS consUming Puppies Exhibiting behavioR cannot.

Arguments can be made against this rule for the sake of symmetry or when mixing acronyms and abbreviations, as in our lab handle HExSA 😁

#bib

bibliography

The BibTeX code you get from the ACM Digital Library or IEEE Xplore or whatever (god forbid CiteSeer) is usually terrible. You really need to fix it up manually:

Remove all the useless fields like publisher, keywords, and abstract. For inproceedings (conference) entries, I prefer to keep author, title, booktitle, series, month, pages, and year. The goal here is to make it easy for readers to find the reference.
The key is to be consistent with references. Pick a style and stick with it. I always use something like: Proceedings of the $3^{rd}$ Workshop on Hot Topics in Operating Systems for booktitle in inproceedings entries. I use the series field for the abbreviation, and I prefer SOSP '07 to SOSP 2007. Minor, but try to stick with the conference's own numbering system. For example, HotOS prefers Roman numerals, e.g., HotOS VIII.
If you've got a page limit that includes references: Edit the conference name (booktitle) to be less rambly. Remove stuff like Proceedings of the 32nd ACM SIGPLAN Conference on... and use something succinct like Programming Language Design and Implementation (PLDI). Include popular abbreviations in parentheses to help readers skim. When submitting to a venue with unjust page limit rules that include references, consider using the abbreviation by itself to save space (and as a form of protest).
Check for capitalization in the title and surround it with curly braces. For example, use {PRIMES} is in {P} to make sure BibTeX doesn’t render it as Primes is in p. This is particularly important for system names and acronyms.
Many papers have subsequently published journal versions. For example, on the ACM DL, you might find a citation that includes the BibTeX for both a TOCS journal paper and an ISCA paper. Always prefer the original conference paper, unless you're citing something special about the (usually extended) journal content.

If you find something on arXiv, always look for a real publication first before resorting to an arXiv citation. arXiv will usually list the original publication information in the notes field, assuming the authors uploaded it.
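
As a rough illustration of the cleanup rules in this section, here is what a trimmed inproceedings entry might look like. The paper itself is fictional: the authors, title, page numbers, and citation key are placeholders made up for this sketch. The filecontents wrapper is only there to keep the example a single self-contained LaTeX file; in practice the entry would live in your own .bib file.

```latex
% Sketch only: the entry below is fictional (authors, title, pages, and key are placeholders).
\begin{filecontents}{example.bib}
@inproceedings{monster:super,
  author    = {Cookie Monster and Big Bird},
  title     = {{SUPER}: A Placeholder Title for a Fictional System},
  booktitle = {Proceedings of the $21^{st}$ ACM Symposium on Operating Systems Principles},
  series    = {SOSP '07},
  month     = oct,
  pages     = {1--14},
  year      = {2007}
}
\end{filecontents}
\documentclass{article}
\begin{document}
Cookie Monster and Big Bird describe SUPER~\cite{monster:super}.
\bibliographystyle{plain}
\bibliography{example}
\end{document}
```

Note that only the fields listed above survive, the braces around SUPER protect its capitalization, and the series field carries the short venue name.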

#citet

citations as nouns

Citations are not nouns. For example, this is wrong:

We build on the work of [32].

The right way is to either name the system or the authors:

We build on Terra [32].
We build on the work of Cookie Monster et al. [32].

The natbib package for LaTeX defines the \citet macro, which automatically adds the names of the authors and the citation. It provides several other useful macros for citations.
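
A minimal sketch of the two most common natbib macros follows. The citation key monster:super and the file name refs.bib are placeholders, and the numbers option is just one reasonable choice for bracketed numeric citations.

```latex
\documentclass{article}
\usepackage[numbers]{natbib}
\begin{document}
% \citet is textual: it prints the author names followed by the citation.
We build on the work of \citet{monster:super}.
% \citep is parenthetical: it prints only the citation.
We build on SUPER~\citep{monster:super}.
\bibliographystyle{plainnat}
\bibliography{refs} % refs.bib is a placeholder file name
\end{document}
```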

#comma

commas

Use the serial comma: the last one in a list like locks, transactions, and atomics.

#contraction

contractions

Don’t use ’em in papers. They’re fine in blog posts ’n’ such.

#dups

duplicates

When you write TeX in an editor such as vim, duplicate words have a notorious tendency to sneak into the final paper because of how your editor wraps lines. This usually happens during feverish editing: a word ends one line of the TeX source and its duplicate begins the next, where it is easy to miss. The duplicates are not so hidden in the typeset document, so you might end up with a very obvious "the the" in your paper.

If you're using a paper repo I built, you'll see a file called dups, a Perl script that detects such duplicate words, stolen from Matt Might.

#egie

e.g. and i.e.

The abbreviation e.g. means for example, and i.e. means something like in other words. Do not italicize them, and always follow them with a comma. Both need some kind of separating punctuation preceding them, such as a comma, an opening parenthesis, or an em dash. For example:

They forgot the most important thing in the world, i.e., breakfast. There would be no egg-based foods (e.g., omelettes or quiches) today.

#emdash

em dash

In TeX, three hyphens (---) become an em dash (—).

You can use em dashes, sparingly, in place of parentheses or to evoke a conversational pause. See Eddie Kohler’s advice on how to use them. Do not put spaces around an em dash.

#endash

en dash

In TeX, two hyphens (--) become an en dash (–).

Use an en dash, not a hyphen, in numeric ranges like 4–10.

Also use an en dash between words that are balanced with each other but not part of a compound word. Common examples include hardware–software, producer–consumer, and things named after multiple people, like Curry–Howard or Lucas–Kanade.
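
The three marks are easy to mix up in source form, so here is a small side-by-side sketch. The sentences are made-up examples, not text from any paper.

```latex
\documentclass{article}
\begin{document}
An off-chip DRAM access is slow.               % hyphen: compound modifier
See pages 4--10 for details.                   % en dash: numeric range
The hardware--software boundary is blurry.     % en dash: balanced pair of words
The result---to our surprise---held at scale.  % em dash: no surrounding spaces
\end{document}
```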

#etal

et al.

et al. is an abbreviation of the Latin et alii, meaning and others. It is typically used when citing others' work. Do not write et. al. (et is a complete word and takes no period) or et al (al. is an abbreviation and needs one).
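
One TeX detail worth knowing, offered as a side note rather than a rule from this guide: TeX treats a period after a lowercase letter as the end of a sentence and may stretch the space that follows. A backslash-space after the period asks for a normal interword space instead. The sentences below are made-up examples.

```latex
\documentclass{article}
\begin{document}
% "al.\ " asks for a normal interword space after the abbreviation's period.
Cookie Monster et al.\ show that breakfast matters.
% No fix is needed when punctuation follows the period directly.
Cookie Monster et al., in later work, agree.
\end{document}
```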

#figs

figures

Good figures can make or break your paper. The key is that they should make it easy and quick for the reader to interpret your data. Here are some guidelines that I find myself repeating to students:

Don't hide the data. When possible, show full distributions rather than summary statistics. If you use summary statistics, make sure they're applicable. For example, don't use the arithmetic mean if your data is not normally distributed.
Simplify figures as much as possible. I refer to Tufte here.
Learn a plotting package (matplotlib, plotly, gnuplot) early on and stick with it. Script your plot generation. You will thank me later.
Show error bars or confidence intervals when possible. Be clear about which one you're using. For example, explicitly state in your text if you're using 95% confidence intervals.
Make sure bars/lines/points are distinguishable. Colors are better than nothing, but patterns are more reliable in bars. Use different point styles for scatter/line plots. Use line styles (dashed, dotted, and so on) for line plots too.
When using colors for anything meaningful, print them out in black and white to make sure they can be distinguished. Many reviewers (and readers) print papers this way to review. Also take into consideration that some of your readers may be colorblind.
Font size in figure text is almost always too small. Try to make it larger in your plotting package, and double-check how it looks in your PDF after typesetting. Assume your reader will get angry if they have to squint to read your figures.
If you're using a log plot or a false origin, make sure readers know.

#footnote

footnotes

Footnotes at the end of the sentence go after the period. In TeX:

Some technical statement.\footnote{Snide remark here.}

#however

however

However and similar words are not conjunctions, so foo, however bar is a comma splice. Try instead:

foo. However, bar

Or:

foo; however, bar

#hyphen

hyphens

Use hyphens in compound modifier phrases when they help clarify which words go together. Specifically, hyphenate compound modifiers that come before the noun they modify, except when they consist of an adverb ending in -ly. Do not hyphenate compound modifiers that come after the noun they modify unless you have a really good reason to.

For example, you need a hyphen in language-based security, off-chip DRAM, and real-time deadline but not in this accelerator is fixed function. On the other hand, fully connected layer does not need a hyphen, even though the modified noun layer comes last, because the -ly suffix in fully makes it easy to see how to parse the phrase.

Some phrases can act either as modifiers that need hyphens or as nouns that do not. The phrase state of the art is a common bugaboo. A reference to something in the state of the art does not need hyphens, but a state-of-the-art accelerator does.

#inline

inline math & code

You often want to put math, code, or other notation in the flow of prose. Do it like this:

Introductory sentence, ending with a colon:
%
\[ e = m \times c^2 \]
%
More explanation here.

The text leading up to the notation should give enough context so that the reader knows why they are about to see an equation. It should call out the key insight they should look for while trying to understand the math or listing. The text afterward should provide justification and explain details that make sense after seeing the notation.

Above and below the math or listing, use an empty TeX comment line (%) to avoid starting a new paragraph while still making the TeX look readable.

For math, be sure to use display-mode math macros like \[ x \] or align*. Use align* (instead of several \[ \] equations in a row) when you have multiple lines:

\begin{align*}
S &= \frac{T_s}{T_p} \\
  &= \frac{1}{(1 - p) + \frac{p}{s}}
\end{align*}

#judgment

judgment

Use the predominant American spelling, judgment.

#list

lists

If you have to list items that are only a sentence or two, consider inlining them into a paragraph, following these rules. (1) There’s no need for fancy marker words like firstly and secondly. (2) Instead, use numerals in parentheses at the beginning of each item.
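
Typing (1), (2), and so on by hand works fine. If you would rather have LaTeX number the items, one option is the enumitem package, sketched below; the package choice and the example sentence are assumptions for illustration, not something this guide requires.

```latex
\documentclass{article}
\usepackage[inline]{enumitem} % the inline option provides enumerate*
\begin{document}
Our system has three stages:
\begin{enumerate*}[label=(\arabic*)]
  \item the parser reads the input,
  \item the optimizer rewrites it, and
  \item the backend emits code.
\end{enumerate*}
\end{document}
```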

#math

math

In LaTeX math:

Don't use * for multiplication. Use \cdot, \times, or nothing at all.
For variable names that are words, use $\text{count}$. A plain $count$ looks like you're multiplying five single-letter variables together.
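
A short sketch of both rules together. The variable names are arbitrary, and \text comes from the amsmath package, which the example loads explicitly.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \text
\begin{document}
% Word-like variable names go in \text; multiplication uses \cdot or \times.
\[ \text{total} = \text{count} \cdot w \]
\[ \text{area} = m \times n \]
% Juxtaposition is fine for single-letter variables.
\[ y = a x + b \]
\end{document}
```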

#passive

passive voice

Avoid the passive voice as much as reasonable. When you say the data is converted, for example, your writing will be clearer if it is more specific about who or what does the conversion: the system, the algorithm, the user, the server, the authors, etc.

An imperfect way to tell whether a sentence is in the passive voice is to try adding …by space aliens to the end. If that works, you probably want to add a subject for your verb. (Credit to Melissa O’Neill for this trick.)

Sometimes, rewriting a sentence to avoid the passive voice makes it worse. Give it an earnest try, but give up if the alternative seems bad.

If I've set up a paper repo for you, you've probably seen a file named passive. This is a Perl script I stole from Matt Might long ago that detects passive voice in your TeX and alerts you to it if you want to go hunting. I'd avoid doing that when you've only got hours to go before a submission, though.

#quote

quotes

Periods and commas at the ends of quotations go inside the quotation marks. For example:

My advisor texted me that I had “fixed the last bug,” but I replied that “bugs are forever.”

In TeX, remember to use `` for opening and '' for closing double quotes. With the default fonts, TeX will typeset "this" with closing quotes on both sides.
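
In source form, the example sentence above would look roughly like this:

```latex
\documentclass{article}
\begin{document}
% Two backticks open a quotation; two apostrophes close it.
My advisor texted me that I had ``fixed the last bug,''
but I replied that ``bugs are forever.''
\end{document}
```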

#ref

references

To reference a section, use Section~\ref{...}. Capitalize Section and use a tilde to make a nonbreaking space. (And the same for tables and figures.) Always use Section, even when it’s a subsection or subsubsection.

When explaining figures or tables that show results, refer to them early in the paragraph, usually in the first sentence, and then explain the contents in more detail. For example:

Figure 9 shows the execution time for each benchmark relative to an unmodified execution. The geometric mean slowdown is 8%. The worst slowdown is streamcluster, which is 31% slower with debugging enabled.

It’s usually best to put the figure or table reference right at the beginning of the sentence and to follow it with an active verb. For example, prefer Figure N shows X to X can be found in Figure N or As shown in Figure N, X.
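
A minimal sketch of these conventions in source form. The label names, section titles, and placeholder figure body are made up for illustration.

```latex
\documentclass{article}
\begin{document}
\section{Introduction}
% Capitalized name, tilde for a nonbreaking space, then \ref.
Section~\ref{sec:eval} measures the overhead of debugging.

\section{Evaluation}
\label{sec:eval}
% The figure reference leads the sentence and is followed by an active verb.
Figure~\ref{fig:slowdown} shows the execution time for each benchmark.
\begin{figure}
  \centering
  \fbox{Plot goes here.}
  \caption{Execution time relative to an unmodified execution.}
  \label{fig:slowdown}
\end{figure}
\end{document}
```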

#runtime

runtime, run-time, run time

Use run time as a noun for the time when execution happens, as in the error arises at run time. Use run-time as an adjective phrase before the noun, as in the system’s run-time behavior. Use runtime as the noun that is shorthand for runtime system, as in we designed a compiler and runtime. (Credit to James Wilcox’s recollection of Mike Ernst.)

Use the same rules for compile time and compile-time (but compiletime is not a thing).

#sectitle

section titles

For most section titles, use Title Case. For “run-in” titles like LaTeX's \paragraph, use sentence case and end the title with a period.
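
For instance, with made-up titles:

```latex
\documentclass{article}
\begin{document}
\section{Design and Implementation}  % Title Case
\paragraph{Handling page faults.}    % run-in title: sentence case, ends with a period
The handler maps each page lazily.
\end{document}
```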

#simplify

simpler phrases

Writing in “academic mode” can tempt you to use phrases that are more complicated than they need to be. Try to keep things simple, even if it means sounding informal. Here are some find/replace patterns for simplifying language:

which means that → so
gives X the ability to → lets X
allows X to → lets X
is different → differs
is built on → builds on
it is observed that X → just X or, if necessary, we observe that X
in order to → to
as can be seen in the figure, X → the figure shows that X or just X
has the potential to → could
a sufficient amount of → enough
utilize → use
make use of → use
note that X → X
it is worth noting that X → X

#subfig

subfigures

Use the subcaption package to lay out subfigures in TeX. Use something like this:

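\begin{figure}
    \centering
    \begin{subfigure}[b]{0.5\linewidth}
        \centering
        Figure here.
        \caption{First caption.}
        \label{fig:thing1}
    \end{subfigure}% suppress the space between the two boxes
    \begin{subfigure}[b]{0.5\linewidth}
        \centering
        Another figure here.
        \caption{Second caption.}
        \label{fig:thing2}
    \end{subfigure}
    \caption{Caption for both.}
    \label{fig:both}
\end{figure}

Using 0.5\linewidth for the size of the subfigures makes them divide the horizontal space equally. The % after the first \end{subfigure} matters: without it, the line break in the source becomes a space, the two boxes no longer fit on one line, and the second subfigure wraps below the first.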

#subjective

subjective words

Avoid subjective language as much as possible. The phrase we observe huge speedups does not belong in a technical paper. We demonstrate a 20× speedup over the baseline is better. Closely related are weasel words, which are well explained by Matt Might.

If you've used a paper repo with me, you will see a file called weasel, which detects such words in your TeX.

#table

tables

In TeX, use booktabs and its \toprule, \midrule, and \bottomrule lines instead of \hline. Do not put horizontal lines between every row (but between the header row and the rest of the rows is fine). Do not use vertical lines. Right-align columns that contain numbers.
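
A minimal booktabs sketch following these rules. The benchmark names and numbers are placeholder values, not real measurements.

```latex
\documentclass{article}
\usepackage{booktabs}
\begin{document}
\begin{table}
  \centering
  % l = left-aligned text column, r = right-aligned numeric columns; no vertical rules.
  \begin{tabular}{lrr}
    \toprule
    Benchmark & Time (s) & Slowdown (\%) \\
    \midrule
    alpha     &  12.4    &  8 \\
    beta      & 103.0    & 31 \\
    \bottomrule
  \end{tabular}
  \caption{Placeholder results for two made-up benchmarks.}
  \label{tab:example}
\end{table}
\end{document}
```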

#they

personal pronouns

To refer to a notional person like the programmer, use the singular they.

#this

this as a subject

It can be tempting to chain together two sentences by starting the second one with this as a noun. For example:

Because the oven was broken, we microwaved the hot pocket. This resulted in a quicker but soggier treat.

This pattern can make the second sentence harder to read because the referent can be ambiguous (at first glance, this might be the hot pocket, the oven, or the microwave). Insert a clarifying noun, such as this technique in the sentence above.

#times

times

Use a × symbol, not the letter x, when writing about dimensions (“a 4×2 grid”) or factors (“a speedup of 2.3× over the baseline”). In TeX, you can use $\times$ to get the symbol.
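
In source form, the two examples above might be written like this:

```latex
\documentclass{article}
\begin{document}
We lay out the tiles in a $4 \times 2$ grid.                       % dimensions
The optimized kernel is $2.3\times$ faster than the baseline.      % factor
\end{document}
```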

#tradeoff

trade-off

Use trade-off, not tradeoff. (Both are common, but we had to pick one and the former seems to be more popular.)

#transpiler

transpiler

Avoid transpiler. Just use compiler instead. (See Lindsey Kuper’s investigation into the varied, fuzzy meaning of the former.)