This is an in-progress collection of hints for writing and typesetting in the HExSA research group.
The source is on GitHub, so you can contribute or fork your own style guide there. This guide was originally
forked from the excellent style guide from the Capra group at Cornell; you can find the original
code for that guide here.
Acronyms and initialisms should only include letters from the beginning of words.
Syntactical Unconditional Parameterized Entity Reification can
be turned into SUPER, but catS consUming Puppies Exhibiting behavioR
cannot.
Arguments can be made against this rule for the sake of symmetry or
when mixing acronyms and abbreviations,
as in our lab handle HExSA 😁
The BibTeX code you get from the ACM Digital Library or IEEE Xplore or whatever (god forbid CiteSeer) is usually terrible.
You really need to fix it up manually:
Remove all the useless fields like publisher, keywords, and abstract. For inproceedings (conference) entries, I prefer to keep author, title, booktitle, series, month, pages, and year. The goal here
is to make it easy for readers to find the reference.
The key is to be consistent with references. Pick a style and stick with it. I always use something like: Proceedings of the $3^{rd}$ Workshop on Hot Topics in Operating Systems for booktitle in inproceedings entries. I use the series field for the abbreviation, and I prefer SOSP '07 to SOSP 2007. It's minor, but try to stick with the conference's own numbering system. For example, HotOS prefers Roman numerals, e.g., HotOS VIII.
If you've got a page limit that includes references: Edit the conference name (booktitle) to be less rambly. Remove stuff like Proceedings of the 32nd ACM SIGPLAN Conference on... and use something succinct like Programming Language Design and Implementation (PLDI). Include popular abbreviations in parentheses to help readers skim. When submitting to a venue with unjust page limit rules that include references, consider using the abbreviation by itself to save space (and as a form of protest).
Check for capitalization in the title and surround it with curly braces. For example, use {PRIMES} is in {P} to make sure BibTeX doesn’t render it as Primes is in p. This is particularly important for system names and acronyms.
Many conference papers are later published as journal versions. E.g., on the ACM DL, you might find a citation that includes the BibTeX for both a TOCS journal paper and an ISCA paper. Always prefer the original conference paper, unless you're citing something special about the (usually extended) journal content.
If you find something on arXiv, always look for a real publication first before resorting to an arXiv citation. arXiv will usually list
the original publication information in the notes field, assuming the authors uploaded it.
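As a rough sketch of what a cleaned-up entry might look like after applying the rules above, here is a small self-contained file. Everything in the entry is a placeholder made up for illustration (the key monster:placeholder, the author, the title, the venue, the pages, and the year); it is not a real reference. The filecontents environment just writes refs.bib at compile time so the example stands alone. Run latex, then bibtex, then latex twice.

```latex
% Hypothetical entry: every field value below is a placeholder, not a real paper.
% filecontents writes refs.bib next to this file when LaTeX runs.
\begin{filecontents}{refs.bib}
@inproceedings{monster:placeholder,
  author    = {Cookie Monster and others},
  title     = {{PLACEHOLDER}: A Made-Up Title for a System Paper},
  booktitle = {Proceedings of the $3^{rd}$ Workshop on Placeholder Topics},
  series    = {PLACEHOLDER '07},
  month     = jun,
  pages     = {1--6},
  year      = {2007}
}
\end{filecontents}

\documentclass{article}
\begin{document}
% Name the system, then cite it (citations are not nouns).
We build on PLACEHOLDER~\cite{monster:placeholder}.

\bibliographystyle{plain}
\bibliography{refs}
\end{document}
```

Note that the entry keeps only author, title, booktitle, series, month, pages, and year, and that the braces around {PLACEHOLDER} protect its capitalization. The special author name others is BibTeX's way of producing "et al."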
Citations are not nouns. For example, this is wrong:
We build on the work of [32].
The right way is to either name the system or the authors:
We build on Terra [32].
We build on the work of Cookie Monster et al. [32].
The natbib package for LaTeX defines the \citet macro, which automatically
adds the names of the authors and the citation.
It provides several other useful macros for citations.
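Here is a minimal sketch of \citet next to a couple of its siblings, assuming numeric citations (the numbers option). The key monster07 and the bibliography entry are placeholders invented for this example; the \bibitem is written by hand in the author(year) form that natbib understands, so no .bib file is needed.

```latex
\documentclass{article}
\usepackage[numbers]{natbib}
\begin{document}

% \citet: textual citation, author names followed by the number.
We build on the work of \citet{monster07}.

% \citep: parenthetical citation, just the bracketed number.
We build on Terra~\citep{monster07}.

% \citeauthor: author names only, with no number.
\citeauthor{monster07} also discuss breakfast.

\begin{thebibliography}{1}
% Placeholder entry; the optional argument gives natbib the author list and year.
\bibitem[Monster et~al.(2007)]{monster07}
Placeholder entry: replace with the real reference.
\end{thebibliography}

\end{document}
```

You will likely need to run latex twice so the citations resolve.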
When you write TeX in a plain-text editor such as vim, duplicate words have a notorious tendency
to slip into the final paper because of how the editor wraps lines.
This usually happens during feverish editing: a word ends one line and its duplicate begins
the next line (in the TeX source), where it is easy to miss. The duplicates are not hidden in the
typeset document, though, so you can end up with a very obvious "the the" in your paper.
If you're using a paper repo I built, you'll see a file called dups, which
is a Perl script to detect such duplicate words, stolen from Matt Might.
The abbreviation e.g. means for example, and i.e. means something like in other words.
Do not italicize them, and always follow them with a comma.
Both need some kind of separating punctuation preceding them, such as a comma, an opening parenthesis, or an em dash.
For example:
They forgot the most important thing in the world, i.e., breakfast.
There would be no egg-based foods (e.g., omelettes or quiches) today.
In TeX, three hyphens (---) become an em dash (—).
You can use em dashes, sparingly, in place of parentheses or to evoke a conversational pause.
See Eddie Kohler’s advice on how to use them.
Do not put spaces around an em dash.
Use an en dash, not a hyphen, in numeric ranges like 4–10.
Also use an en dash between words that are balanced with each other but not part of a compound word.
Common examples include hardware–software, producer–consumer, and things named after multiple people, like Curry–Howard or Lucas–Kanade.
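The three kinds of dash map onto one, two, and three hyphens in TeX source. A small sketch, with example sentences made up for illustration:

```latex
\documentclass{article}
\begin{document}

% Three hyphens: em dash, with no spaces around it.
The cache---despite its size---rarely misses.

% Two hyphens: en dash, for numeric ranges and balanced pairs.
We evaluate on 4--10 cores and study the hardware--software boundary.

% One hyphen: ordinary hyphen, as in a compound modifier.
The scheduler must meet a real-time deadline.

\end{document}
```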
Good figures can make or break your paper. The key is that they should
make it easy and quick for the reader to interpret your data. Here are some guidelines that
I find myself repeating to students:
Don't hide the data. When possible, show full distributions rather than summary statistics. If you use
summary statistics, make sure they're applicable. For example, don't use the arithmetic mean if your data
is not normally distributed.
Simplify figures as much as possible. I refer to Tufte here.
Learn a plotting package (matplotlib, plotly, gnuplot) early on and stick with it. Script your
plot generation. You will thank me later.
Show error bars or confidence intervals when possible. Be clear about which one you're using. For example,
explicitly state in your text if you're using 95% confidence intervals.
Make sure bars/lines/points are distinguishable. Colors are better than nothing, but patterns are
more reliable in bars. Use different point styles for scatter/line plots. Use line styles for line plots too (e.g., dashed or dotted).
When using colors for anything meaningful, print them out in black and white to make sure they
can be distinguished. Many reviewers (and readers) print papers this way to review. Also
take into consideration that some of your readers may be colorblind.
Font sizes in figure text are almost always too small. Try to make them larger in your plotting package, and double-check how they look in your PDF after typesetting. Assume your reader will get angry if they have to squint
to read your figures.
If you're using a log plot or a false origin, make sure readers know.
Use hyphens in compound modifier phrases when they help clarify which words go together.
Specifically, hyphenate compound modifiers that come before the noun they modify, except when they consist of an adverb ending in -ly.
Do not hyphenate compound modifiers that come after the noun they modify unless you have a really good reason to.
For example, you need a hyphen in language-based security, off-chip DRAM, and real-time deadline
but not in this accelerator is fixed function.
On the other hand, fully connected layer does not need a hyphen, even though the modified noun layer comes last, because the -ly suffix in fully makes it easy to see how to parse the phrase.
Some phrases can act either as modifiers that need hyphens or as nouns that do not.
The phrase state of the art is a common bugaboo.
A reference to something in the state of the art does not need hyphens, but a state-of-the-art accelerator does.
You often want to put math, code, or other notation in the flow of prose.
Do it like this:
Introductory sentence, ending with a colon:
%
\[ e = m \times c^2 \]
%
More explanation here.
The text leading up to the notation should give enough context so that the reader knows why they are about to see an equation.
It should call out the key insight they should look for while trying to understand the math or listing.
The text afterward should provide justification and explain details that make sense after seeing the notation.
Above and below the math or listing, use an empty TeX comment line (%) to avoid starting a new paragraph while still making the TeX look readable.
For math, be sure to use display-mode math macros like \[ x \] or align*.
Use align* (instead of several \[ \] equations in a row) when you have multiple lines.
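For the multi-line case, a minimal sketch might look like the following. The identity is just a stand-in; the parts that matter are the amsmath package, the & alignment points before each equals sign, and the empty comment lines that keep the math inside the surrounding paragraph.

```latex
\documentclass{article}
\usepackage{amsmath} % provides the align* environment
\begin{document}

Expanding the square shows where the cross term comes from:
%
\begin{align*}
  (a + b)^2 &= (a + b)(a + b) \\
            &= a^2 + 2ab + b^2
\end{align*}
%
The $2ab$ term collects the two mixed products $ab$ and $ba$.

\end{document}
```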
If you have to list items that are only a sentence or two, consider inlining them into a paragraph, following these rules. (1) There’s no need for fancy marker words like firstly and secondly. (2) Instead, use numerals in parentheses at the beginning of each item.
Avoid the passive voice as much as reasonable.
When you say the data is converted, for example, your writing will be clearer if it is more specific about who or what does the conversion: the system, the algorithm, the user, the server, the authors, etc.
An imperfect way to tell whether a sentence is in the passive voice is to try adding …by space aliens to the end.
If that works, you probably want to add a subject for your verb.
(Credit to Melissa O’Neill for this trick.)
Sometimes, rewriting a sentence to avoid the passive voice makes it worse.
Give it an earnest try, but give up if the alternative seems bad.
If I've set up a paper repo for you, you've probably seen a file named passive. This
is a Perl script I stole from Matt Might long ago that detects passive voice in your
TeX and alerts you to it if you want to go hunting. I'd avoid doing that one when you've
only got hours to go before a submission though.
To reference a section, use Section~\ref{...}. Capitalize Section and use
a tilde to make a nonbreaking space. (And the same for tables and figures.) Always use Section, even when it’s a subsection or subsubsection.
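A short sketch of these references in context follows. The label names (sec:eval, sec:setup, fig:slowdown) and the boxed placeholder standing in for a plot are assumptions made for this example.

```latex
\documentclass{article}
\begin{document}

\section{Evaluation}
\label{sec:eval}

\subsection{Setup}
\label{sec:setup}

\begin{figure}
  \centering
  \fbox{placeholder for a plot}
  \caption{Execution time relative to an unmodified execution.}
  \label{fig:slowdown}
\end{figure}

% Capitalized name, tilde, then \ref. A subsection is still called "Section".
Section~\ref{sec:setup} describes the benchmarks.
Figure~\ref{fig:slowdown} shows the execution time for each benchmark.

\end{document}
```

Run latex twice so the numbers resolve.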
When explaining figures or tables that show results, refer to them early in the paragraph, usually in the first sentence, and then explain the contents in more detail. For example:
Figure 9 shows the execution time for each benchmark relative to an unmodified execution. The geometric mean slowdown is 8%. The worst slowdown is streamcluster, which is 31% slower with debugging enabled.
It’s usually best to put the figure or table reference right at the beginning of the sentence and to follow it with an active verb.
For example, prefer Figure N shows X to X can be found in Figure N or As shown in Figure N, X.
Use run time as a noun for the time when execution happens, as in the error arises at run time.
Use run-time as an adjective phrase before the noun, as in the system’s run-time behavior.
Use runtime as the noun that is shorthand for runtime system, as in we designed a compiler and runtime.
(Credit to James Wilcox’s recollection of Mike Ernst.)
Use the same rules for compile time and compile-time (but compiletime is not a thing).
Writing in “academic mode” can tempt you to use phrases that are more complicated than they need to be.
Try to keep things simple, even if it means sounding informal.
Here are some find/replace patterns for simplifying language:
which means that → so
gives X the ability to → lets X
allows X to → lets X
is different → differs
is built on → builds on
it is observed that X → just X or, if necessary, we observe that X
in order to → to
as can be seen in the figure, X → the figure shows that X or just X
Avoid subjective language as much as possible. The phrase we observe
huge speedups does not belong in a technical paper. We demonstrate
a 20× speedup over the baseline is better. A related problem
is the use of weasel words, which are well explained by Matt Might.
If you've used a paper repo with me, you will see a file called weasel which
detects such words in your TeX.
In TeX, use booktabs and its \toprule, \midrule, and \bottomrule lines instead of \hline.
Do not put horizontal lines between every row (but between the header row and the rest of the rows is fine).
Do not use vertical lines.
Right-align columns that contain numbers.
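Put together, the table rules above might look like this sketch. The benchmark names and all of the numbers are made up purely to show the layout.

```latex
\documentclass{article}
\usepackage{booktabs} % provides \toprule, \midrule, \bottomrule
\begin{document}

\begin{table}
  \centering
  \caption{Placeholder results; all values are made up for illustration.}
  \label{tab:placeholder}
  % l = left-aligned text column, r = right-aligned numeric columns.
  % There are no | characters in the column spec, so no vertical lines.
  \begin{tabular}{lrr}
    \toprule
    Benchmark & Time (s) & Slowdown (\%) \\
    \midrule
    bench-a   &  12.4    &  8 \\
    bench-b   & 103.0    & 31 \\
    \bottomrule
  \end{tabular}
\end{table}

\end{document}
```

The only rule between rows is the \midrule under the header.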
It can be tempting to chain together two sentences by starting the second one with this as a noun.
For example:
Because the oven was broken, we microwaved the hot pocket. This resulted in a quicker but soggier treat.
This pattern can make the second sentence harder to read because the referent can be ambiguous (at first glance, this might be the hot pocket, the oven, or the microwave).
Insert a clarifying noun, such as this technique in the sentence above.
Use a × symbol, not the letter x, when writing about dimensions (“a 4×2 grid”) or factors (“a speedup of 2.3× over the baseline”).
In TeX, you can use $\times$ to get the symbol.
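For instance, both uses might be typeset as in this small sketch (the sentence itself is invented):

```latex
\documentclass{article}
\begin{document}

% Dimensions: \times between the two numbers, all inside math mode.
We arrange the tiles in a $4 \times 2$ grid.

% Factors: \times directly after the number.
The accelerator achieves a speedup of $2.3\times$ over the baseline.

\end{document}
```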