
On this page, I collect advice for academic economists that I came across and found useful. It is a growing collection of advice on writing and presenting your research, tips regarding your CV, and advice on the econ job market.

If you think something is worth adding to this list, please send me an email or comment below.

Line spacing in footnotes

When you submit papers, journals often require double-spaced footnotes. There seem to be different approaches to this, and the following one worked quite well for me. It requires the footmisc package, plus the setspace package, which provides \setstretch. Simply enter the following code in the preamble of your LaTeX file:

\usepackage{setspace}
\usepackage{footmisc}
\renewcommand{\footnotelayout}{\setstretch{2}}

Check whether variable exists in if-conditions

In some applications, e.g. when you run a regression with many dummies (such as fixed effects), you might want to save the coefficient estimates in a variable. In this example, we want to store the estimates of the GROUPVAR dummies but not those of the OTHERVAR dummies. This is usually straightforward. Simply write

xi, noomit: reg OUTCOME i.GROUPVAR i.OTHERVAR, nocons
gen NEWVAR=_b[_IGROUPVAR_123]

where GROUPVAR is the original variable from which the dummies were created in the regression (using xi:). Occasionally, some of the coefficients are not estimated because of multicollinearity (be sure to figure out why!). In that case, Stata reports

i.GROUPVAR _IGROUPVAR_111-999 (naturally coded; _IGROUPVAR_111 omitted)

If you now want to store the estimates of all dummies generated by “i.GROUPVAR”, you need to separate these variable names (starting with _IGROUPVAR_) from all other variable names stored in e(b).

While it is easy to check whether variables or scalars exist (using capture confirm variable VARNAME or capture confirm scalar SCALARNAME), this is harder for a large number of estimates after a regression, especially if some of the dummies are omitted. There are other ways to do it (e.g. using the _rmcoll command), but the following worked well for me. First, it saves the names of all estimates in the coefficient vector e(b). Then it acts only on the estimates with certain names (in my case, those starting with _IGROUPVAR), which here means storing them in a variable. The syntax is as follows:

* Run the regression (without constant; noomit keeps a dummy for every category)
xi, noomit: reg OUTCOME i.GROUPVAR i.OTHERVAR, nocons
* Create local with all coefficient names in e(b)
* (omitted variables are either missing here or carry an o. prefix, so the check below skips them)
local regvars : colnames e(b)
* Create variable to store estimates
gen VAR_FE=.
* Run loop over all variables stored in local regvars
foreach locvar of local regvars {
    * Execute the command only for dummies generated from variable GROUPVAR
    if substr("`locvar'",1,10)=="_IGROUPVAR" {
        * Remove the prefix from the loop local (e.g. keep "123" instead of "_IGROUPVAR_123")
        local GROUP_ID=subinstr("`locvar'","_IGROUPVAR_","",1)
        di _b[`locvar']
        di "`GROUP_ID'"
        replace VAR_FE=_b[`locvar'] if GROUPVAR==`GROUP_ID'
    }
}

Each observation (e.g. firm 123 in GROUPVAR) now has its fixed effect stored in the variable VAR_FE.
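To try the loop before adapting it, here is a self-contained sketch on Stata's built-in auto data. The mapping is an illustrative assumption: rep78 (repair record, values 1 to 5) plays GROUPVAR, foreign plays OTHERVAR, and price plays OUTCOME. Instead of comparing the first 10 characters, it uses strpos() to check whether a coefficient name starts with the xi prefix _Irep78_. This avoids counting the prefix length by hand.

```stata
* Sketch (assumption): rep78 stands in for GROUPVAR, foreign for OTHERVAR, price for OUTCOME
sysuse auto, clear
* noomit keeps a dummy for every category; with noconstant, one dummy is dropped as collinear
xi, noomit: regress price i.rep78 i.foreign, noconstant
* Names of all coefficients stored in e(b)
local regvars : colnames e(b)
gen rep78_fe = .
foreach v of local regvars {
    * Act only on names that start with the xi prefix for rep78
    if strpos("`v'", "_Irep78_") == 1 {
        * Strip the prefix to recover the group value, e.g. "3" from "_Irep78_3"
        local id = subinstr("`v'", "_Irep78_", "", 1)
        replace rep78_fe = _b[`v'] if rep78 == `id'
    }
}
* One fixed effect per repair category (missing for any rep78 dummy dropped as collinear)
tabstat rep78_fe, by(rep78)
```

Which dummy regress drops is up to Stata. The last line therefore shows whether one of the rep78 categories ended up without an estimate.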

Strikethrough text in LaTeX

A very simple way to strike through text in LaTeX is to load the soul package and use \st{Strike through this text}:

\usepackage{soul} % in the preamble
\st{Strike through this text} % in the document body

Edit: striking through references seems to cause some complications (most of the time it does not work for me).

Add time stamp to Stata figures

Especially in the early stages of a research project, it can be useful to “time stamp” figures so that you can later tell when you created a certain figure. Of course, you could just check the file date in Explorer / Finder. You can also simply add a note with the time and date of creation to the figure:

For this purpose, simply add Stata’s current time and date (the system globals $S_TIME and $S_DATE) to the note of a figure, such as

twoway (scatter y x), note("$S_TIME $S_DATE", span)

The span suboption aligns the note with the left edge of the entire graph instead of the plot region, so the date moves to the very left.
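A variant of the same idea uses the c() values c(current_time) and c(current_date) instead of the system globals. Below is a minimal sketch on Stata's auto data. The variables price and mpg stand in for y and x, and the local name stamp is just an illustrative choice.

```stata
sysuse auto, clear
* Build the stamp once so it can be reused, e.g. in notes or file names
local stamp "`c(current_time)' `c(current_date)'"
twoway (scatter price mpg), note("Created: `stamp'", span)
```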

Changing font color in table columns

In my ongoing search for better table layouts in my beamer presentations, I came across a simple way to change the font color in tables. This comes in handy, for example, to highlight regression results in specific columns of a table.

Instead of, for example, specifying a table with two centered columns as

\begin{tabular}{cc}

add >{\color{red}} before the definition of the column that you would like to have printed in red (or whatever color you specify):

\begin{tabular}{c>{\color{red}}c}

If you would like to add this feature as an overlay, simply add the corresponding overlay number, e.g. 4:

\begin{tabular}{c>{\color<4>{red}}c}

This way, the column is printed in red only on the fourth overlay of this slide.

Necessary packages: \usepackage{xcolor} and \usepackage{array} (the latter provides the >{...} column syntax).

Subfigures in LaTeX

To combine multiple figures in one figure environment in LaTeX, e.g. to show scatter plots for different groups, the subfig package is there to help. It lets you create several subfigures, each with its own subcaption. Simply load the package in the preamble by adding

\usepackage{subfig}
\usepackage{graphicx} % provides \includegraphics, if not already loaded

For a 2×2 array of figures, define the figure environment in the following way:

\begin{figure}[!htbp]
\begin{center}
\subfloat[Panel A]{\includegraphics[height=22ex]{PanelA_figure.pdf}}
\subfloat[Panel B]{\includegraphics[height=25ex]{PanelB_figure.pdf}}\\
\subfloat[Panel C]{\includegraphics[height=22ex]{PanelC_figure.pdf}}
\subfloat[Panel D]{\includegraphics[height=25ex]{PanelD_figure.pdf}}
\end{center}
\end{figure}

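Note that the double backslash \\ after Panel B forces LaTeX to insert a line break, so that panels C and D are printed on a second line. Avoid blank lines between the \subfloat commands: a blank line starts a new paragraph and would put every panel on its own line.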

Create duplicate observations in Stata

For certain cleaning jobs, it can be useful to duplicate an observation (often only temporarily). To create an identical copy of an observation, just type

expand 2 if testvar==1, gen(dupindicator)

The 2 tells Stata that there should be two copies of each selected observation (the original and the copy). They can be told apart by the new variable dupindicator, which is 1 for the duplicate and 0 for the original observation. With this new variable, the duplicates can later easily be identified and dropped if necessary.
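Here is a minimal sketch of the full round trip on Stata's auto data, assuming that foreign==1 plays the role of testvar==1. It creates the copies, leaves room for the temporary work, and then drops the copies again using dupindicator.

```stata
sysuse auto, clear
* Duplicate every foreign car; dupindicator is 1 for the new copies, 0 otherwise
expand 2 if foreign == 1, gen(dupindicator)
count if dupindicator == 1
* ... temporary cleaning steps on the copies would go here ...
* Remove the copies again
drop if dupindicator == 1
```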

Avoiding widows and orphans

Paragraphs that end with a single line on the following page often look a bit ugly, especially when they are followed by figures or tables. A lone first line of a paragraph at the bottom of a page is called an “orphan” (TeX calls it a club line). A lone last line at the top of the next page is called a “widow”. How can we tell TeX to avoid them? The code below puts a penalty on these layouts. Setting the penalty very high (10000, which TeX treats as infinite) avoids widows and orphans almost entirely. Just add the following code to the preamble of your document.

\clubpenalty = 10000
\widowpenalty = 10000
\displaywidowpenalty = 10000

Returning percentiles as scalars

When I wanted to store percentiles in a local, e.g. to indicate the median in a figure, I used to first run

sum var, detail
local var_p50=`r(p50)'

The disadvantage of this approach is that you can only store the pre-specified percentiles (1, 5, 10, 25, 50, 75, 90, 95, 99). It also becomes lengthy if you need to store several percentiles.

A shorter and nicer way is to use _pctile. If you want to store, for example, the 5th, 20th, 80th and 95th percentiles, you can write

_pctile ps1_pre, p(5 20 80 95)
which stores the corresponding percentiles as the scalars r(r1) to r(r4).
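Here is a sketch of how this fits the median-in-a-figure use case, on Stata's auto data. Price stands in for ps1_pre, and the local names p5 to p95 are illustrative. The returned scalars are copied into locals right away because the next r-class command overwrites r().

```stata
sysuse auto, clear
_pctile price, p(5 20 50 80 95)
* Copy the returned scalars into locals before another r-class command overwrites them
local p5  = r(r1)
local p20 = r(r2)
local p50 = r(r3)
local p80 = r(r4)
local p95 = r(r5)
display "Median price: `p50'"
* Mark the median in a histogram
histogram price, xline(`p50')
```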


Recover Stata code from .gph files

If you ever came across a good-looking Stata figure saved in Stata’s own .gph format and wondered how it was produced, you can simply look up the command stored in the file’s metadata.

Change Stata’s working directory (using cd) to the folder in which the .gph file is stored, and type

gs_fileinfo mygraph.gph
di r(command)

This displays the full command used to create the figure in Stata’s Results window.
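To try this without someone else's file, the sketch below first saves a graph of its own on Stata's auto data and then reads the command back. The file name mygraph.gph is only an example. Compound quotes are used when displaying r(command) in case the stored command contains quotation marks.

```stata
sysuse auto, clear
twoway (scatter price mpg)
graph save mygraph.gph, replace
* Read the stored metadata back from the file
gs_fileinfo mygraph.gph
display `"`r(command)'"'
```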