Never Lose a Stata Session Again: Auto-Logging via profile.do
If you’ve ever stared at a screen full of regression output and thought “I really hope I remember to save this log,” this post is for you. Stata’s log using is great when you remember to type it. The trouble is, you don’t — and three weeks later you realize the only record of that one regression you can’t reproduce was scrolled off the terminal.
Here’s a five-minute setup that auto-captures every Stata session, batch or interactive, into a uniquely named file. No mental load, no overwrites, no lost work.
The idea
Stata reads a file called profile.do from your personal ado directory whenever it starts. Whatever you put in profile.do runs automatically at the beginning of every session. So we put a log using command in there.
But two snags:
- Filenames must be unique. Otherwise repeated runs clobber each other.
- Multiple sessions can start in the same second. If you launch ten batch jobs in parallel, a timestamp-only filename collides.
The fix is a timestamp plus a six-digit random tag. The (time, random) tuple is essentially unique even under heavy parallelism.
The setup
First, find your personal ado directory:
sysdir
Look for the line beginning PERSONAL:. It’s typically ~/ado/personal/. Make sure the directory exists, plus a subdirectory for the logs:
mkdir -p ~/ado/personal/stata_logs
Then drop this file at ~/ado/personal/profile.do:
* ============================================================================
* profile.do -- runs automatically at Stata startup.
* Persistent session log: every interactive or batch session writes to
* a uniquely-named file under $log_dir. Filename format:
*     YYYY_M_D_HHMMSS_RRRRRR.log
* where RRRRRR is a 6-digit random tag drawn from /dev/urandom.
* ============================================================================
global log_dir "~/ado/personal/stata_logs"
capture mkdir "$log_dir"
* ---- Build the YYYY_M_D_HMS timestamp ----
local datestr = subinstr("`c(current_date)'", " ", "", .)   // e.g. "15May2026"
local year    = year(date("`datestr'", "DMY"))
local month   = month(date("`datestr'", "DMY"))
local day     = day(date("`datestr'", "DMY"))
local timestr = subinstr("`c(current_time)'", ":", "", .)   // e.g. "074739"
local stamp   = "`year'_`month'_`day'_`timestr'"
* ---- 6-digit random suffix from /dev/urandom ----
* Disambiguates concurrent sessions started in the same second.
tempfile rndfile
capture !od -An -N4 -tu4 /dev/urandom 2>/dev/null | tr -d ' \n' > "`rndfile'"
local rnd_str = "000000"   // fallback if /dev/urandom is unavailable
capture file open rh using "`rndfile'", read
if _rc == 0 {
    capture file read rh rndline
    capture file close rh
    * real() returns missing on a bad read rather than erroring, so no capture
    * here -- an undefined `rnd' would make the !missing() test a syntax error
    local rnd = mod(real("`rndline'"), 1000000)
    if !missing(`rnd') {
        local rnd_str = string(`rnd', "%06.0f")
    }
}
* ---- Open the session log ----
capture log using "$log_dir/`stamp'_`rnd_str'.log", text
if _rc == 0 di "Log started: $log_dir/`stamp'_`rnd_str'.log"
else di as error "profile.do: could not open session log; continuing unlogged"
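The random-suffix pipeline can be tried on its own in a shell, independent of Stata; this is the same od | tr trick the profile shells out to:

```shell
# Read 4 random bytes from /dev/urandom as an unsigned 32-bit integer,
# then reduce it to a zero-padded 6-digit tag.
raw=$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')
tag=$(printf '%06d' $(( raw % 1000000 )))
echo "$tag"
```

Each run prints a fresh six-digit string, which is exactly what lands at the tail of the log filename.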
That’s it. Nothing else to remember. Every Stata session — every do script, every interactive session, every batch run — quietly deposits a complete transcript into ~/ado/personal/stata_logs/.
Verifying it works
Start two Stata batch processes simultaneously:
echo 'di "session A"' | stata-mp >/dev/null 2>&1 &
echo 'di "session B"' | stata-mp >/dev/null 2>&1 &
wait
ls -t ~/ado/personal/stata_logs/ | head -2
You should see two distinct files, both stamped with the same second but different six-digit tails:
2026_5_5_074739_365771.log
2026_5_5_074739_749474.log
Open either: it contains the full session output, ready for grep.
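Since every transcript lands in one directory, searching your whole analysis history is a one-liner; the search term here is just an illustration:

```shell
# List session logs that ever ran a regress command.
# -l prints only filenames; -s silences errors if the directory is empty.
log_dir="$HOME/ado/personal/stata_logs"
grep -ls "regress" "$log_dir"/*.log | head
```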
Why the random tag matters
/dev/urandom gives 32 bits of entropy per draw, modded down to a six-digit integer. The collision probability between any two sessions starting in the same second is roughly $1/10^6$. If you launch a hundred jobs at once, the probability that any pair collides is around $\binom{100}{2} / 10^6 \approx 0.5\%$, and the losing job will get a “file exists” error from Stata rather than silently overwriting. A clean failure mode.
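To make the arithmetic concrete, here is the same birthday-bound estimate computed in the shell (k = 100 simultaneous jobs, as in the text):

```shell
k=100
pairs=$(( k * (k - 1) / 2 ))   # number of distinct session pairs
echo "$pairs pairs, each colliding with probability 1/1000000"
awk -v p="$pairs" 'BEGIN { printf "P(any collision) ~ %.3f%%\n", 100 * p / 1e6 }'
# prints "P(any collision) ~ 0.495%"
```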
If you wanted higher safety, you could draw eight digits instead of six (change 1000000 to 100000000 and %06.0f to %08.0f). Six is fine for human-scale parallelism.
Why no , replace
I deliberately omitted the replace flag on log using. Filenames are unique by construction, so replace would only ever matter on a genuine collision, at which point I'd rather see Stata error out than silently overwrite a sibling job's log. The cost of being notified about a once-in-a-blue-moon collision is far lower than the cost of losing data.
What it gives you
- Zero-friction reproducibility. Every regression coefficient you’ve ever run on this machine is grep-able.
- Audit trail under parallelism. If you launch ten regression scripts at the same time, all ten produce distinct logs.
- No naming discipline required. You can name your do-files whatever you like; the session log is named by the runtime, not the script.
Caveats
- Logs accumulate. If you generate hundreds per week, gzip files older than 60 days from a cron job; straightforward.
- profile.do runs at startup, so if it errors (say, your log directory is missing and your filesystem is read-only), Stata reports the problem and the session continues without logging. The capture guards prevent a fatal abort.
- Project-local do-files that themselves call log using will open a second, nested log; Stata supports up to five concurrent logs and treats them independently. Your auto-log keeps running in the background.
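On the log-accumulation point, the cleanup job can itself be a one-liner. A sketch assuming GNU find and gzip, with the 60-day window from the text:

```shell
# Cron-able cleanup: compress session logs untouched for over 60 days.
log_dir="$HOME/ado/personal/stata_logs"
mkdir -p "$log_dir"   # harmless if it already exists
find "$log_dir" -name '*.log' -mtime +60 -exec gzip {} +
```

Dropped into a weekly crontab entry, this keeps the directory bounded; the compressed logs remain searchable with zgrep.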
I’ve been using this for a while now and the cognitive relief is real. Type do my_messy_analysis.do knowing that whatever happens next is on disk.
Drafted as a working note while setting up a fresh research machine. The code above is plug-and-play; copy it into ~/ado/personal/profile.do and your next Stata session is logged automatically.
