Data and Statistics Packages

Overview

This lecture explores some of the key packages for working with data and doing statistics in Julia.

In particular, we will examine the DataFrame object in detail (i.e., construction, manipulation, querying, visualization, and nuances like missing data).

While Julia is not an ideal language for pure cookie-cutter statistical analysis, it has many useful packages to provide those tools as part of a more general solution.

This list is not exhaustive, and others can be found in organizations such as JuliaStats, JuliaData, and QueryVerse.

Setup

In [1]:
using InstantiateFromURL
# optionally add arguments to force installation: instantiate = true, precompile = true
github_project("QuantEcon/quantecon-notebooks-julia", version = "0.8.0")
In [2]:
using LinearAlgebra, Statistics
using DataFrames, RDatasets, DataFramesMeta, CategoricalArrays, Query, VegaLite
using GLM

DataFrames

A useful package for working with data is DataFrames.jl.

The most important data type provided is a DataFrame, a two dimensional array for storing heterogeneous data.

Although data can be heterogeneous within a DataFrame, the contents of the columns must be homogeneous (of the same type).

This is analogous to a data.frame in R, a DataFrame in Pandas (Python) or, more loosely, a spreadsheet in Excel.
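
As a minimal sketch of this column-wise homogeneity, the cell below builds a small DataFrame and inspects the element type of each column. The name df_types and its contents are made up for illustration.

```julia
using DataFrames

# hypothetical example data: three columns, each with its own element type
df_types = DataFrame(name = ["a", "b"], value = [1.0, 2.0], count = [3, 4])

# one element type per column: String, Float64 and Int
col_types = eltype.(eachcol(df_types))
@show col_types
```

Types differ across columns, but every entry within a given column shares one type.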

There are a few different ways to create a DataFrame.

Constructing and Accessing a DataFrame

The first is to set up the columns and construct a DataFrame by assigning names:

In [3]:
using DataFrames, RDatasets  # RDatasets provides good standard data examples from R

# note use of missing
commodities = ["crude", "gas", "gold", "silver"]
last_price = [4.2, 11.3, 12.1, missing]
df = DataFrame(commod = commodities, price = last_price)

Columns of the DataFrame can be accessed by name using df.col, as below.

In [4]:
df.price

Note that the element type of this array is Union{Missing, Float64}, since it was created with a missing value.
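
The effect of a single missing entry on the element type can be checked without any packages. This is a small sketch using only Base Julia; the variable no_missing is introduced here purely for comparison.

```julia
last_price = [4.2, 11.3, 12.1, missing]
@show eltype(last_price)    # Union{Missing, Float64}

# the same values without a missing entry give a plain Float64 vector
no_missing = [4.2, 11.3, 12.1]
@show eltype(no_missing)    # Float64
```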

In [5]:
df.commod

The DataFrames.jl package provides a number of methods for acting on DataFrames, such as describe.

In [6]:
DataFrames.describe(df)

While data will often be generated all at once, or read from a file, you can add a row to a DataFrame by pushing a named tuple whose keys match the column names.

In [7]:
nt = (commod = "nickel", price= 5.1)
push!(df, nt)

Named tuples can also be used to construct a DataFrame, and have it properly deduce all types.

In [8]:
nt = (t = 1, col1 = 3.0)
df2 = DataFrame([nt])
push!(df2, (t=2, col1 = 4.0))
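
To see the type deduction at work, here is a small sketch that builds a DataFrame from a vector of named tuples and inspects the resulting column types. The names rows and df_rows are illustrative only.

```julia
using DataFrames

# each named tuple is one row; the keys become column names
rows = [(t = 1, col1 = 3.0), (t = 2, col1 = 4.0)]
df_rows = DataFrame(rows)

# column types are deduced from the values: Int for t, Float64 for col1
row_types = eltype.(eachcol(df_rows))
@show row_types
@show size(df_rows)    # (2, 2)
```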

In order to modify a column, access it without copying through the indexing syntax df[!, :col].

In [9]:
df[!, :price]

This allows the column to be modified, in the same spirit as other mutating ! functions in Julia.

In [10]:
df[!, :price] *= 2.0  # double prices
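
To make the role of ! in the row position concrete, the sketch below contrasts df[!, :col], which returns the stored column itself, with df[:, :col], which returns a copy. The DataFrame df_demo and the variable names are hypothetical.

```julia
using DataFrames

df_demo = DataFrame(price = [1.0, 2.0, 3.0])

no_copy = df_demo[!, :price]   # the underlying column vector, no copy
copied  = df_demo[:, :price]   # a fresh copy of the column

no_copy[1] = 100.0             # changes the DataFrame
copied[2]  = -1.0              # leaves the DataFrame untouched

@show df_demo.price            # [100.0, 2.0, 3.0]
@show no_copy === df_demo.price
```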

Note that the missing price stays missing after doubling: as discussed in the next section, missing is propagated through arithmetic, i.e., missing * 2 === missing.

Working with Missing

As we discussed in fundamental types, the semantics of missing are that mathematical operations will not silently ignore it.
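
A few one-liners in Base Julia illustrate these semantics. This is only a sketch of the propagation rules, not an exhaustive list.

```julia
@show missing + 1                 # arithmetic propagates missing
@show missing * 2 === missing     # true
@show missing == 1                # comparison also yields missing, not false
@show ismissing(missing == 1)     # true
@show isequal(missing, missing)   # true: isequal treats missing as a value
@show sum([1.0, missing])         # reductions propagate missing too
```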

In order to allow missing in a column, you can create/load the DataFrame from a source with missing values, or call allowmissing! on a column.

In [11]:
allowmissing!(df2, :col1) # necessary to add a missing value to col1
push!(df2, (t=3, col1 = missing))
push!(df2, (t=4, col1 = 5.1))
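
The effect of allowmissing! can be checked by looking at the column's element type before and after the call. The sketch below uses a hypothetical DataFrame df_m so that it runs on its own.

```julia
using DataFrames

df_m = DataFrame(t = [1, 2], col1 = [3.0, 4.0])
type_before = eltype(df_m.col1)          # Float64: no room for missing

allowmissing!(df_m, :col1)               # widen the column's element type
type_after = eltype(df_m.col1)           # Union{Missing, Float64}

push!(df_m, (t = 3, col1 = missing))     # now succeeds
@show type_before type_after nrow(df_m)
```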

We can see the propagation of missing to caller functions, as well as a way to efficiently calculate with non-missing data.

In [12]:
@show mean(df2.col1)
@show mean(skipmissing(df2.col1))
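
The same comparison can be reproduced on a plain vector with only the Statistics standard library. The vector col below is a stand-in with made-up values.

```julia
using Statistics

col = [3.0, 4.0, missing, 5.1]

@show mean(col)                          # missing: the missing entry propagates
@show collect(skipmissing(col))          # [3.0, 4.0, 5.1]

# skipmissing is a lazy iterator, so no intermediate array is allocated here
mean_observed = mean(skipmissing(col))
@show mean_observed
```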

To replace the missing values:

In [13]:
df2.col1  .= coalesce.(df2.col1, 0.0) # replace all missing with 0.0
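
As a sketch of what the broadcasted coalesce does, the example below applies it to a plain vector (made-up values) and contrasts creating a new array with writing back in place via .= as in the cell above.

```julia
col = [3.0, 4.0, missing, 5.1]

# coalesce returns its first non-missing argument, applied elementwise here
filled = coalesce.(col, 0.0)
@show filled            # [3.0, 4.0, 0.0, 5.1]
@show eltype(filled)    # Float64: the new array no longer allows missing

# writing back in place keeps the original container and its element type
col .= coalesce.(col, 0.0)
@show eltype(col)       # Union{Missing, Float64}
```

If a column should also stop accepting missing after the replacement, DataFrames.jl provides disallowmissing! for that purpose.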

Manipulating and Transforming DataFrames

One way to do an additional calculation with a DataFrame is to use the @transform macro from DataFramesMeta.jl.

In [14]:
using DataFramesMeta
f(x) = x^2
# newer DataFramesMeta releases expect the new column as a symbol, i.e. :col2 = f.(:col1)
df2 = @transform(df2, col2 = f.(:col1))
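
For comparison, the same kind of derived column can be written without the macro, using the transform function from DataFrames.jl itself. This is a sketch with a hypothetical DataFrame df_t and helper square; it assumes a DataFrames.jl version that provides transform and ByRow.

```julia
using DataFrames

square(x) = x^2
df_t = DataFrame(t = 1:3, col1 = [3.0, 4.0, 5.0])

# source column => row-wise function => name of the new column
df_t2 = transform(df_t, :col1 => ByRow(square) => :col2)
@show df_t2.col2    # [9.0, 16.0, 25.0]
```

transform returns a new DataFrame and leaves df_t unchanged; transform! is the in-place variant.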

Categorical Data

For data that is categorical, use a CategoricalArray from CategoricalArrays.jl:

In [15]:
using CategoricalArrays
id = [1, 2, 3, 4]
y = ["old", "young", "young", "old"]
y = CategoricalArray(y)
df = DataFrame(id=id, y=y)
In [16]:
levels(df.y)
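
As a brief sketch of working with levels, the example below shows the default level order, reorders the levels, and then marks the array as ordered so that values can be compared. The choice of "young" before "old" is an illustrative assumption.

```julia
using CategoricalArrays

ages = categorical(["old", "young", "young", "old"])
@show levels(ages)                  # ["old", "young"], sorted by default

levels!(ages, ["young", "old"])     # impose a custom level order
@show levels(ages)                  # ["young", "old"]

ordered!(ages, true)                # allow < and > between values
@show ages[2] < ages[1]             # "young" < "old" under the new order
```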

Visualization, Querying, and Plots

The DataFrame (and similar types that fulfill a standard generic interface) can fit into a variety of packages.

One set of them is the QueryVerse.

Note: The QueryVerse, in the same spirit as R’s tidyverse, makes heavy use of the pipeline syntax |>.

In [17]:
x = 3.0
f(x) = x^2
g(x) = log(x)

@show g(f(x))
@show x |> f |> g; # pipes nest function calls
g(f(x)) = 2.1972245773362196
(x |> f) |> g = 2.1972245773362196

To give an example taken directly from the documentation of the LINQ-inspired Query.jl:

In [18]:
using Query

df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,5,2])

x = @from i in df begin
    @where i.age>50
    @select {i.name, i.children}
    @collect DataFrame
end
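
For readers who prefer to stay within DataFrames.jl, the same query (rows with age above 50, keeping name and children) can be sketched with filter and select. This is an alternative formulation, not part of Query.jl; df_q, older and result are illustrative names.

```julia
using DataFrames

df_q = DataFrame(name = ["John", "Sally", "Kirk"],
                 age = [23.0, 42.0, 59.0],
                 children = [3, 5, 2])

older = filter(row -> row.age > 50, df_q)    # keep rows matching the predicate
result = select(older, :name, :children)     # keep only the requested columns
@show result
```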

While it is possible to just use the Plots.jl library, there may be better options for displaying tabular data – such as VegaLite.jl.

In [19]:
using RDatasets, VegaLite
iris = dataset("datasets", "iris")

iris |> @vlplot(
    :point,
    x=:PetalLength,
    y=:PetalWidth,
    color=:Species
)

Statistics and Econometrics

While Julia is not intended as a replacement for R, Stata, and similar specialty languages, it has a growing number of packages aimed at statistics and econometrics.

Many of the packages live in the JuliaStats organization.

A few to point out:

  • StatsBase has basic statistical functions such as geometric and harmonic means, auto-correlations, robust statistics, etc.
  • StatsFuns has a variety of mathematical functions and constants such as pdf and cdf of many distributions, softmax, etc.
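
A short sketch of a few of the functions mentioned in this list, assuming both StatsBase and StatsFuns are installed. The input vector is made up.

```julia
using StatsBase, StatsFuns

v = [1.0, 2.0, 4.0]

gm = geomean(v)          # geometric mean, from StatsBase
hm = harmmean(v)         # harmonic mean, from StatsBase
p  = normcdf(0.0)        # standard normal cdf at zero, from StatsFuns
w  = softmax(v)          # nonnegative weights that sum to one, from StatsFuns

@show gm hm p sum(w)
```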

General Linear Models

To run linear regressions and similar statistics, use the GLM package.

In [20]:
using GLM

x = randn(100)
y = 0.9 .* x + 0.5 * rand(100)
df = DataFrame(x=x, y=y)
ols = lm(@formula(y ~ x), df) # R-style notation
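
Once a model is fitted, its estimates can be extracted programmatically rather than read off the printed table. The sketch below fits a similar regression on simulated data (the seed, the noise level and the names sim and ols_demo are assumptions made for reproducibility) and pulls out the coefficients, standard errors and R².

```julia
using DataFrames, GLM, Random

Random.seed!(42)                       # arbitrary seed, for reproducibility only
xs = randn(100)
ys = 0.5 .+ 0.9 .* xs .+ 0.1 .* randn(100)
sim = DataFrame(x = xs, y = ys)

ols_demo = lm(@formula(y ~ x), sim)

beta = coef(ols_demo)                  # [intercept, slope]
se = stderror(ols_demo)                # standard errors, same order as beta
@show beta se r2(ols_demo)
```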

To display the results in useful tables for LaTeX and the REPL, use RegressionTables for output similar to the Stata package esttab and the R package stargazer.

In [21]:
using RegressionTables
regtable(ols)
# regtable(ols,  renderSettings = latexOutput()) # for LaTeX output

Fixed Effects

While Julia may be overkill for estimating a simple linear regression, fixed-effects estimation with dummies for multiple variables is much more computationally intensive.

For a two-way fixed-effects model, taking the example directly from the documentation using cigarette consumption data:

In [22]:
using FixedEffectModels
cigar = dataset("plm", "Cigar")
cigar.StateCategorical = categorical(cigar.State)
cigar.YearCategorical = categorical(cigar.Year)
fixedeffectresults = reg(cigar, @formula(Sales ~ NDI + fe(StateCategorical) + fe(YearCategorical)),
                         Vcov.cluster(:State), weights = :Pop)
regtable(fixedeffectresults)
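
To give some intuition for why dummies become costly, the sketch below compares two ways of estimating a one-way fixed-effects slope on a tiny made-up panel: a regression with one dummy column per group, and the textbook within transformation that subtracts group means first. It is a standard-library illustration under simplifying assumptions (no noise, one set of fixed effects) and is not a description of how FixedEffectModels.jl works internally.

```julia
using LinearAlgebra, Statistics

# hypothetical toy panel: 3 groups with 4 observations each
group = repeat(1:3, inner = 4)
x = Float64.(1:12) .+ [0.3, -0.2, 0.5, -0.1, 0.4, 0.0, -0.3, 0.2, 0.1, -0.4, 0.6, -0.5]
alpha = [10.0, -5.0, 2.0]              # group-specific intercepts
y = alpha[group] .+ 2.0 .* x           # true slope is 2.0

# (a) dummy-variable regression: the design matrix grows with the number of groups
D = Float64.(group .== (1:3)')         # 12×3 matrix of group indicators
beta_dummies = ([x D] \ y)[1]

# (b) within transformation: demean by group, then a single-regressor OLS
demean(v) = v .- [mean(v[group .== g]) for g in group]
xd, yd = demean(x), demean(y)
beta_within = dot(xd, yd) / dot(xd, xd)

@show beta_dummies beta_within
```

Both approaches recover the same slope here, but the dummy matrix in (a) gains a column for every group, which is what makes the naive approach expensive with many groups or several sets of fixed effects.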