Performance
fastdid is orders of magnitude faster than did, and about 15x faster than the fastest alternative, DiDforBigData (dfbd for short), for large datasets. fastdid also uses less memory. Here is a comparison of run time for fastdid, did, and dfbd using a panel of 10 periods and varying sample sizes.
Unfortunately, the author’s computer fails to run did at 1 million units. For a rough idea, DiDforBigData is about 100x faster than did in Bradley Setzler’s benchmark. Other staggered DiD implementations are even slower than did.
For memory:
For the benchmark, a baseline group-time ATT is estimated with no covariate controls, no bootstrap, and no explicit parallelization. Computing time is measured by microbenchmark and peak RAM by peakRAM.
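The setup can be sketched as follows. This is an illustrative sketch, not the exact benchmark script: the simulated-data helper `sim_did()` and the argument names (`timevar`, `cohortvar`, `unitvar`, `outcomevar`) are taken from the fastdid documentation, and the sample size is arbitrary.

```r
# Illustrative benchmark sketch (assumed argument names; not the exact script).
library(fastdid)
library(microbenchmark)
library(peakRAM)

# simulate a staggered-treatment panel: 100k units, 10 periods
dt <- sim_did(1e5, 10)$dt

# computing time
microbenchmark(
  fastdid(dt, timevar = "time", cohortvar = "G",
          unitvar = "unit", outcomevar = "y"),
  times = 10
)

# peak RAM
peakRAM(
  fastdid(dt, timevar = "time", cohortvar = "G",
          unitvar = "unit", outcomevar = "y")
)
```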
Validity
Before each release, we conduct tests to ensure the validity of estimates from fastdid.
Basics: comparison with did
For features included in CS, fastdid maintains a maximum difference of 1% from the results of the did package. This margin of error is mostly for bootstrapped results, due to their inherent randomness. For point estimates, the difference is smaller than 1e-12 and is most likely the result of floating-point error. The relevant test file is compare_est.R.
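A comparison along these lines can be reproduced by hand. The sketch below is hypothetical: the fastdid argument names and the `sim_did()` helper are taken from its documentation, and the did call uses its documented `att_gt` interface; the two calls are configured to use the same (never-treated) control group so the estimates are comparable.

```r
# Sketch of a point-estimate comparison between the two packages
# (assumed argument names; align control groups before comparing).
library(fastdid)
library(did)

dt <- sim_did(1000, 10)$dt  # simulated panel from fastdid

fast_res <- fastdid(dt, timevar = "time", cohortvar = "G",
                    unitvar = "unit", outcomevar = "y",
                    result_type = "group_time",
                    control_option = "never")

did_res <- att_gt(yname = "y", tname = "time", idname = "unit",
                  gname = "G", data = dt,
                  control_group = "nevertreated")

# after matching the (group, time) cells, the point estimates
# should agree up to floating-point error
```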
Extensions: coverage test
For features not included in CS, fastdid maintains that the 95% confidence intervals have a coverage rate between 94% and 96%.
The coverage rate is calculated by running 200 iterations. In each iteration, we test whether the estimated confidence interval covers the ground-truth values, then average the rate across iterations. Due to the randomness of coverage, the realized coverage falls outside of these thresholds about 1% of the time. The relevant test file is coverage.R.
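The logic of such a coverage test can be sketched with a toy estimator in base R. This is a generic illustration of the procedure, not the package’s coverage.R: the data-generating process and estimator are stand-ins.

```r
# Generic coverage-test sketch: simulate data with a known true effect,
# build a 95% CI, and record how often the CI covers the truth.
set.seed(1)
true_att <- 1
n_iter   <- 200
covered  <- logical(n_iter)

for (i in seq_len(n_iter)) {
  y   <- rnorm(500, mean = true_att)          # stand-in for estimation input
  est <- mean(y)                              # stand-in point estimate
  se  <- sd(y) / sqrt(length(y))              # stand-in standard error
  ci  <- est + c(-1, 1) * qnorm(0.975) * se   # 95% confidence interval
  covered[i] <- ci[1] <= true_att && true_att <= ci[2]
}

mean(covered)  # realized coverage; should be close to 0.95
```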
Experimental Features: not tested
As an attempt to balance the validity and flexibility of fastdid, “experimental features” were introduced in version 0.9.4. These features are less tested and documented, and it is generally advised not to use them unless the user knows what they and the package are doing. These experimental features can be accessed via the exper argument. For example, to use the filtervar feature, call fastdid(..., exper = list(filtervar = "FF")).
The current list of experimental features is:

- max_control_cohort_diff: limit the maximum cohort difference between the treated and control groups.
- filtervar, filtervar_post: limit the units used as the treated and control groups with a potentially time-varying variable in the base (post) period.
- only_balance_2by2: only require observations to have non-NA values within each 2-by-2 DiD, instead of throughout all time periods. This can be an alternative way of dealing with an unbalanced panel, by filling the missing periods with NAs. Not recommended, as CS only has allow_unbalance_panel, which uses a repeated cross-section 2-by-2 DiD estimator.
- custom_scheme: aggregate to user-defined parameters.
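Multiple experimental features can be combined in a single exper list. In this hypothetical call, the data, the filter variable name "FF", and the cohort-difference cap are all illustrative:

```r
# Hypothetical call combining two experimental features via `exper`
# (assumed argument names; dt, "FF", and the cap of 5 are illustrative).
result <- fastdid(dt, timevar = "time", cohortvar = "G",
                  unitvar = "unit", outcomevar = "y",
                  exper = list(filtervar = "FF",
                               max_control_cohort_diff = 5))
```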
Comparison with did
As the name suggests, fastdid’s goal is to be fast did. Besides performance, here are some comparisons between the two packages.
Estimator
fastdid’s estimators are identical to did’s. As the performance gains mostly come from efficient data manipulation, the key estimation implementations are analogous. For example: 2x2 DiD (estimate_did.R and DRDID::std_ipw_did_panel), influence function from weights (aggregate_gt.R/get_weight_influence and compute.aggte.R/wif), and multiplier bootstrap (get_se.R and mboot.R).
Interface
fastdid should feel similar to att_gt, but there are a few differences:
Control group option:
| fastdid | did | control group used |
|---|---|---|
| both | notyettreated | never-treated + not-yet-but-eventually-treated |
| never | nevertreated | never-treated |
| notyet | (not available) | not-yet-but-eventually-treated |
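For instance, the following two calls should use the same (never-treated) control group. The data and column names are hypothetical, and the fastdid argument names are assumptions based on its documentation:

```r
# Equivalent control-group choices in the two packages
# (assumed argument names; dt and column names are illustrative).
fastdid(dt, timevar = "time", cohortvar = "G", unitvar = "unit",
        outcomevar = "y", control_option = "never")

did::att_gt(yname = "y", tname = "time", idname = "unit", gname = "G",
            data = dt, control_group = "nevertreated")
```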
Aggregated parameters: fastdid aggregates within the same function call.
| fastdid | did |
|---|---|
| group_time | no aggregation |
| dynamic | dynamic |
| time | calendar |
| group | group |
| simple | simple |
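Concretely, a dynamic (event-study) aggregation is one call in fastdid but two steps in did, which aggregates group-time ATTs with a separate aggte() call. The `result_type` argument name is taken from the fastdid documentation; the data is hypothetical:

```r
# One-step aggregation in fastdid (assumed `result_type` argument)
fastdid(dt, timevar = "time", cohortvar = "G", unitvar = "unit",
        outcomevar = "y", result_type = "dynamic")

# Two-step equivalent in did: estimate group-time ATTs, then aggregate
gt <- did::att_gt(yname = "y", tname = "time", idname = "unit",
                  gname = "G", data = dt)
did::aggte(gt, type = "dynamic")
```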