We still wait 120 seconds for cert fetches from missing dir mirrors
In #4483 and prop210 we set up an elaborate download schedule for consistently reaching fallbackdirs when fetching the consensus, so we don't end up just sitting there for 120 seconds while a TCP connection waits (until the SocksTimeout parameter is eventually reached and we move on).
But we didn't do anything similar for fetching the key certs. I just had my bootstrap go smoothly through the #4483 features (with the fixes from #18809) and then it stalled for 2 minutes trying to fetch the certs from a fallbackdir that's offline.
Sure enough, in authority_certs_fetch_missing() I see:
    /* XXX - do we want certs from authorities or mirrors? - teor */
    directory_get_from_dirserver(DIR_PURPOSE_FETCH_CERTIFICATE, 0,
                                 resource, PDS_RETRY_IF_NO_SERVERS,
                                 DL_WANT_ANY_DIRSERVER);
So teor noticed this one too.
I think in 0.2.8, if we leave the fallbackdir stuff in (meaning we merge #18809 or equivalent into 0.2.8), we could bandage this one by changing DL_WANT_ANY_DIRSERVER to DL_WANT_AUTHORITY. In terms of performance that wouldn't be much worse than it is now, though we would indeed lose the ability to bootstrap from scratch when the authorities are unavailable.
Longer term (0.2.9 and later), I think we should explore a) having directory_get_from_dirserver() notice that there are TLS connections established to dir mirrors that we just recently used (and prefer them), b) explicitly remembering the dir mirror that gave us the consensus and re-using it, and/or c) designing a piggy-back mechanism so we can ask for "the certs that go with this consensus" when we're fetching a consensus and we know we will want the certs for it too (thus saving a round-trip).
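To make option b) a bit more concrete, here is a rough standalone sketch of "remember the mirror that gave us the consensus". It is not Tor code: last_consensus_source_t, note_consensus_source(), pick_cert_source() and the 60-second SKETCH_REUSE_WINDOW are all hypothetical names and values made up for illustration, and a real patch would need to hook into the existing directory request machinery instead.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Everything below is hypothetical; none of these names exist in Tor. */
#define SKETCH_ADDR_LEN 64
#define SKETCH_REUSE_WINDOW 60 /* seconds; arbitrary value for the sketch */

typedef struct {
  char addr[SKETCH_ADDR_LEN]; /* "host:port" of the mirror */
  time_t served_at;           /* when it delivered the consensus */
  int valid;                  /* 0 until a mirror has been recorded */
} last_consensus_source_t;

static last_consensus_source_t last_source = { "", 0, 0 };

/* Record the mirror once a consensus download from it has succeeded. */
static void
note_consensus_source(const char *addr, time_t now)
{
  snprintf(last_source.addr, sizeof(last_source.addr), "%s", addr);
  last_source.served_at = now;
  last_source.valid = 1;
}

/* Return the mirror to ask for certs: the remembered one if it served
 * us within the reuse window, otherwise NULL so the caller falls back
 * to its normal directory selection. */
static const char *
pick_cert_source(time_t now)
{
  if (!last_source.valid)
    return NULL;
  if (now < last_source.served_at ||
      now - last_source.served_at > SKETCH_REUSE_WINDOW)
    return NULL;
  return last_source.addr;
}
```

The idea would be to call note_consensus_source() when the consensus arrives and pick_cert_source() right before launching the cert fetch. Since that mirror just answered us, we would avoid picking a fresh (possibly offline) fallbackdir for the certs.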