With #16861 (moved), we hit a situation where having tens of thousands of concurrent timers would be desirable. Unfortunately, libevent's default timer implementation (a min-heap) is not so great for that. I have a patch in to make libevent use a faster backend, but since libevent uptake can be slow we might want to do this in Tor too.
No detailed review of this commit, as it is an import of external code.
One potential concern: we need to be pretty clear whether the times
we use here are meant to be monotonic times or real-time clock times;
it doesn't look like timeout.c makes any direct time syscalls itself,
just lets the caller supply the current time as a uint64_t, which gives
us flexibility there, but it will be necessary to be consistent within
any given struct timeouts. Possibly some admonitory documentation is
in order somewhere.
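For what it's worth, a minimal sketch of what "consistent" means here, assuming the upstream timeout.h interface (timeouts_open(), timeout_init(), timeouts_add(), timeouts_update()); the function name and caller-supplied tick value are illustrative:

  #include <stdint.h>
  #include "timeout.h"

  void
  example(uint64_t now_ticks) /* caller supplies "now" from one chosen clock */
  {
    int error;
    struct timeouts *T = timeouts_open(0, &error); /* 0: default granularity */
    struct timeout to;

    timeout_init(&to, 0);
    timeouts_add(T, &to, 100); /* relative: expire 100 ticks from "now" */

    /* timeout.c never reads a clock itself; every "now" is fed in by the
     * caller, so all updates to a given struct timeouts must come from
     * the same clock for its whole lifetime. */
    timeouts_update(T, now_ticks);

    timeouts_close(T);
  }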
Okay, in d0638fe760363f9c040256ac2884234ddad1d384 it looks like we're
committing to monotonic time in libevent_timer_reschedule(). Consider
this concern resolved.
05499b6ded25b5cbc8b16916fa9c0a39407ab10f:
Straightforward makefile changes; this looks fine
9d6c530015e4eefd7b333885eaeca1f9fcbc6578:
Stop compiler warnings for timeout.c; looks okay, but are we sure we've
covered every case in every configuration we might end up building?
Same for cbf47612b737a6ad31f17084ef5c36f5ebe33a76; some nervousness
about all these bitmasks and casts.
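For the record, the kind of hazard that makes those worth double-checking, in a hypothetical line (not one from the commit):

  #include <stdint.h>

  uint64_t
  set_slot(uint64_t wheel, int slot)
  {
    /* The untyped literal 1 is a plain int, so "1 << slot" is undefined
     * behavior for slot >= 31 even if the result is stored in a
     * uint64_t; the cast has to happen before the shift. */
    return wheel | ((uint64_t)1 << slot);
  }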
c77cf8825a33d902c5827f0b4f0a71cec97a3a85:
This looks pretty straightforward and unobjectionable.
d0638fe760363f9c040256ac2884234ddad1d384:
Is using a single pool of timeouts for both absolute and relative
values with tor_gettimeofday_cached_monotonic() entirely wise?
From the comment for that function:
  * In future versions of Tor, this may return a time that does not have
  * its origin at the Unix epoch.
This, of course, is intended to leave room for a future switch to clock_gettime(CLOCK_MONOTONIC, ...), which behaves more smoothly than the current remember-and-compare implementation when the clock actually does run backwards, but which is not guaranteed to have any particular origin. Using that where available is clearly the most reliable way to handle relative timeouts, but it is wrong for absolute ones; the function we're using is defined loosely enough to give us room to make that change in the future.
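Concretely, the sort of clock that comment is reserving room for (a sketch, not code from the branch):

  #include <time.h>

  void
  example(void)
  {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    /* Fine for measuring intervals, but ts.tv_sec is not seconds since
     * 1970 (on Linux the origin is roughly boot time), so comparing it
     * against a wall-clock deadline would be silently wrong. */
  }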
Do we actually have any examples of needing to fire a particular callback
exactly once at or after a particular wall clock time as these absolute
timeouts provide?
If we do need absolute timeouts, we should think about splitting things
up into two timeouts objects rather than just a unified global_timeouts;
if relative timeouts are monotonic and absolute timeouts are based on
clock time, then resetting time potentially changes the order of future
timeouts without any timeout having occurred. Then we're free to use
the appropriate definition of time separately for each.
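A sketch of that split, again assuming the upstream timeout.h API; the names here are illustrative, not from the branch:

  static struct timeouts *relative_timeouts; /* fed from the monotonic clock */
  static struct timeouts *absolute_timeouts; /* fed from the realtime clock */

  static void
  update_all_timeouts(uint64_t monotonic_now, uint64_t wallclock_now)
  {
    /* Each wheel only ever sees one clock, so a realtime clock reset can
     * reorder only the absolute timeouts, never the relative ones. */
    timeouts_update(relative_timeouts, monotonic_now);
    timeouts_update(absolute_timeouts, wallclock_now);
  }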
f7a6f142c67a3d256d522d1cfa66e525d16d6ab7:
These tests look just fine.
cbf47612b737a6ad31f17084ef5c36f5ebe33a76:
See comments for 9d6c530015e4eefd7b333885eaeca1f9fcbc6578
This particular cast in timeouts_del(), for example, will break
on a machine with 64-bit pointers and 32-bit ints if WHEEL_LEN
is very large.
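The general shape of that problem, in a standalone illustration (hypothetical; not the actual timeouts_del() code):

  #include <stddef.h>

  struct slot { int dummy; };

  int
  slot_index(const struct slot *base, const struct slot *p)
  {
    /* On an LP64 platform the pointer difference is a 64-bit ptrdiff_t;
     * converting it to int silently truncates once the array spans more
     * than INT_MAX elements. */
    return (int)(p - base);
  }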
timeout.c does document size constraints on these parameters
such that this should work on any platform with integer sizes
consistent with ISO C, though.
b9e0f7c91623e5ebde774bab61030f04b808c024:
More tests; looks okay.
d3a2b9e836cfed39f64963b171be152a7ae8ff4b:
Changes file looks fine
In conclusion: I think this needs more thought about relative vs. absolute
timeouts. Should I review the imported timeout.c code itself separately, or
is it solid and widely used enough that we aren't overly worried about it?
When running the timers test under Valgrind, it can end up in one of the warning states. Valgrind then reports memory leaks, because those paths return from main() without cleaning up. See the following patch for a possible fix.
diff --git a/src/test/test-timers.c b/src/test/test-timers.c
index 2896e49..35906a1 100644
--- a/src/test/test-timers.c
+++ b/src/test/test-timers.c
@@ -59,6 +59,7 @@ main(int argc, char **argv)
   timers_initialize();
 
   int i;
+  int ret;
   struct timeval now;
   tor_gettimeofday(&now);
   for (i = 0; i < N_TIMERS; ++i) {
@@ -109,13 +110,14 @@ main(int argc, char **argv)
   if (mean_diff > 500*1000 || stddev > 500*1000) {
     printf("Either your system is under ridiculous load, or the "
            "timer backend is broken.\n");
-    return 1;
+    ret = 1;
   } else if (mean_diff > 2000 || stddev > 2000) {
     printf("Either your system is a bit slow or the "
            "timer backend is odd.\n");
-    return 0;
+    ret = 0;
   } else {
     printf("Looks good enough.\n");
+    ret = 0;
   }
 
   timer_free(NULL);
@@ -124,5 +126,5 @@ main(int argc, char **argv)
     timer_free(timers[i]);
   }
   timers_shutdown();
-  return 0;
+  return ret;
 }
One nitpick: the magic values in the if statements. Maybe use descriptive defines for these?
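Something like this, say (hypothetical names; the thresholds are the ones already in the test, in microseconds):

  #define TIMER_TEST_BROKEN_USEC (500*1000) /* ridiculous load or broken backend */
  #define TIMER_TEST_ODD_USEC    (2000)     /* slow system or odd backend */

so the first check becomes
"if (mean_diff > TIMER_TEST_BROKEN_USEC || stddev > TIMER_TEST_BROKEN_USEC)".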
Removed absolute timers, applied the above patch, and added defines. Also realized that the differences can be negative (in theory, though I hope never in practice) and fixed that. Still in timeouts_v2.
I've done a quick high-level review of src/common/timers.c, and I'm concerned about the relative timers and cached monotonic time. If the timers themselves use monotonic time as we currently implement it, then we're vulnerable to NTP spoofing, where an adversary causes a negative clock jump in order to delay all timers. With cached monotonic time, every timer scheduled after such a jump has to wait out the full jump delta before the cached clock starts moving forward again and the callbacks fire. In effect, padding is disabled for the entire duration of any such negative clock jump.
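To illustrate the stall: a remember-and-compare cached clock does roughly this (a sketch, not the actual tor_gettimeofday_cached_monotonic() code):

  #include <sys/time.h>

  static struct timeval last_seen;

  static void
  gettimeofday_monotonicish(struct timeval *out)
  {
    struct timeval now;
    gettimeofday(&now, NULL);
    if (timercmp(&now, &last_seen, <)) {
      /* The clock ran backwards: freeze at the high-water mark. Every
       * timer consulted through this function now stalls until the
       * system clock climbs back past last_seen, i.e. for the whole
       * jump delta. */
      now = last_seen;
    } else {
      last_seen = now;
    }
    *out = now;
  }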
For the netflow padding, I used uncached gettimeofday and added some sanity checks for clock jumps based on expected min and max timeout values and channel padding inspection frequency. Not perfect, but at least we can check and warn when it happens, and schedule the padding immediately.
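The rough shape of that check (the names and the bound are illustrative, not the actual netflow padding code):

  #include <sys/time.h>

  #define PADDING_MAX_EXPECTED_MS 10000 /* illustrative bound */

  static struct timeval last_inspection;

  static void
  padding_check_for_clock_jump(const struct timeval *now)
  {
    long elapsed_ms = (now->tv_sec - last_inspection.tv_sec) * 1000L
                    + (now->tv_usec - last_inspection.tv_usec) / 1000;
    if (elapsed_ms < 0 || elapsed_ms > PADDING_MAX_EXPECTED_MS) {
      /* A gap we can't explain by normal scheduling: warn, and send the
       * padding immediately instead of trusting a timer that was fed by
       * the jumped clock. */
    }
    last_inspection = *now;
  }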
I talked a bit with Yawning, and we think that using performance counters (CLOCK_MONOTONIC_RAW on Linux and QueryPerformanceCounter on Windows) would work here, but then we'd have to alter all of the current channel timestamps, netflow timeout values, and this timer code to use those for scheduling and callback. This may actually be a good idea anyway, since we only need "consensus reality" time for the Tor consensus and descriptor freshness checks. Actual networking and timer stuff can and should be realtime and agnostic about the epoch.
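A sketch of such a source under those assumptions (CLOCK_MONOTONIC_RAW is Linux-specific; QueryPerformanceCounter/QueryPerformanceFrequency are the Windows analogue):

  #include <stdint.h>

  #ifdef _WIN32
  #include <windows.h>
  static uint64_t
  raw_monotonic_nsec(void)
  {
    LARGE_INTEGER count, freq;
    QueryPerformanceCounter(&count);
    QueryPerformanceFrequency(&freq);
    return (uint64_t)(count.QuadPart * (1000000000.0 / freq.QuadPart));
  }
  #else
  #include <time.h>
  static uint64_t
  raw_monotonic_nsec(void)
  {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts); /* not slewed by NTP */
    return (uint64_t)ts.tv_sec * 1000000000u + ts.tv_nsec;
  }
  #endif

Note that the epoch is arbitrary (boot-ish), which is exactly the "agnostic about the epoch" property described above.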
Otherwise, I think it might actually be better for padding to use uncached tor_gettimeofday() directly and check for jumps. Obviously this is error-prone and problematic, but I still think it beats cached monotonic time for the traffic-analysis and padding use case.
Changes in timeouts_v2 look fine to me; I am wholeheartedly in favor of using a better monotonic time source where available and doing so consistently throughout the codebase.
I made some comments on IRC last week that I still do not feel were addressed.
These issues are:
1. If time is cached, there will always be a delta between when the timeval is cached in the libevent callback and when a given timer is actually scheduled. On the path from second_elapsed_callback() all the way down to channelpadding_decide_to_pad_channel(), this delta could be very large.
2. I am still not sure that the fix in #18907 (moved) for "monotonic" time actually prevents stalling. Last I heard, Nick was going to write tests to verify this. Did that happen?
I do not believe that this code can be used by the netflow code until these issues are addressed. You can merge it if you want, but I'm not willing to use it.