Details Matter

clay shentrup
6 min readJan 1, 2020

A few years back I was talking on Slack on to my coworker we’ll call Jake. Because I saw some code where he was using relative time in his spec, instead of hardcoded values combined with RSpec’s travel_to. An excerpt of our discussion follows, with human names and code/variable names heavily altered to respect privacy and intellectual property.

09:48 Clay Shentrup Good morning sir. I just have a small request if you’d be amenable.
09:49 Jake Volta sure (edited)
09:49 Clay Shentrup Not using dynamic time in tests.
e.g. let(:ssu_date) { 10.days.ago.utc }
I just saw that. Could have been a refactor. If so disregard.
09:49 Jake Volta okay, I don’t remember that do you have a link?
09:51 Clay Shentrup [redacted GitHub URL] 🙏🏻
09:53 Jake Volta hmm, yeah, i think most of those can be fixed dates but a few need to be dynamic
09:53 Clay Shentrup What do you mean?
09:53 Jake Volta

let!(:site_in_custom_length_trial_extension) do
create(
:site,
show_billing_modal: nil,
self_signup_completed_at: ssu_date,
trial_extension_enabled: true,
trial_extension_started_at: 1.day.ago.utc,
trial_extension_duration_days: 21,
)
end

09:54 Clay Shentrup fixed_time — 1.day
I generally use let(:now) to refer to the conception of the present time. Then now — 1.day etc. I guess “dynamic” is an ambiguous word. I just mean they’re always the same. So you never get a weird test that starts failing one day, or freaks on on leap years.
09:55 Jake Volta it needs to be in the middle of an extended trial
09:55 Clay Shentrup Right. So relative to some fixed time works, right?
09:55 Jake Volta no
09:55 Clay Shentrup How so?
09:56 Jake Volta trial has a start date and a number of days
09:56 Clay Shentrup Right. I’m sure you have to use travel_to as well, to stub the current time.
09:57 Jake Volta yeah
the queries use now(), though (edited)
09:57 Clay Shentrup Yeah, you may recall that I admonished the team not to do that some months back. Instead use something like :now that you can feed in from Ruby, like Time.current
09:58 Jake Volta hmm, yeah, i suppose that would be doable
09:58 Clay Shentrup .where(LONG_INTERVAL_QUERY, interval: interval, now: Time.current)
Several of our flaky specs that I fixed that one weekend were caused by not doing this.
In any case, thanks for your consideration.
10:00 Jake Volta sure thing
10:00 Clay Shentrup I should probably just do a lunch and learn on time stuff specifically. It’s a good thing to periodically revisit with a larger team that’s evolving.
10:02 Jake Volta yeah tbh I always find myself struggling a little with how to parse absolute times
like what is the expected string format, etc
10:03 Clay Shentrup Fair. I know some people like Data.civil
I generally do this: let(:now) { Time.zone.parse(‘2016–07–08 17:28:09 -0700’) }
10:03 Jake Volta so much easier to do let(:now) { Time.now.utc } or similar but i understand your point

I took this as something roughly like, “Yeah, you’re right but…mmk, doesn’t seem that important…but I hear you.”

So in the morning, two days later, I stumble across this discussion in our deploy channel…

Staging Failed Push App Push failed — jvolta’s push to RELEASE_201X–06–07_10–35
Jake Volta ah dang, got an rspec failure:

rspec /spec/models/redacted_name_job_spec.rb:82 # RedactedJob Locking does not allow stealing if locked less than DEFAULT_UNLOCK_MINUTES ago

10:57 Jed Shen I’ve had that before (edited)
10:57 David Shen jed shen had a similar issue. its wierd since it looks a bit flaky
10:57 Jake Volta failed for me again in isolation
10:58 Sean Graham 🏆 passes for me on master
10:58 David Shen we changed postgresql.conf briefly locally to set the timezone to UTC, and it worked, but then we changed it back, and it still worked, so it was inconclusive (edited)
10:58 Jed Shen this is a red herring and I know it but this is what I did and it fixed it. Went to my postgresql.conf file and updated the timezone to UTC and then changed it back and it started passing also it could be because I ran it after noon it failed up till noon and then started passing after
10:58 Jake Volta hmm
10:58 Jed Shen ¯\_(ツ)_/¯
i really wasnt sure why it started passing
10:59 Jake Volta definitely some time-based stuff here
11:07 Jake Volta well, this is confusing
if I comment out assert(!swj.acquire_lock?(11))oh never mind, inverted a conditional in my reading
11:10 David Shen ohh the other thing I remember might work as well is if you change the delta from 30 seconds to may be 1 min. i think that had worked for Jed. but i don’t know
how any kind of clock skew comes into play here. Curious if that works for you. (edited)
11:11 Jake Volta already went to 60 seconds, still failing

90 passed🤷‍♂
61 passes

very consistent on my laptop right now
11:13 David Shen hmm. i was also able to pass it locally with 30.
all tests in that file and that single test in isolation (edited)
11:16 Jake Volta i could blow away my local DB and start from scratch
11:18 David Shen not exactly sure what the issue is. for this specific test, it doesn’t matter if the delta is 30 or more, and it seems like a higher delta has worked consistently for more ppl, so we could PR an increase for that in the interest of the push?
11:19 Jake Volta really weird to me that 61 seconds is okay, but 60 is not
i’m restarting postgres
11:20 David Shen sure, its potentially also worth a shot to clear the db and start over (edited)
i wonder if the times are hard coded if that would help
11:22 Jake Volta it might
would be a little bit involved right now
nothing crazy, though
11:22 David Shen i can give you the hard coded values
11:23 Jake Volta oh i thought you meant changing the test + code not to use relative times?
11:24 David Shen yes, change the values of now and :locked_at
now = Time.zone.parse(‘2018–06–07 18:23:32 UTC’)
:locked_at, Time.zone.parse(‘2018–06–07 18:18:02 UTC’).
11:28 Jake Volta gotcha. recreating my DBs now. will modify the test and submit that in the PR if that does not work
all of the code uses relative times, though, so i’m not certain that changing just the test times will work
11:29 David Shen yea, it should have worked, but maybe this will provide some more clues.
11:30 Jed Shen can the test be updated to use the travel_to so we dont run into this?
11:30 Jake Volta like if you make the locked_at date 2017, it seems like the code would allow the lock to be stolen
11:30 David Shen ohh the values i provided have the same 30 second difference (edited)
11:30 Jake Volta jed shen that would be the broader fix, but that alone wouldn’t change this case
because the time checking is done in the postgres, not ruby
11:31 Clay Shentrup jake volta Weren’t we just talking about time?
Time is my favorite.
11:31 Jake Volta so we’d need to change the code to inject times from ruby
@clay indeed i was thinking of you here!
11:31 Clay Shentrup
11:31 Jake Volta exactly what you are warning us against, holding up a deploy
11:31 Clay Shentrup The Active Record methods make it pretty convenient to inject ruby values.
And then you use travel_to around the code that needs to operate in fake time in your test.
11:32 Jake Volta yep
11:32 David Shen yea, this is also one of the changes i had to do to get some of the tests to work in the CI
11:32 Jake Volta that’s what we should probably do with this
11:32 Clay Shentrup Advanced testing techniques for the win.
11:34 James Volta databases recreated, and the test still fails
11:34 David Shen have you tried the hardcoded values yet? (edited)
11:35 Jake Volta yeah still fails
because the hardcoded value is always gonna be more than 5 minutes ago, which allows the lock to be stolen
11:38 David Shen ohh i think this is the right value for locked_at ‘201X–06–07 18:19:02 UTC’ … which is less than 5 min. ago, which is the test
11:38 Jake Volta but that’s a moving target
11:38 David Shen still using the 30 sec delta

Conclusion

While it can be frustrating to have one’s wisdom dismissed, it is just human nature to be skeptical of ideas until we’ve gone through the experience ourselves. That said, I suspect that same failure in thinking/observation is at the heart of why many people dismiss agile/XP principles like TDD. They keep experiencing the costs without linking them to the actions they took. Perhaps we can be better practitioners by being mindful when these accidents occur, and trying to connect them to preventative steps in the moment, rather than trying to teach too many principles in an abstract seemingly theoretical exercise.

--

--

clay shentrup

advocate of score voting and approval voting. software engineer.