Trading Fish2024-02-06T06:08:55-05:00https://hector.devHector Castrohector@castro.ioParsing Data in Rust with Nom2022-12-23T00:00:00-05:00https://hector.dev/2022/12/23/parsing-data-in-rust-with-nom<p>This is my third year participating in <a href="https://adventofcode.com" target="_blank" rel="noopener">Advent of Code</a>, but the first using Rust! Since I’m new to the Rust ecosystem, I’ve been dependent on others to steer my third-party library selections. As an example, <a href="https://adventofcode.com/2022/day/15" target="_blank" rel="noopener">Day 15</a> (like most days) presented some interesting string parsing requirements. Luckily, I was guided toward an excellent parser combinator library, affectionately named <a href="https://docs.rs/nom/latest/nom/" target="_blank" rel="noopener">nom</a>, via Chris Biscardi<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<h2 id="beacon-exclusion-zone">Beacon exclusion zone</h2>
<p>The Day 15 challenge requires you to track sensors, beacons, and their coordinates. The raw input for this looks like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Sensor at x=2, y=18: closest beacon is at x=-2, y=15
Sensor at x=9, y=16: closest beacon is at x=10, y=16
Sensor at x=13, y=2: closest beacon is at x=15, y=3
Sensor at x=12, y=14: closest beacon is at x=10, y=16
Sensor at x=10, y=20: closest beacon is at x=10, y=16
Sensor at x=14, y=17: closest beacon is at x=10, y=16
Sensor at x=8, y=7: closest beacon is at x=2, y=10
Sensor at x=2, y=0: closest beacon is at x=2, y=10
Sensor at x=0, y=11: closest beacon is at x=2, y=10
Sensor at x=20, y=14: closest beacon is at x=25, y=17
Sensor at x=17, y=20: closest beacon is at x=21, y=22
Sensor at x=16, y=7: closest beacon is at x=15, y=3
Sensor at x=14, y=3: closest beacon is at x=15, y=3
Sensor at x=20, y=1: closest beacon is at x=15, y=3
</code></pre></div></div>
<p>While this text is parsable with regular expressions, or a combination of well-placed string splits, using a parsing library helps break things down in a composable way (which can sometimes be beneficial for part 2 challenges).</p>
<p>Presuming we have structs for <code class="language-plaintext highlighter-rouge">Sensor</code> and <code class="language-plaintext highlighter-rouge">Beacon</code> that look like the ones below, we can start building out the parsing logic.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Debug)]</span>
<span class="k">struct</span> <span class="n">Sensor</span> <span class="p">{</span>
<span class="n">x</span><span class="p">:</span> <span class="nb">i64</span><span class="p">,</span>
<span class="n">y</span><span class="p">:</span> <span class="nb">i64</span><span class="p">,</span>
<span class="p">}</span>
<span class="nd">#[derive(Debug)]</span>
<span class="k">struct</span> <span class="n">Beacon</span> <span class="p">{</span>
<span class="n">x</span><span class="p">:</span> <span class="nb">i64</span><span class="p">,</span>
<span class="n">y</span><span class="p">:</span> <span class="nb">i64</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>
<h2 id="parsing-with-nom">Parsing with Nom</h2>
<p>First, we’ll parse out each line of input, along with the part of the line relevant to either a <code class="language-plaintext highlighter-rouge">Sensor</code> or a <code class="language-plaintext highlighter-rouge">Beacon</code>. Second, we’ll parse out the coordinates and populate them into instances of <code class="language-plaintext highlighter-rouge">Sensor</code> and <code class="language-plaintext highlighter-rouge">Beacon</code>.</p>
<p>For the first part, everything is contained in a function that takes the raw input as a string slice (<code class="language-plaintext highlighter-rouge">&str</code>) and returns an <code class="language-plaintext highlighter-rouge">IResult</code>. An <code class="language-plaintext highlighter-rouge">IResult</code> is a container for the result of a <code class="language-plaintext highlighter-rouge">nom</code> parsing function. The string slice component of an <code class="language-plaintext highlighter-rouge">IResult</code> is the remaining unparsed input, and the <code class="language-plaintext highlighter-rouge">Vec<(Sensor, Beacon)></code> is our expected parsing result.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">map</span><span class="p">(</span><span class="n">input</span><span class="p">:</span> <span class="o">&</span><span class="nb">str</span><span class="p">)</span> <span class="k">-></span> <span class="n">IResult</span><span class="o"><&</span><span class="nb">str</span><span class="p">,</span> <span class="nb">Vec</span><span class="o"><</span><span class="p">(</span><span class="n">Sensor</span><span class="p">,</span> <span class="n">Beacon</span><span class="p">)</span><span class="o">>></span> <span class="p">{</span>
<span class="k">let</span> <span class="p">(</span><span class="n">input</span><span class="p">,</span> <span class="n">reports</span><span class="p">)</span> <span class="o">=</span> <span class="nf">separated_list1</span><span class="p">(</span>
<span class="n">line_ending</span><span class="p">,</span>
<span class="nf">preceded</span><span class="p">(</span>
<span class="nf">tag</span><span class="p">(</span><span class="s">"Sensor at "</span><span class="p">),</span>
<span class="nf">separated_pair</span><span class="p">(</span>
<span class="n">position</span><span class="nf">.map</span><span class="p">(|(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)|</span> <span class="n">Sensor</span> <span class="p">{</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="p">}),</span>
<span class="nf">tag</span><span class="p">(</span><span class="s">": closest beacon is at "</span><span class="p">),</span>
<span class="n">position</span><span class="nf">.map</span><span class="p">(|(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)|</span> <span class="n">Beacon</span> <span class="p">{</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="p">}),</span>
<span class="p">),</span>
<span class="p">),</span>
<span class="p">)(</span><span class="n">input</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
<span class="nf">Ok</span><span class="p">((</span><span class="n">input</span><span class="p">,</span> <span class="n">reports</span><span class="p">))</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Inside the <code class="language-plaintext highlighter-rouge">map</code> function, we start off with <code class="language-plaintext highlighter-rouge">separated_list1</code>, which helps us break up the input into lines. The first argument is <code class="language-plaintext highlighter-rouge">line_ending</code>, which matches line endings of both the <code class="language-plaintext highlighter-rouge">\n</code> and <code class="language-plaintext highlighter-rouge">\r\n</code> variety. The second argument starts with <code class="language-plaintext highlighter-rouge">preceded</code>, which isolates everything after the <code class="language-plaintext highlighter-rouge">Sensor at</code> tag in the line and supplies it to <code class="language-plaintext highlighter-rouge">separated_pair</code>. <code class="language-plaintext highlighter-rouge">separated_pair</code> in turn helps parse out what is on either side of the <code class="language-plaintext highlighter-rouge">: closest beacon is at</code> tag. In this case, those are the coordinate pairs for <code class="language-plaintext highlighter-rouge">Sensor</code> and <code class="language-plaintext highlighter-rouge">Beacon</code>, respectively. To parse them, we’ll define another function called <code class="language-plaintext highlighter-rouge">position</code>.</p>
<p>The <code class="language-plaintext highlighter-rouge">position</code> function helps extract the values of coordinate pairs. As you can see, it takes the same argument as <code class="language-plaintext highlighter-rouge">map</code> and also returns an <code class="language-plaintext highlighter-rouge">IResult</code>. However, the types in the <code class="language-plaintext highlighter-rouge">IResult</code> are a bit different here. The second type parameter is a tuple, for the <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> coordinates, both <code class="language-plaintext highlighter-rouge">i64</code>.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">position</span><span class="p">(</span><span class="n">input</span><span class="p">:</span> <span class="o">&</span><span class="nb">str</span><span class="p">)</span> <span class="k">-></span> <span class="n">IResult</span><span class="o"><&</span><span class="nb">str</span><span class="p">,</span> <span class="p">(</span><span class="nb">i64</span><span class="p">,</span> <span class="nb">i64</span><span class="p">)</span><span class="o">></span> <span class="p">{</span>
<span class="nf">separated_pair</span><span class="p">(</span>
<span class="nf">preceded</span><span class="p">(</span><span class="nf">tag</span><span class="p">(</span><span class="s">"x="</span><span class="p">),</span> <span class="nn">complete</span><span class="p">::</span><span class="nb">i64</span><span class="p">),</span>
<span class="nf">tag</span><span class="p">(</span><span class="s">", "</span><span class="p">),</span>
<span class="nf">preceded</span><span class="p">(</span><span class="nf">tag</span><span class="p">(</span><span class="s">"y="</span><span class="p">),</span> <span class="nn">complete</span><span class="p">::</span><span class="nb">i64</span><span class="p">),</span>
<span class="p">)(</span><span class="n">input</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Right away, we jump into <code class="language-plaintext highlighter-rouge">separated_pair</code> again. This parses out both sides of the <code class="language-plaintext highlighter-rouge">,</code>, while <code class="language-plaintext highlighter-rouge">preceded</code> isolates the value after either <code class="language-plaintext highlighter-rouge">x=</code> or <code class="language-plaintext highlighter-rouge">y=</code>. The second argument of <code class="language-plaintext highlighter-rouge">preceded</code> is another parsing function, <code class="language-plaintext highlighter-rouge">character::complete::i64</code>, which matches the signed integer value of the coordinate.</p>
<p>Going back to the <code class="language-plaintext highlighter-rouge">map</code> function, we (somewhat confusingly) call the <code class="language-plaintext highlighter-rouge">map</code> method on the <code class="language-plaintext highlighter-rouge">position</code> parser to transform its output. The method comes from nom’s <code class="language-plaintext highlighter-rouge">Parser</code> trait, so that trait needs to be in scope. That allows us to destructure the tuple and use the values to construct the <code class="language-plaintext highlighter-rouge">Sensor</code> and <code class="language-plaintext highlighter-rouge">Beacon</code> struct literals.</p>
<p>Now, if we use the <code class="language-plaintext highlighter-rouge">dbg!</code> macro on the result of a call to <code class="language-plaintext highlighter-rouge">map</code> with test input, we should see something like:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">map</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">(</span>
<span class="n">Sensor</span> <span class="p">{</span>
<span class="n">x</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span>
<span class="n">y</span><span class="p">:</span> <span class="mi">18</span><span class="p">,</span>
<span class="p">},</span>
<span class="n">Beacon</span> <span class="p">{</span>
<span class="n">x</span><span class="p">:</span> <span class="o">-</span><span class="mi">2</span><span class="p">,</span>
<span class="n">y</span><span class="p">:</span> <span class="mi">15</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">),</span>
<span class="p">(</span>
<span class="n">Sensor</span> <span class="p">{</span>
<span class="n">x</span><span class="p">:</span> <span class="mi">9</span><span class="p">,</span>
<span class="n">y</span><span class="p">:</span> <span class="mi">16</span><span class="p">,</span>
<span class="p">},</span>
<span class="n">Beacon</span> <span class="p">{</span>
<span class="n">x</span><span class="p">:</span> <span class="mi">10</span><span class="p">,</span>
<span class="n">y</span><span class="p">:</span> <span class="mi">16</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">),</span>
<span class="c1">// . . .</span>
<span class="p">]</span>
</code></pre></div></div>
<p>Look at that beautifully structured data!</p>
<h2 id="conclusion">Conclusion</h2>
<p>Reasonably painless, and composable—that’s parsing data with Rust and Nom! If you’re interested in taking a closer look at Nom, be sure to check out this handy, but somewhat hidden, <a href="https://github.com/Geal/nom/blob/main/doc/choosing_a_combinator.md" target="_blank" rel="noopener">list</a> of its available parsers and combinators.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>I highly recommend checking out Chris’ phenomenal Advent of Code solution <a href="https://www.youtube.com/playlist?list=PLWtPciJ1UMuBNTifxm5ADY65SkAdwoQiL" target="_blank" rel="noopener">videos</a>. I could not have dreamt of a better resource to get up to speed quickly with Rust. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Every Other Friday Off Work Schedule2022-06-27T00:00:00-04:00https://hector.dev/2022/06/27/every-other-friday-off-work-schedule<p>For the last six months, I’ve adopted a work schedule where you tally up extra hours in the first nine days of a two-week range, and take the second Friday off (also known as a 9/80 work schedule<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>). Below is a diagram that illustrates one way to do it.</p>
<p><img src="/assets/resized/example-alternate-work-week-schedule-800x492.png" alt="Alternate Work Week Schedule Example" srcset="/assets/resized/example-alternate-work-week-schedule-320x197.png 320w,/assets/resized/example-alternate-work-week-schedule-480x295.png 480w,/assets/resized/example-alternate-work-week-schedule-800x492.png 800w, /images/2022-06-27-every-other-friday-off-work-schedule/example-alternate-work-week-schedule.png 1404w" /></p>
<p>When I first adopted this work schedule, I didn’t think I’d value it as much as my peers. For better or worse (increasingly, worse), I’ve come from a long line of work environments where long hours were rewarded. Every other Friday off? Sure, that’s nice—but that’s not for <em>leaders</em>.</p>
<p>However, after a month of practice, it <em>easily</em> became one of the best work schedule arrangements I’ve ever participated in. Below are some supporting reasons why (from the perspective of an employee—me). If you consider yourself a forward-thinking leader, and have the authority to implement an alternative work schedule like this for your teams, please give it some serious consideration.</p>
<hr />
<h2 id="relief-in-knowing-others-are-off-too">Relief in knowing others are off too</h2>
<p>This may be a side effect of the previous work cultures I referenced above, but for me, I enjoy days off <em>much more</em> when I know other members of the team are off too. The probability of having your day interrupted by a chat DM goes down significantly. There is also a reduced concern of missing previously scheduled meetings or timely emails.</p>
<h2 id="more-opportunity-for-decompression">More opportunity for decompression</h2>
<p>The older I get, the more responsibilities I assume outside of work. Increasingly, the weekend is less a block of time to unwind before the next work week, and more <em>the only</em> opportunity to meet personal obligations that require more time than weekday evenings afford.</p>
<p>Maybe it’s cleaning out the garage, or rearranging the home office, or picking out a nice gift for a loved one. Whatever it is, now there is a whole additional day every two weeks to get it done. That leaves the traditional weekend days with more of an opportunity for much-needed decompression.</p>
<h2 id="more-quality-time-with-my-kid">More quality time with my kid</h2>
<p>This is a bit biased toward folks with younger kids (i.e., not yet in school, or not yet in a <em>serious</em> grade), but the time I spend with my kid on these Fridays off is much higher quality than that of a usual weekend.</p>
<p>When we plan to go somewhere, like the aquarium or a museum, it is much easier to get tickets. There are also generally a lot fewer people around (❤️ introverts). It’s enabled us to comfortably enjoy experiences like bowling, mini golfing, and wizarding<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> during a pandemic.</p>
<h2 id="conclusion">Conclusion</h2>
<p>A 9/80-style work schedule is, personally, very compelling. It unlocks a level of work/life balance I haven’t experienced in a work setting since I started working from home.</p>
<p>It may not be for everyone, though. Work schedule changes require a base level of individual and team-level maturity. But, if that’s present, most software development teams should be able to adopt a work schedule change like this and not miss a beat.</p>
<p>In 2022, hiring for software-focused roles is tough and differentiators are hard to come by. An alternative work schedule that improves work/life balance and doesn’t compromise the business can go a long way on the recruiting front.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p><a href="https://toggl.com/track/9-80-work-schedule/" target="_blank" rel="noopener">What is a 9/80 work schedule?</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p><a href="https://www.harrypotterexhibition.com" target="_blank" rel="noopener">Harry Potter: The Exhibition</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Turning Lemons into Topologically Sorted Lemonade2021-04-09T00:00:00-04:00https://hector.dev/2021/04/09/turning-lemons-into-topologically-sorted-lemonade<p>In a recent interview, I was asked to pair on a coding problem. As with most live coding exercises, I didn’t do very well. So, in an effort to redeem myself (in the eyes of myself), I studied up on the problem and worked through several solutions.</p>
<p>Hopefully, you don’t find yourself in a similar situation. But, if you do, I hope reading through these solutions helps you fare better than I did!</p>
<h2 id="courses-and-prerequisites">Courses and prerequisites</h2>
<p>Without further ado, the pairing exercise problem statement:</p>
<blockquote>
<p>Given a set of courses and a corresponding set of prerequisites, produce a valid ordering of courses such that the courses can be taken in that order without bypassing any of the prerequisites (there are multiple correct solutions).</p>
</blockquote>
<p>And, using a Python dictionary, the input data:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">COURSES</span> <span class="o">=</span> <span class="p">{</span>
<span class="sh">"</span><span class="s">Algebra 1</span><span class="sh">"</span><span class="p">:</span> <span class="p">[],</span>
<span class="sh">"</span><span class="s">Algebra 2</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">Algebra 1</span><span class="sh">"</span><span class="p">],</span>
<span class="sh">"</span><span class="s">English 1</span><span class="sh">"</span><span class="p">:</span> <span class="p">[],</span>
<span class="sh">"</span><span class="s">English 2</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">English 1</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">History 1</span><span class="sh">"</span><span class="p">],</span>
<span class="sh">"</span><span class="s">English 3</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">English 2</span><span class="sh">"</span><span class="p">],</span>
<span class="sh">"</span><span class="s">English 4</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">English 3</span><span class="sh">"</span><span class="p">],</span>
<span class="sh">"</span><span class="s">History 1</span><span class="sh">"</span><span class="p">:</span> <span class="p">[],</span>
<span class="sh">"</span><span class="s">History 2</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">History 1</span><span class="sh">"</span><span class="p">],</span>
<span class="sh">"</span><span class="s">Pre-Calculus</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">Algebra 2</span><span class="sh">"</span><span class="p">],</span>
<span class="sh">"</span><span class="s">Statistics 1</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">Algebra 1</span><span class="sh">"</span><span class="p">],</span>
<span class="sh">"</span><span class="s">Statistics 2</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">Statistics 1</span><span class="sh">"</span><span class="p">],</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Given the input above, a valid ordering of courses is:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="sh">'</span><span class="s">History 1</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">History 2</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">English 1</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">English 2</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">English 3</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">English 4</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Algebra 1</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Statistics 1</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Statistics 2</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Algebra 2</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Pre-Calculus</span><span class="sh">'</span><span class="p">]</span>
</code></pre></div></div>
<p>Now that we have the problem defined, let’s look at some solutions!</p>
<h2 id="sorting-out-the-correct-terminology">Sorting out the correct terminology</h2>
<p>I had a sense that the solution to this problem involved modeling the data with a graph data structure, but I wasn’t sure what to do with it after that.
So, I started looking for graph-related libraries in Python, which led me to the most excellent <a href="https://networkx.org/" target="_blank" rel="noopener">NetworkX</a>.</p>
<p>After navigating the NetworkX <a href="https://networkx.org/documentation/stable/reference/index.html" target="_blank" rel="noopener">API documentation</a> a bit, I noticed an entire section dedicated to <a href="https://networkx.org/documentation/stable/reference/algorithms/index.html" target="_blank" rel="noopener">algorithms</a>. Under the algorithms section was a subsection specific to <a href="https://networkx.org/documentation/stable/reference/algorithms/dag.html" target="_blank" rel="noopener">Directed Acyclic Graphs (DAGs)</a>. The reference to DAGs caught my eye because DAGs are often used to model data processing workflows with complex <em>dependencies</em>. In the problem statement above, the course <em>prerequisites</em> are a lot like data processing workflow <em>dependencies</em>.</p>
<p>Continuing through the DAG-related algorithms, the description for <code class="language-plaintext highlighter-rouge">topological_sort(G)</code> stood out:</p>
<blockquote>
<p>A topological sort is a nonunique permutation of the nodes such that an edge from <code class="language-plaintext highlighter-rouge">u</code> to <code class="language-plaintext highlighter-rouge">v</code> implies that <code class="language-plaintext highlighter-rouge">u</code> appears before <code class="language-plaintext highlighter-rouge">v</code> in the topological sort order.</p>
</blockquote>
<p>That sounds promising! An edge can be produced by connecting a course <code class="language-plaintext highlighter-rouge">v</code> to a prerequisite <code class="language-plaintext highlighter-rouge">u</code>. If a topological sort can help ensure <code class="language-plaintext highlighter-rouge">u</code> appears before <code class="language-plaintext highlighter-rouge">v</code> in an ordering, then it aligns with our goal. Let’s give it a spin!</p>
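<p>For intuition about what a topological sort does under the hood, here is a minimal sketch of Kahn’s algorithm in plain Python. It is not necessarily how NetworkX implements <code class="language-plaintext highlighter-rouge">topological_sort(G)</code>. The function name <code class="language-plaintext highlighter-rouge">kahn_topological_sort</code> and the trimmed-down <code class="language-plaintext highlighter-rouge">SAMPLE_COURSES</code> dictionary are made up for illustration, and the sketch assumes every prerequisite also appears as a key in the dictionary.</p>

```python
from collections import deque

# Hypothetical, trimmed-down subset with the same shape as COURSES:
# course -> list of prerequisites.
SAMPLE_COURSES = {
    "Algebra 1": [],
    "Algebra 2": ["Algebra 1"],
    "Pre-Calculus": ["Algebra 2"],
    "Statistics 1": ["Algebra 1"],
}


def kahn_topological_sort(courses):
    """Return one valid course ordering using Kahn's algorithm.

    Assumes every prerequisite is also a key in `courses`.
    Raises ValueError if the prerequisites contain a cycle.
    """
    # Number of prerequisites each course is still waiting on.
    remaining = {course: len(prereqs) for course, prereqs in courses.items()}

    # Reverse mapping: prerequisite u -> courses v that need it (edges u -> v).
    dependents = {course: [] for course in courses}
    for course, prereqs in courses.items():
        for prereq in prereqs:
            dependents[prereq].append(course)

    # Start with every course that has no prerequisites at all.
    ready = deque(course for course, count in remaining.items() if count == 0)
    order = []
    while ready:
        course = ready.popleft()
        order.append(course)
        # Taking this course unblocks the courses that depend on it.
        for dependent in dependents[course]:
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)

    # Any course never reached must be stuck behind a cycle.
    if len(order) != len(courses):
        raise ValueError("prerequisites contain a cycle")
    return order


print(kahn_topological_sort(SAMPLE_COURSES))
```

<p>The core idea is that a course becomes available once its count of outstanding prerequisites drops to zero, which is another way of saying <code class="language-plaintext highlighter-rouge">u</code> always lands before <code class="language-plaintext highlighter-rouge">v</code>.</p>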
<h2 id="networkx-to-the-rescue">NetworkX to the rescue</h2>
<p>Following the guidance in the <code class="language-plaintext highlighter-rouge">topological_sort(G)</code> description, I iterated over each combination of course and prerequisite and created an edge between them with the <code class="language-plaintext highlighter-rouge">add_edge</code> method:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">import</span> <span class="n">networkx</span> <span class="k">as</span> <span class="n">nx</span>
<span class="o">>>></span> <span class="n">graph</span> <span class="o">=</span> <span class="n">nx</span><span class="p">.</span><span class="nc">DiGraph</span><span class="p">()</span>
<span class="o">>>></span> <span class="k">for</span> <span class="n">course</span><span class="p">,</span> <span class="n">prerequisites</span> <span class="ow">in</span> <span class="n">COURSES</span><span class="p">.</span><span class="nf">items</span><span class="p">():</span>
<span class="p">...</span> <span class="k">for</span> <span class="n">prerequisite</span> <span class="ow">in</span> <span class="n">prerequisites</span><span class="p">:</span>
<span class="p">...</span> <span class="n">graph</span><span class="p">.</span><span class="nf">add_edge</span><span class="p">(</span><span class="n">prerequisite</span><span class="p">,</span> <span class="n">course</span><span class="p">)</span>
<span class="p">...</span>
<span class="o">>>></span>
</code></pre></div></div>
<p>From there, the only thing left to do was to call <code class="language-plaintext highlighter-rouge">topological_sort(G)</code> with the <code class="language-plaintext highlighter-rouge">DiGraph</code> as an argument:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">import</span> <span class="n">pprint</span>
<span class="o">>>></span> <span class="n">pprint</span><span class="p">.</span><span class="nf">pprint</span><span class="p">(</span><span class="nf">list</span><span class="p">(</span><span class="n">nx</span><span class="p">.</span><span class="nf">topological_sort</span><span class="p">(</span><span class="n">graph</span><span class="p">)))</span>
<span class="p">[</span><span class="sh">'</span><span class="s">History 1</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">History 2</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">English 1</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">English 2</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">English 3</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">English 4</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Algebra 1</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Statistics 1</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Statistics 2</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Algebra 2</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Pre-Calculus</span><span class="sh">'</span><span class="p">]</span>
<span class="o">>>></span>
</code></pre></div></div>
<p>That ordering looks valid to me!</p>
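<p>Eyeballing works for eleven courses, but a small checker is easy to write too. The sketch below is one way to do it, assuming every prerequisite is itself a key in the dictionary. The helper name <code class="language-plaintext highlighter-rouge">is_valid_order</code> and the <code class="language-plaintext highlighter-rouge">NETWORKX_ORDER</code> constant are hypothetical names introduced here, not part of NetworkX.</p>

```python
COURSES = {
    "Algebra 1": [],
    "Algebra 2": ["Algebra 1"],
    "English 1": [],
    "English 2": ["English 1", "History 1"],
    "English 3": ["English 2"],
    "English 4": ["English 3"],
    "History 1": [],
    "History 2": ["History 1"],
    "Pre-Calculus": ["Algebra 2"],
    "Statistics 1": ["Algebra 1"],
    "Statistics 2": ["Statistics 1"],
}

# The ordering printed by nx.topological_sort above.
NETWORKX_ORDER = [
    "History 1", "History 2", "English 1", "English 2", "English 3",
    "English 4", "Algebra 1", "Statistics 1", "Statistics 2",
    "Algebra 2", "Pre-Calculus",
]


def is_valid_order(courses, order):
    """Return True if each course appears exactly once, after all its prerequisites."""
    # The ordering must contain every course exactly once.
    if sorted(order) != sorted(courses):
        return False
    # Index of each course within the proposed ordering.
    position = {course: index for index, course in enumerate(order)}
    # Every prerequisite must come earlier than the course that needs it.
    return all(
        position[prereq] < position[course]
        for course, prereqs in courses.items()
        for prereq in prereqs
    )


print(is_valid_order(COURSES, NETWORKX_ORDER))        # True
print(is_valid_order(COURSES, NETWORKX_ORDER[::-1]))  # False
```

<p>Reversing the list puts every course ahead of its prerequisites, so the second call is a quick sanity check that the helper can actually reject an ordering.</p>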
<h2 id="the-standard-library-can-do-it-too">The standard library can do it too</h2>
<p>After identifying the class of algorithm necessary to solve the problem (i.e., topological sort), I began to use the term in searches for other types of solutions. Eventually, that led me to a module in the Python standard library called <code class="language-plaintext highlighter-rouge">graphlib</code>.</p>
<p>According to the Python <a href="https://github.com/python/cpython/commit/99e6c260d60655f3d2885af545cbc220b808d492" target="_blank" rel="noopener">commit history</a>, <code class="language-plaintext highlighter-rouge">graphlib</code> is pretty new (added in Python 3.9). It promises to provide a set of functionality for operating on graph-like structures. But, right now it only has one class’s worth of functionality. Luckily for us, that one class is called <code class="language-plaintext highlighter-rouge">TopologicalSorter</code>!</p>
<p>Instantiating the class takes an argument, <code class="language-plaintext highlighter-rouge">graph</code>, which:</p>
<blockquote>
<p>…must be a dictionary representing a directed acyclic graph where the keys are nodes and the values are iterables of all predecessors of that node in the graph (the nodes that have edges that point to the value in the key).</p>
</blockquote>
<p>Hm. That sounds very similar to the <code class="language-plaintext highlighter-rouge">COURSES</code> data structure defined above. Let’s pass it through in the Python interpreter, and then call the <code class="language-plaintext highlighter-rouge">static_order</code> method:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">from</span> <span class="n">graphlib</span> <span class="kn">import</span> <span class="n">TopologicalSorter</span>
<span class="o">>>></span> <span class="n">pprint</span><span class="p">.</span><span class="nf">pprint</span><span class="p">(</span><span class="nf">list</span><span class="p">(</span><span class="nc">TopologicalSorter</span><span class="p">(</span><span class="n">COURSES</span><span class="p">).</span><span class="nf">static_order</span><span class="p">()))</span>
<span class="p">[</span><span class="sh">'</span><span class="s">Algebra 1</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">English 1</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">History 1</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Algebra 2</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Statistics 1</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">English 2</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">History 2</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Pre-Calculus</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">Statistics 2</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">English 3</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">English 4</span><span class="sh">'</span><span class="p">]</span>
<span class="o">>>></span>
</code></pre></div></div>
<p>Whoa! A different ordering from the one above, but still valid. We can even use another NetworkX function to confirm the <code class="language-plaintext highlighter-rouge">TopologicalSorter</code> solution is valid by checking it against <em>all</em> possible topological sorts, as reported by <code class="language-plaintext highlighter-rouge">all_topological_sorts(G)</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">solution</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="nc">TopologicalSorter</span><span class="p">(</span><span class="n">COURSES</span><span class="p">).</span><span class="nf">static_order</span><span class="p">())</span>
<span class="o">>>></span> <span class="n">solution</span> <span class="ow">in</span> <span class="n">nx</span><span class="p">.</span><span class="nf">all_topological_sorts</span><span class="p">(</span><span class="n">graph</span><span class="p">)</span>
<span class="bp">True</span>
<span class="o">>>></span>
</code></pre></div></div>
<p>Excellent—looks like we are two-for-two so far!</p>
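<p>Enumerating every topological sort gets expensive as a graph grows, so a direct check can be handy: an ordering is valid exactly when each course appears after all of its prerequisites. Here is a minimal sketch of that idea. It assumes a small, hypothetical <code class="language-plaintext highlighter-rouge">SAMPLE_COURSES</code> dictionary in the same shape as <code class="language-plaintext highlighter-rouge">COURSES</code> (course mapped to its prerequisites), and <code class="language-plaintext highlighter-rouge">is_valid_order</code> is an illustrative helper of my own, not part of <code class="language-plaintext highlighter-rouge">graphlib</code> or NetworkX:</p>

```python
from graphlib import TopologicalSorter

# Hypothetical sample data: each course maps to the courses that must come first.
SAMPLE_COURSES = {
    "Algebra 1": [],
    "Algebra 2": ["Algebra 1"],
    "Pre-Calculus": ["Algebra 2"],
    "English 1": [],
    "English 2": ["English 1"],
}


def is_valid_order(order, prerequisites):
    """Return True if every course appears once, after all of its prerequisites."""
    # Collect every node, including ones that only show up as a prerequisite.
    nodes = set(prerequisites)
    for prereqs in prerequisites.values():
        nodes.update(prereqs)

    position = {course: index for index, course in enumerate(order)}
    # Reject duplicates, missing courses, and unknown courses.
    if len(position) != len(order) or set(position) != nodes:
        return False

    return all(
        position[prereq] < position[course]
        for course, prereqs in prerequisites.items()
        for prereq in prereqs
    )


order = list(TopologicalSorter(SAMPLE_COURSES).static_order())
print(is_valid_order(order, SAMPLE_COURSES))
```

<p>Reversing a valid ordering of this sample should fail the same check, which makes for a quick sanity test.</p>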
<h2 id="show-your-work">Show your work</h2>
<p>Using algorithms built into libraries like NetworkX and <code class="language-plaintext highlighter-rouge">graphlib</code> is fun and all, but how would we solve this problem algorithmically? Well, according to Wikipedia, there are two existing algorithms to draw from:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm" target="_blank" rel="noopener">Kahn’s algorithm</a></li>
<li><a href="https://en.wikipedia.org/wiki/Topological_sorting#Depth-first_search" target="_blank" rel="noopener">Depth-first search algorithm</a></li>
</ul>
<p>I’m going to focus on Kahn’s algorithm, primarily because it <em>doesn’t</em> involve recursion. Although recursion is a fascinating technique, I don’t see it too often in day-to-day code, and I find that it confuses many engineers (myself included).</p>
<blockquote>
<p><strong>Note</strong>: If the variable names below become confusing, please refer to the pseudocode in the Wikipedia link for Kahn’s algorithm. I’m trying to match the variable names to that, so it is easier to follow along.</p>
</blockquote>
<p>To start things off, let’s use the Python interpreter to define <code class="language-plaintext highlighter-rouge">L</code> and <code class="language-plaintext highlighter-rouge">S</code>. Here, <code class="language-plaintext highlighter-rouge">L</code> will hold the final course ordering, and <code class="language-plaintext highlighter-rouge">S</code> is made up of all the nodes in the graph with zero edges pointing to them (i.e., <code class="language-plaintext highlighter-rouge">in_degree == 0</code>):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">L</span> <span class="o">=</span> <span class="p">[]</span>
<span class="o">>>></span> <span class="n">S</span> <span class="o">=</span> <span class="p">[</span><span class="n">node</span> <span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">graph</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">graph</span><span class="p">.</span><span class="nf">in_degree</span><span class="p">(</span><span class="n">node</span><span class="p">)]</span>
<span class="o">>>></span>
</code></pre></div></div>
<p>Next, we need to create a <code class="language-plaintext highlighter-rouge">while</code> loop set to run until <code class="language-plaintext highlighter-rouge">S</code> is empty. Inside, we pop <code class="language-plaintext highlighter-rouge">n</code> from <code class="language-plaintext highlighter-rouge">S</code> and immediately append it to <code class="language-plaintext highlighter-rouge">L</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="k">while</span> <span class="n">S</span><span class="p">:</span>
<span class="p">...</span>     <span class="n">n</span> <span class="o">=</span> <span class="n">S</span><span class="p">.</span><span class="nf">pop</span><span class="p">()</span>
<span class="p">...</span>     <span class="n">L</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
</code></pre></div></div>
<p>After that, we need to identify all nodes connected to <code class="language-plaintext highlighter-rouge">n</code> so that we can remove each edge from the graph, one-by-one. As they’re removed, we check to see if there are any remaining edges pointing to the node <code class="language-plaintext highlighter-rouge">m</code>. If not, append <code class="language-plaintext highlighter-rouge">m</code> to <code class="language-plaintext highlighter-rouge">S</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">...</span>     <span class="n">edges</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="n">graph</span><span class="p">.</span><span class="nf">neighbors</span><span class="p">(</span><span class="n">n</span><span class="p">))</span>
<span class="p">...</span>     <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">edges</span><span class="p">:</span>
<span class="p">...</span>         <span class="n">graph</span><span class="p">.</span><span class="nf">remove_edge</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">m</span><span class="p">)</span>
<span class="p">...</span>         <span class="k">if</span> <span class="ow">not</span> <span class="n">graph</span><span class="p">.</span><span class="nf">in_degree</span><span class="p">(</span><span class="n">m</span><span class="p">):</span>
<span class="p">...</span>             <span class="n">S</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>
<span class="o">>>></span>
</code></pre></div></div>
<p>After these steps are complete, <code class="language-plaintext highlighter-rouge">L</code> should contain a valid course ordering. Again, we can confirm with <code class="language-plaintext highlighter-rouge">all_topological_sorts(G)</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">L</span> <span class="ow">in</span> <span class="n">nx</span><span class="p">.</span><span class="nf">all_topological_sorts</span><span class="p">(</span><span class="n">graph</span><span class="p">)</span>
<span class="bp">True</span>
<span class="o">>>></span>
</code></pre></div></div>
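<p>For reference, here is the same idea pulled together into one self-contained function that needs nothing beyond the standard library. Treat it as a sketch under a few assumptions of my own: the input is a plain dictionary shaped like <code class="language-plaintext highlighter-rouge">COURSES</code>, the function name <code class="language-plaintext highlighter-rouge">kahn_topological_sort</code> is made up for illustration, and nodes are sortable (sorting just makes the output repeatable). The interpreter session above removes edges from <code class="language-plaintext highlighter-rouge">graph</code> as it runs; this version decrements in-degree counts instead, so the input is left untouched and can still be used to verify the result afterward:</p>

```python
from collections import deque


def kahn_topological_sort(prerequisites):
    """Order nodes so that each one comes after all of its prerequisites.

    `prerequisites` maps each node to an iterable of nodes that must precede it.
    """
    # Collect every node, including ones that only appear as a prerequisite.
    nodes = set(prerequisites)
    for prereqs in prerequisites.values():
        nodes.update(prereqs)

    # in_degree[n] counts edges still pointing at n; successors reverses the mapping.
    in_degree = {node: 0 for node in nodes}
    successors = {node: [] for node in nodes}
    for node, prereqs in prerequisites.items():
        for prereq in set(prereqs):
            successors[prereq].append(node)
            in_degree[node] += 1

    # S holds nodes with no remaining incoming edges; L is the final ordering.
    S = deque(sorted(node for node in nodes if in_degree[node] == 0))
    L = []
    while S:
        n = S.popleft()
        L.append(n)
        for m in successors[n]:
            in_degree[m] -= 1  # Stands in for removing the edge n -> m.
            if in_degree[m] == 0:
                S.append(m)

    # If some nodes were never emitted, they sit on a cycle.
    if len(L) != len(nodes):
        raise ValueError("graph has at least one cycle")
    return L


sample = {"Algebra 2": ["Algebra 1"], "Pre-Calculus": ["Algebra 2"], "Algebra 1": []}
print(kahn_topological_sort(sample))
# ['Algebra 1', 'Algebra 2', 'Pre-Calculus']
```

<p>The cycle check at the end mirrors the last step of the Wikipedia pseudocode: if edges remain once <code class="language-plaintext highlighter-rouge">S</code> is empty, no valid ordering exists.</p>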
<p>That’s it! Now we have three different solutions to determine a valid ordering for a set of courses and prerequisites. Enjoy all the flavors of topologically sorted lemonade! 🍋</p>
Twelve-Factor Methodology Applied to a Django App2021-03-16T00:00:00-04:00https://hector.dev/2021/03/16/twelve-factor-methodology-applied-to-a-django-app<p>In the past few weeks, I’ve participated in a handful of DevOps/Site Reliability Engineer (SRE) interviews. Several interviewers have asked for guidelines on configuring and operating <a href="https://en.wikipedia.org/wiki/Cloud_native_computing" target="_blank" rel="noopener">cloud-native</a> applications. My mind immediately goes to the <a href="https://12factor.net" target="_blank" rel="noopener">Twelve-Factor App methodology</a>, originally created by the folks who built <a href="https://www.heroku.com" target="_blank" rel="noopener">Heroku</a>, one of the first publicly accessible platforms as a service (PaaS).</p>
<p>Combined, the points serve to abstract applications from the infrastructure they run on, paving the way for configurability, scalability, and reliability. To illustrate how this works in practice, I set up a Django application and used it to explain how each Twelve-Factor point applies. I hope you find it useful!</p>
<ul>
<li><a href="#codebase">Codebase</a></li>
<li><a href="#dependencies">Dependencies</a></li>
<li><a href="#config">Config</a></li>
<li><a href="#backing-services">Backing services</a></li>
<li><a href="#build-release-run">Build, release, run</a></li>
<li><a href="#processes">Processes</a></li>
<li><a href="#port-binding">Port binding</a></li>
<li><a href="#concurrency">Concurrency</a></li>
<li><a href="#disposability">Disposability</a></li>
<li><a href="#devprod-parity">Dev/prod parity</a></li>
<li><a href="#logs">Logs</a></li>
<li><a href="#admin-processes">Admin processes</a></li>
</ul>
<hr />
<blockquote>
<p><strong>Note</strong>: The code snippets in the following sections do not chain together perfectly. The snippets are there primarily to help communicate what’s going on in ways that only code can.</p>
</blockquote>
<h2 id="codebase">Codebase</h2>
<p>A <em>codebase</em> is the complete source material of a given software program or application. Its structure will vary based on technology, but for a Django application called <code class="language-plaintext highlighter-rouge">mysite</code> created with <code class="language-plaintext highlighter-rouge">django-admin startproject</code>, it looks like this:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>git init
<span class="go">Initialized empty Git repository in /home/hector/Projects/django-blog/.git/
</span><span class="gp">$</span><span class="w"> </span>git add <span class="nb">.</span>
<span class="gp">$</span><span class="w"> </span>git status
<span class="go">On branch master
No commits yet
Changes to be committed:
</span><span class="gp"> (use "git rm --cached <file></span>...<span class="s2">" to unstage)
</span><span class="go"> new file: .gitignore
new file: Pipfile
new file: Pipfile.lock
new file: mysite/manage.py
new file: mysite/mysite/__init__.py
new file: mysite/mysite/asgi.py
new file: mysite/mysite/settings.py
new file: mysite/mysite/urls.py
new file: mysite/mysite/wsgi.py
new file: setup.cfg
</span></code></pre></div></div>
<p>Excellent—we have ourselves a <em>codebase</em>! We’ll gradually cover converting <em>codebases</em> into <em>deploys</em> in the following sections.</p>
<h2 id="dependencies">Dependencies</h2>
<p>Applications have <em>dependencies</em>. The Twelve-Factor methodology wants us to explicitly declare these dependencies so they can be managed in a repeatable way. The first step toward achieving this happens with a <code class="language-plaintext highlighter-rouge">Pipfile</code>. It was created by a Python dependency management tool called <code class="language-plaintext highlighter-rouge">pipenv</code> after the following commands were run:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pipenv install django~=3.1
pipenv install black --dev --pre # --pre is needed because of black's versioning scheme
pipenv install flake8~=3.8 --dev
pipenv install isort~=5.7 --dev
</code></pre></div></div>
<p>The inside of a <code class="language-plaintext highlighter-rouge">Pipfile</code> is written in <a href="https://toml.io/" target="_blank" rel="noopener">Tom’s Obvious Minimal Language (TOML)</a> and contains a manifest of the Python dependencies needed for a project:</p>
<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[[source]]</span>
<span class="py">url</span> <span class="p">=</span> <span class="s">"https://pypi.org/simple"</span>
<span class="py">verify_ssl</span> <span class="p">=</span> <span class="kc">true</span>
<span class="py">name</span> <span class="p">=</span> <span class="s">"pypi"</span>
<span class="nn">[packages]</span>
<span class="py">django</span> <span class="p">=</span> <span class="py">"~</span><span class="p">=</span><span class="mf">3.1</span><span class="s">"</span><span class="err">
</span>
<span class="nn">[dev-packages]</span>
<span class="py">black</span> <span class="p">=</span> <span class="s">"*"</span>
<span class="py">flake8</span> <span class="p">=</span> <span class="py">"~</span><span class="p">=</span><span class="mf">3.8</span><span class="s">"</span><span class="err">
</span><span class="py">isort</span> <span class="p">=</span> <span class="py">"~</span><span class="p">=</span><span class="mf">5.7</span><span class="s">"</span><span class="err">
</span>
<span class="nn">[requires]</span>
<span class="py">python_version</span> <span class="p">=</span> <span class="s">"3.8"</span>
<span class="nn">[pipenv]</span>
<span class="py">allow_prereleases</span> <span class="p">=</span> <span class="kc">true</span>
</code></pre></div></div>
<p>Nowadays, we try to take this a step further by capturing <em>all</em> the necessary application dependencies in a container image. In most cases, the pursuit of creating a container image leads to using <a href="https://www.docker.com/" target="_blank" rel="noopener">Docker</a>, which implies the addition of a <code class="language-plaintext highlighter-rouge">Dockerfile</code>:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> python:3.8</span>
<span class="k">ENV</span><span class="s"> PYTHONUNBUFFERED=1</span>
<span class="k">RUN </span><span class="nb">mkdir</span> <span class="nt">-p</span> /usr/src/app
<span class="k">WORKDIR</span><span class="s"> /usr/src/app</span>
<span class="k">COPY</span><span class="s"> ./Pipfile* .</span>
<span class="k">RUN </span>pip <span class="nb">install </span>pipenv
<span class="k">RUN </span>pipenv <span class="nb">install</span> <span class="nt">--system</span> <span class="nt">--deploy</span> <span class="nt">--ignore-pipfile</span>
<span class="k">COPY</span><span class="s"> ./mysite .</span>
<span class="k">ENTRYPOINT</span><span class="s"> [ "python", "manage.py" ]</span>
</code></pre></div></div>
<p>To make sure things are in working order, we can build and test the container image using the following commands. Here, the <code class="language-plaintext highlighter-rouge">runserver</code> argument launches the Django development server:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>docker build <span class="nt">-t</span> mysite <span class="nb">.</span>
<span class="gp">$</span><span class="w"> </span>docker run <span class="nt">--rm</span> mysite runserver
<span class="go">Watching for file changes with StatReloader
Performing system checks...
System check identified no issues (0 silenced).
You have 18 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.
March 01, 2021 - 20:45:33
Django version 3.1.7, using settings 'mysite.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
</span></code></pre></div></div>
<p>Looks good! We now have everything needed to spin up the application captured in a container image. In addition, we have all the associated instructions to build the image defined in a declarative way (e.g., <code class="language-plaintext highlighter-rouge">Pipfile</code>, <code class="language-plaintext highlighter-rouge">Dockerfile</code>).</p>
<h2 id="config">Config</h2>
<p>In the Twelve-Factor world, <em>configuration</em> is defined as anything that can vary between <em>deploys</em> of a <em>codebase</em>. This allows a single <em>codebase</em> to be deployed into different environments without customization. Some examples of <em>configuration</em> include:</p>
<ul>
<li>Connection strings to the database, Memcached, and other backing services.</li>
<li>Credentials to external services (e.g., Amazon S3, Google Maps, etc.).</li>
<li>Information about the target environment (e.g., <code class="language-plaintext highlighter-rouge">Staging</code> vs. <code class="language-plaintext highlighter-rouge">Production</code>).</li>
</ul>
<p>Once we’ve identified the configuration for our application, we need to work toward making it consumable via <a href="https://en.wikipedia.org/wiki/Environment_variable" target="_blank" rel="noopener">environment variables</a>. In the example below, we focus on changing the way Django’s <code class="language-plaintext highlighter-rouge">SECRET_KEY</code> and <code class="language-plaintext highlighter-rouge">DEBUG</code> settings are set in <code class="language-plaintext highlighter-rouge">settings.py</code> (the home for all Django configuration settings).</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh">diff --git a/mysite/mysite/settings.py b/mysite/mysite/settings.py
index d541c62..3a99d45 100644
</span><span class="gd">--- a/mysite/mysite/settings.py
</span><span class="gi">+++ b/mysite/mysite/settings.py
</span><span class="p">@@ -9,7 +9,7 @@</span> https://docs.djangoproject.com/en/3.1/topics/settings/
For the full list of settings and their values, see
https://docs.djangoproject.com/en/3.1/ref/settings/
"""
<span class="gd">-
</span><span class="gi">+import os
</span> from pathlib import Path
<span class="err">
</span> # Build paths inside the project like this: BASE_DIR / 'subdir'.
<span class="p">@@ -20,10 +20,10 @@</span> BASE_DIR = Path(__file__).resolve().parent.parent
# See https://docs.djangoproject.com/en/3.1/howto/deployment/checklist/
<span class="err">
</span> # SECURITY WARNING: keep the secret key used in production secret!
<span class="gd">-SECRET_KEY = "#v5hnkypk39qex@9zb2j2as3n9f7)jgvz05*9t&0@2y$kx$7lw"
</span><span class="gi">+SECRET_KEY = os.getenv("DJANGO_SECRET_KEY", "secret")
</span><span class="err">
</span> # SECURITY WARNING: don't run with debug turned on in production!
<span class="gd">-DEBUG = True
</span><span class="gi">+DEBUG = os.getenv("DJANGO_ENV") == "Development"
</span><span class="err">
</span> ALLOWED_HOSTS = []
</code></pre></div></div>
<p>Here, we made use of the Python standard library <code class="language-plaintext highlighter-rouge">os</code> module to help us read configuration from the environment. Now, the two settings can be more easily reconfigured across <em>deploys</em>.</p>
<p>To prove it works, we can change the environment with the <code class="language-plaintext highlighter-rouge">-e</code> flag of <code class="language-plaintext highlighter-rouge">docker run</code>:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>docker build <span class="nt">-t</span> mysite <span class="nb">.</span>
<span class="gp">$</span><span class="w"> </span>docker run <span class="nt">--rm</span> <span class="se">\</span>
<span class="go"> -e DJANGO_SECRET_KEY="dev-secret" \
-e DJANGO_ENV="Development" \
mysite runserver
Watching for file changes with StatReloader
Performing system checks...
System check identified no issues (0 silenced).
You have 18 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.
March 01, 2021 - 21:25:57
Django version 3.1.7, using settings 'mysite.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
^C%
</span></code></pre></div></div>
<p>OK. Everything continued to work the way it did before. Now, let’s see what happens if we set <code class="language-plaintext highlighter-rouge">DJANGO_ENV=Production</code>, which will cause the <code class="language-plaintext highlighter-rouge">DEBUG</code> setting to evaluate to <code class="language-plaintext highlighter-rouge">False</code>:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>docker run <span class="nt">--rm</span> <span class="se">\</span>
<span class="go"> -e DJANGO_SECRET_KEY="prod-secret" \
-e DJANGO_ENV="Production" \
mysite runserver
CommandError: You must set settings.ALLOWED_HOSTS if DEBUG is False.
</span></code></pre></div></div>
<p>Aha! This <code class="language-plaintext highlighter-rouge">CommandError</code> looks ominous, but it is an indicator that our change of <code class="language-plaintext highlighter-rouge">DJANGO_ENV</code> made its way into the application’s execution environment successfully!</p>
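<p>The error goes away once <code class="language-plaintext highlighter-rouge">ALLOWED_HOSTS</code> is populated, and one way to do that is to read it from the environment as well. Here is a rough sketch of the pattern as a standalone function, so it can be exercised without Django. Note the assumptions: <code class="language-plaintext highlighter-rouge">read_settings</code> is a hypothetical helper, and <code class="language-plaintext highlighter-rouge">DJANGO_ALLOWED_HOSTS</code> is a comma-separated variable name I made up for illustration. Neither is a Django convention:</p>

```python
import os


def read_settings(environ=os.environ):
    """Build a few Django-style settings from an environment mapping."""
    # DEBUG is only on when the environment explicitly says "Development".
    debug = environ.get("DJANGO_ENV") == "Development"

    # DJANGO_ALLOWED_HOSTS is a hypothetical, comma-separated variable.
    hosts = [
        host.strip()
        for host in environ.get("DJANGO_ALLOWED_HOSTS", "").split(",")
        if host.strip()
    ]

    return {
        "SECRET_KEY": environ.get("DJANGO_SECRET_KEY", "secret"),
        "DEBUG": debug,
        "ALLOWED_HOSTS": hosts,
    }


production = read_settings(
    {
        "DJANGO_ENV": "Production",
        "DJANGO_SECRET_KEY": "prod-secret",
        "DJANGO_ALLOWED_HOSTS": "example.com, www.example.com",
    }
)
print(production["DEBUG"], production["ALLOWED_HOSTS"])
# False ['example.com', 'www.example.com']
```

<p>Passing the environment in as an argument (rather than reading <code class="language-plaintext highlighter-rouge">os.environ</code> directly) is a small design choice that makes each <em>deploy</em>’s configuration easy to test in isolation.</p>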
<h2 id="backing-services">Backing services</h2>
<p>A <em>backing service</em> is any service the application consumes over the network as part of its normal operation. Emphasis is placed on minimizing the distinction between local and third-party backing services such that the application can’t tell the difference between them.</p>
<p>As an example, say you have a PostgreSQL database instance running on your workstation that’s connected to your application to persist data. Later, when it comes time to deploy to production, the same approach to configuring the local PostgreSQL instance should work when it gets swapped out for an Amazon Relational Database Service (RDS) instance.</p>
<p>To achieve this with Django, we need to change the way connectivity to the database is configured. That happens via the <code class="language-plaintext highlighter-rouge">DATABASES</code> dictionary in <code class="language-plaintext highlighter-rouge">settings.py</code>:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh">diff --git a/mysite/mysite/settings.py b/mysite/mysite/settings.py
index 3a99d45..fcff52a 100644
</span><span class="gd">--- a/mysite/mysite/settings.py
</span><span class="gi">+++ b/mysite/mysite/settings.py
</span><span class="p">@@ -75,8 +75,12 @@</span> WSGI_APPLICATION = "mysite.wsgi.application"
<span class="err">
</span> DATABASES = {
"default": {
<span class="gd">- "ENGINE": "django.db.backends.sqlite3",
- "NAME": BASE_DIR / "db.sqlite3",
</span><span class="gi">+ "ENGINE": "django.db.backends.postgresql",
+ "NAME": os.getenv("POSTGRES_DB"),
+ "USER": os.getenv("POSTGRES_USER"),
+ "PASSWORD": os.getenv("POSTGRES_PASSWORD"),
+ "HOST": os.getenv("POSTGRES_HOST"),
+ "PORT": os.getenv("POSTGRES_PORT"),
</span> }
}
</code></pre></div></div>
<p>Here, we modified <code class="language-plaintext highlighter-rouge">DATABASES</code> so that all the necessary settings for the <code class="language-plaintext highlighter-rouge">default</code> database are pulled from the environment. Now, it doesn’t matter if the application is launched with <code class="language-plaintext highlighter-rouge">HOST</code> equal to <code class="language-plaintext highlighter-rouge">localhost</code> or <code class="language-plaintext highlighter-rouge">mysite.123456789012.us-east-1.rds.amazonaws.com</code>. In either case, the application should be able to connect to the database successfully using the settings found in the environment.</p>
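<p>To make the “can’t tell the difference” point concrete, here is a small sketch that builds the same dictionary from two different environments. The <code class="language-plaintext highlighter-rouge">database_settings</code> helper is hypothetical, and two behaviors are assumptions of mine rather than anything Django does: failing early when a variable is missing, and falling back to port <code class="language-plaintext highlighter-rouge">5432</code>:</p>

```python
import os


def database_settings(environ=os.environ):
    """Assemble Django's "default" database settings from POSTGRES_* variables."""
    required = ("POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_HOST")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        # Failing early is a design choice of this sketch, not Django behavior.
        raise RuntimeError(f"missing database configuration: {', '.join(missing)}")

    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": environ["POSTGRES_DB"],
            "USER": environ["POSTGRES_USER"],
            "PASSWORD": environ["POSTGRES_PASSWORD"],
            "HOST": environ["POSTGRES_HOST"],
            # Assumption: fall back to PostgreSQL's conventional port.
            "PORT": environ.get("POSTGRES_PORT", "5432"),
        }
    }


base = {"POSTGRES_DB": "mysite", "POSTGRES_USER": "mysite", "POSTGRES_PASSWORD": "mysite"}
local = database_settings({**base, "POSTGRES_HOST": "localhost"})
remote = database_settings(
    {**base, "POSTGRES_HOST": "mysite.123456789012.us-east-1.rds.amazonaws.com"}
)

# Only the host differs; the application code path is identical.
print(local["default"]["HOST"], remote["default"]["HOST"])
```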
<h2 id="build-release-run">Build, release, run</h2>
<p>In the <a href="#dependencies">Dependencies</a> section, we produced a <em>build</em> in the form of a container image. But we also need a unique label to identify and differentiate between versions of the container image. Uniqueness can come in the form of a timestamp or an incrementing number, but I personally like to use Git revisions. Below is an example that uses the current Git revision to tag a container image:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span><span class="c"># Get a reference to the latest commit of the current</span>
<span class="gp">$</span><span class="w"> </span><span class="c"># branch and make it short (only 7 characters long).</span>
<span class="gp">$</span><span class="w"> </span><span class="nb">export </span><span class="nv">GIT_COMMIT</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>git rev-parse <span class="nt">--short</span> HEAD<span class="si">)</span><span class="s2">"</span>
<span class="gp">$</span><span class="w"> </span>docker build <span class="nt">-t</span> <span class="s2">"mysite:</span><span class="nv">$GIT_COMMIT</span><span class="s2">"</span> <span class="nb">.</span>
<span class="gp">$</span><span class="w"> </span>docker images | <span class="nb">grep </span>mysite
<span class="go">mysite e87b8c4 4f3dc2772c57 2 minutes ago 978MB
</span></code></pre></div></div>
<p>As you can see from the output, the reference <code class="language-plaintext highlighter-rouge">mysite:e87b8c4</code> is unique to the container image we built. If we make additional changes to the <em>codebase</em> and commit them to the underlying Git repository, following these same steps will result in a new container image with a new unique reference.</p>
<p>Next, we need to combine the container image <em>build</em> above with a relevant set of <em>configuration</em> to produce a <em>release</em>. Here, we’ll use a lightweight <a href="https://docs.docker.com/compose/" target="_blank" rel="noopener">Docker Compose</a> configuration file to describe the connection between the two (the <em>build</em> and its <em>configuration</em>) in a declarative way. In a production system, you’d likely do something similar using a Kubernetes <a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/" target="_blank" rel="noopener">deployment</a> or an Amazon ECS <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definitions.html" target="_blank" rel="noopener">task definition</a>:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">version</span><span class="pi">:</span> <span class="s2">"</span><span class="s">3"</span>
<span class="na">services</span><span class="pi">:</span>
  <span class="na">web</span><span class="pi">:</span>
    <span class="na">image</span><span class="pi">:</span> <span class="s">mysite:e87b8c4</span>
    <span class="na">environment</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">POSTGRES_HOST=mysite.123456789012.us-east-1.rds.amazonaws.com</span>
      <span class="pi">-</span> <span class="s">POSTGRES_PORT=5432</span>
      <span class="pi">-</span> <span class="s">POSTGRES_USER=mysite</span>
      <span class="pi">-</span> <span class="s">POSTGRES_PASSWORD=mysite</span>
      <span class="pi">-</span> <span class="s">POSTGRES_DB=mysite</span>
      <span class="pi">-</span> <span class="s">DJANGO_ENV=Staging</span>
      <span class="pi">-</span> <span class="s">DJANGO_SECRET_KEY=staging-secret</span>
    <span class="na">command</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">runserver</span>
      <span class="pi">-</span> <span class="s2">"</span><span class="s">0.0.0.0:8000"</span>
    <span class="na">ports</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s2">"</span><span class="s">8000:8000"</span>
</code></pre></div></div>
<p>This bit of Docker Compose configuration ties together the <code class="language-plaintext highlighter-rouge">mysite:e87b8c4</code> <em>build</em> with a set of environment-specific <em>configuration</em> to produce a <em>release</em>. If the container image and Docker Compose configuration snippet are available on the same host, then the application is ready for immediate execution on that host.</p>
<p>Lastly, we have the <em>run</em> stage. For Docker Compose, that’s as simple as using <code class="language-plaintext highlighter-rouge">docker-compose up</code> to launch the <code class="language-plaintext highlighter-rouge">web</code> service. For a more sophisticated container orchestration system, several more steps would likely be involved:</p>
<ul>
<li>The container image is published to a centrally accessible container registry.</li>
<li>The deployment manifest is submitted for evaluation to a container scheduler.</li>
<li>Compute is connected to the container scheduler with adequate resources to place instances of the application.</li>
</ul>
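<p>If it helps to see the relationship between the stages spelled out, here is a toy model of it. Everything in it is hypothetical (the <code class="language-plaintext highlighter-rouge">Build</code> and <code class="language-plaintext highlighter-rouge">Release</code> classes and the <code class="language-plaintext highlighter-rouge">make_release</code> helper are names I invented), and it only captures one idea: a <em>release</em> is an immutable pairing of one <em>build</em> with one set of <em>configuration</em>, so the same build can back several releases:</p>

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Build:
    """A container image identified by name and Git revision."""

    name: str
    revision: str

    @property
    def image(self) -> str:
        return f"{self.name}:{self.revision}"


@dataclass(frozen=True)
class Release:
    """One build combined with one environment's configuration."""

    build: Build
    config: Mapping[str, str]


def make_release(build, config):
    # Copy the config into a read-only mapping so that changing it
    # means cutting a new release rather than editing an old one.
    return Release(build=build, config=MappingProxyType(dict(config)))


build = Build("mysite", "e87b8c4")
staging = make_release(build, {"DJANGO_ENV": "Staging"})
production = make_release(build, {"DJANGO_ENV": "Production"})

print(staging.build.image, staging.config["DJANGO_ENV"])
# mysite:e87b8c4 Staging
```

<p>In the Docker Compose example, the <code class="language-plaintext highlighter-rouge">image</code> key plays the role of the build and the <code class="language-plaintext highlighter-rouge">environment</code> list plays the role of the configuration.</p>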
<h2 id="processes">Processes</h2>
<p>The Twelve-Factor methodology emphasizes applications as stand-alone <em>processes</em> because processes that share nothing are easier to scale horizontally. It is therefore important to keep each process stateless by storing all dynamic state in a <em>backing service</em> (e.g., a database).</p>
<p>However, sometimes whole components of an application need to be dynamically built, like its associated CSS and JavaScript. To be truly stateless, we want to generate those components during the <em>build</em> phase and capture them in the container image.</p>
<p>Django has <a href="https://docs.djangoproject.com/en/3.1/howto/static-files/deployment/" target="_blank" rel="noopener">several built-in mechanisms</a> to handle static assets, but I prefer to use a third-party library named <a href="http://whitenoise.evans.io/en/stable/index.html" target="_blank" rel="noopener">WhiteNoise</a>, primarily because it helps package both the application and its supporting static assets together in a way that enables thinking about a <em>deploy</em> as an atomic operation.</p>
<p>After installing WhiteNoise using <code class="language-plaintext highlighter-rouge">pipenv</code> with a command similar to the one we used in <a href="#dependencies">Dependencies</a> to install Django, we need to configure the Django application to use WhiteNoise for static asset management. Here, we inject WhiteNoise into the Django <code class="language-plaintext highlighter-rouge">INSTALLED_APPS</code> and <code class="language-plaintext highlighter-rouge">MIDDLEWARE</code> hierarchy to take over static asset management in development and non-development environments:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh">diff --git a/mysite/mysite/settings.py b/mysite/mysite/settings.py
index 216452b..f4e32c6 100644
</span><span class="gd">--- a/mysite/mysite/settings.py
</span><span class="gi">+++ b/mysite/mysite/settings.py
</span><span class="p">@@ -31,6 +31,7 @@</span> ALLOWED_HOSTS = []
# Application definition
<span class="err">
</span> INSTALLED_APPS = [
<span class="gi">+ "whitenoise.runserver_nostatic",
</span> "django.contrib.admin",
"django.contrib.auth",
"django.contrib.contenttypes",
<span class="p">@@ -41,6 +42,7 @@</span> INSTALLED_APPS = [
<span class="err">
</span> MIDDLEWARE = [
"django.middleware.security.SecurityMiddleware",
<span class="gi">+ "whitenoise.middleware.WhiteNoiseMiddleware",
</span> "django.contrib.sessions.middleware.SessionMiddleware",
"django.middleware.common.CommonMiddleware",
"django.middleware.csrf.CsrfViewMiddleware",
<span class="p">@@ -122,3 +124,7 @@</span> USE_TZ = True
# https://docs.djangoproject.com/en/3.1/howto/static-files/
<span class="err">
</span> STATIC_URL = "/static/"
<span class="gi">+
+STATIC_ROOT = "/static"
+
+STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"
</span></code></pre></div></div>
<p>The two settings at the bottom (<code class="language-plaintext highlighter-rouge">STATIC_ROOT</code> and <code class="language-plaintext highlighter-rouge">STATICFILES_STORAGE</code>) tell Django where to store the collected files on the container image file system and what preprocessing operations to apply.</p>
<p>Next, we need to ensure that Django preprocesses all static assets as part of the container image build process. For Django, that means adding an invocation of the <code class="language-plaintext highlighter-rouge">collectstatic</code> command to the container image build instructions:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh">diff --git a/Dockerfile b/Dockerfile
index 4653278..6420680 100644
</span><span class="gd">--- a/Dockerfile
</span><span class="gi">+++ b/Dockerfile
</span><span class="p">@@ -10,4 +10,6 @@</span> RUN pip install pipenv
RUN pipenv install --system --deploy --ignore-pipfile
COPY ./mysite .
<span class="gi">+RUN python manage.py collectstatic --no-input
+
</span> ENTRYPOINT [ "python", "manage.py" ]
</code></pre></div></div>
<p>Statelessness achieved!</p>
<h2 id="port-binding">Port binding</h2>
<p>Now that we have the application source code, dependencies, and supporting static assets inside a container image, we need a way to expose the entirety of it in a self-contained way. Since this is a web application, our goal is to use the HTTP protocol instead of lower level APIs like CGI, FastCGI, Servlets, etc.</p>
<p>We’ve seen our application bound to a port over HTTP several times already via the <code class="language-plaintext highlighter-rouge">docker run</code> invocations above, but they were all using a development-grade HTTP application server (e.g., <code class="language-plaintext highlighter-rouge">runserver</code>). How do we achieve something similar in a production-grade way?</p>
<p>Enter <a href="https://gunicorn.org/" target="_blank" rel="noopener">Gunicorn</a> and <a href="https://www.uvicorn.org/" target="_blank" rel="noopener">Uvicorn</a>. Gunicorn is a production-grade Python application server for UNIX based systems, and Uvicorn provides a Gunicorn worker implementation with <a href="https://asgi.readthedocs.io/en/latest/index.html" target="_blank" rel="noopener">Asynchronous Server Gateway Interface (ASGI)</a> compatibility.</p>
<p>After installing Gunicorn and Uvicorn using <code class="language-plaintext highlighter-rouge">pipenv install</code>, we need to tweak the Docker Compose configuration from <a href="#build-release-run">Build, release, run</a> to use Gunicorn as the <code class="language-plaintext highlighter-rouge">entrypoint</code>. We also add a few command-line options to ensure that the ASGI API is used (between Gunicorn and Django) along with the Uvicorn worker implementation:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh">diff --git a/docker-compose.yml b/docker-compose.yml
index f5f693d..bac885d 100644
</span><span class="gd">--- a/docker-compose.yml
</span><span class="gi">+++ b/docker-compose.yml
</span><span class="p">@@ -20,8 +20,12 @@</span> services:
build:
context: .
dockerfile: Dockerfile
<span class="gi">+ entrypoint: gunicorn
</span> command:
<span class="gd">- - runserver
- - "0.0.0.0:8000"
</span><span class="gi">+ - "mysite.asgi:application"
+ - "-b 0.0.0.0:8000"
+ - "-k uvicorn.workers.UvicornWorker"
</span></code></pre></div></div>
<p>After all of these changes, Docker Compose should be able to bring the service up bound to port <code class="language-plaintext highlighter-rouge">8000</code> using Gunicorn:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>docker-compose up web
<span class="go">Starting django-blog_web_1 ... done
Attaching to django-blog_web_1
web_1 | [2021-03-06 19:57:43 +0000] [1] [INFO] Starting gunicorn 20.0.4
web_1 | [2021-03-06 19:57:43 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
web_1 | [2021-03-06 19:57:43 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
web_1 | [2021-03-06 19:57:43 +0000] [8] [INFO] Booting worker with pid: 8
web_1 | [2021-03-06 19:57:43 +0000] [8] [INFO] Started server process [8]
web_1 | [2021-03-06 19:57:43 +0000] [8] [INFO] Waiting for application startup.
web_1 | [2021-03-06 19:57:43 +0000] [8] [INFO] ASGI 'lifespan' protocol appears unsupported.
web_1 | [2021-03-06 19:57:43 +0000] [8] [INFO] Application startup complete.
</span></code></pre></div></div>
<p>We can confirm by creating a second terminal session, hitting the <code class="language-plaintext highlighter-rouge">/admin/</code> endpoint, and inspecting the response:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>http localhost:8000/admin/
<span class="go">HTTP/1.1 302 Found
cache-control: max-age=0, no-cache, no-store, must-revalidate, private
content-length: 0
content-type: text/html; charset=utf-8
date: Sat, 06 Mar 2021 19:59:36 GMT
expires: Sat, 06 Mar 2021 19:59:36 GMT
location: /admin/login/?next=/admin/
referrer-policy: same-origin
server: uvicorn
vary: Cookie
x-content-type-options: nosniff
x-frame-options: DENY
</span></code></pre></div></div>
<p>It’s alive!</p>
<h2 id="concurrency">Concurrency</h2>
<p>As load against an application increases, the ability to address it by quickly and reliably adding more stateless <em>processes</em> is desirable. Gunicorn has built-in support for a process-level <a href="https://docs.gunicorn.org/en/stable/design.html" target="_blank" rel="noopener">worker model</a>, but using it to scale an application in cloud-based environments can cause contention with higher-level distributed process managers. This is because both want to manage the processes, but only the distributed process manager has a holistic view of resources across machines. Instead, we can set the number of Gunicorn worker processes low and defer process management to a higher-level supervisor.</p>
<p>Specifying different <em>process types</em> can’t really be done with Gunicorn either. Usually, that’s more tightly coupled with the container orchestration engine you use. Later on in <a href="#devprod-parity">Dev/prod parity</a> we’ll see a Docker Compose configuration with both a <code class="language-plaintext highlighter-rouge">database</code> and <code class="language-plaintext highlighter-rouge">web</code> process type. Within a more production-oriented container orchestration system like Kubernetes, you’d achieve something similar by creating separate sets of <a href="https://kubernetes.io/docs/concepts/workloads/pods/" target="_blank" rel="noopener">pods</a>—one for each <em>process type</em> to enable independent scaling.</p>
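<p>As a rough sketch of keeping Gunicorn’s worker count low, the settings could live in a Gunicorn configuration file passed with the <code class="language-plaintext highlighter-rouge">-c</code> option. The file name <code class="language-plaintext highlighter-rouge">gunicorn.conf.py</code>, the default of two workers, and the use of the <code class="language-plaintext highlighter-rouge">WEB_CONCURRENCY</code> environment variable are assumptions for illustration, not part of the project as described:</p>

```python
# gunicorn.conf.py (hypothetical file; not part of the original project)
import os

# Same bind address and worker class passed on the command line earlier.
bind = "0.0.0.0:8000"
worker_class = "uvicorn.workers.UvicornWorker"

# Keep the per-container worker count small and let the orchestrator
# scale by adding containers. The default of 2 is an assumed value.
workers = int(os.environ.get("WEB_CONCURRENCY", "2"))
```

<p>Reading the count from the environment keeps the value out of the image, so the same <em>build</em> can run with different worker counts per environment.</p>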
<h2 id="disposability">Disposability</h2>
<p>In cloud environments, application <em>disposability</em> is important because it increases agility during releases, scaling events, and failures. An application exhibits <em>disposability</em> when it properly handles certain types of asynchronous notifications called <a href="https://en.wikipedia.org/wiki/Signal_(IPC)" target="_blank" rel="noopener">signals</a>. Signals help local supervisory services (e.g., <a href="https://www.freedesktop.org/wiki/Software/systemd/" target="_blank" rel="noopener">systemd</a> and <a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/" target="_blank" rel="noopener">Kubelet</a>) manage an application’s lifecycle externally.</p>
<p>Gunicorn has built-in support for <a href="https://docs.gunicorn.org/en/stable/signals.html" target="_blank" rel="noopener">signal handling</a>. If you use it as your application server, it will automatically handle signals like <code class="language-plaintext highlighter-rouge">SIGTERM</code> to facilitate a graceful shutdown of the application.</p>
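<p>Gunicorn takes care of this for you, but to make the idea concrete, here is a minimal sketch of what handling <code class="language-plaintext highlighter-rouge">SIGTERM</code> could look like in a hypothetical stand-alone Python process. The names <code class="language-plaintext highlighter-rouge">handle_sigterm</code> and <code class="language-plaintext highlighter-rouge">run_worker</code> are made up for this example and are not Gunicorn APIs:</p>

```python
import signal
import time

# Flipped to True when a supervisor asks the process to stop.
shutdown_requested = False


def handle_sigterm(signum, frame):
    """Record the shutdown request instead of exiting mid-task."""
    global shutdown_requested
    shutdown_requested = True


# Signal handlers must be registered from the main thread.
signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(poll_seconds=0.1):
    """Loop until SIGTERM arrives, then return so cleanup can run."""
    while not shutdown_requested:
        # Real work would go here; this sketch just sleeps.
        time.sleep(poll_seconds)
    return "stopped cleanly"
```

<p>The handler only sets a flag, so any in-flight unit of work can finish before the loop exits.</p>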
<h2 id="devprod-parity">Dev/prod parity</h2>
<p><a href="#config">Configuration</a> allows a single <em>build</em> of a <em>codebase</em> to run locally, in staging, and in production. Leveraging that to maintain parity across environments keeps incompatibilities from cropping up as software is being developed. This results in a higher degree of confidence that the application will function the same way in production as it did locally.</p>
<p>Still, maintaining development and production parity is an ongoing challenge. Much like speed and security, you have to be constantly thinking about it, or else you lose it.</p>
<p>Nowadays, operating system support for namespacing resources through containerization, along with higher-level tooling like Docker and Docker Compose, goes a long way toward making this pursuit easier to achieve. As an example, see the following Docker Compose configuration file:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">version</span><span class="pi">:</span> <span class="s2">"</span><span class="s">3"</span>
<span class="na">services</span><span class="pi">:</span>
<span class="na">database</span><span class="pi">:</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">postgres:12.6</span>
<span class="na">environment</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">POSTGRES_USER=mysite</span>
<span class="pi">-</span> <span class="s">POSTGRES_PASSWORD=mysite</span>
<span class="pi">-</span> <span class="s">POSTGRES_DB=mysite</span>
<span class="na">web</span><span class="pi">:</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">mysite</span>
<span class="na">environment</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">POSTGRES_HOST=database</span>
<span class="pi">-</span> <span class="s">POSTGRES_PORT=5432</span>
<span class="pi">-</span> <span class="s">POSTGRES_USER=mysite</span>
<span class="pi">-</span> <span class="s">POSTGRES_PASSWORD=mysite</span>
<span class="pi">-</span> <span class="s">POSTGRES_DB=mysite</span>
<span class="pi">-</span> <span class="s">DJANGO_ENV=Development</span>
<span class="pi">-</span> <span class="s">DJANGO_SECRET_KEY=secret</span>
<span class="pi">-</span> <span class="s">DJANGO_LOG_LEVEL=DEBUG</span>
<span class="na">build</span><span class="pi">:</span>
<span class="na">context</span><span class="pi">:</span> <span class="s">.</span>
<span class="na">dockerfile</span><span class="pi">:</span> <span class="s">Dockerfile</span>
<span class="na">entrypoint</span><span class="pi">:</span> <span class="s">gunicorn</span>
<span class="na">command</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">mysite.asgi:application"</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">-b</span><span class="nv"> </span><span class="s">0.0.0.0:8000"</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">-k</span><span class="nv"> </span><span class="s">uvicorn.workers.UvicornWorker"</span>
<span class="na">ports</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">8000:8000"</span>
</code></pre></div></div>
<p>Within this relatively small file, we have defined all services needed to run our application locally. Each service (<code class="language-plaintext highlighter-rouge">database</code> and <code class="language-plaintext highlighter-rouge">web</code>) runs as a separate process within its own container, but the two are networked together. From the perspective of our Django application, this setup differs minimally from a true production container orchestration setup.</p>
<h2 id="logs">Logs</h2>
<p><em>Logs</em> emitted by an application provide visibility into its behavior. However, in cloud environments you cannot reliably predict where your application is going to run. This makes it difficult to get visibility into the application’s behavior unless you treat application logging as a <em>stream</em>. Treating application logs as a stream makes it easier for other services to aggregate and archive log output for centralized viewing.</p>
<p>Django uses Python’s built-in <a href="https://docs.python.org/3/library/logging.html#module-logging" target="_blank" rel="noopener">logging</a> module to perform system logging, which allows it to be set up in some pretty sophisticated ways. However, all we want is for Django to log everything as a <em>stream</em> to standard out. We can make that happen by specifying a custom logging configuration dictionary in <code class="language-plaintext highlighter-rouge">settings.py</code> that looks like:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">LOGGING</span> <span class="o">=</span> <span class="p">{</span>
<span class="sh">"</span><span class="s">version</span><span class="sh">"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="sh">"</span><span class="s">disable_existing_loggers</span><span class="sh">"</span><span class="p">:</span> <span class="bp">False</span><span class="p">,</span>
<span class="sh">"</span><span class="s">handlers</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span>
<span class="sh">"</span><span class="s">console</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span>
<span class="sh">"</span><span class="s">class</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">logging.StreamHandler</span><span class="sh">"</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">},</span>
<span class="sh">"</span><span class="s">root</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span>
<span class="sh">"</span><span class="s">handlers</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">console</span><span class="sh">"</span><span class="p">],</span>
<span class="sh">"</span><span class="s">level</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">WARNING</span><span class="sh">"</span><span class="p">,</span>
<span class="p">},</span>
<span class="sh">"</span><span class="s">loggers</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span>
<span class="sh">"</span><span class="s">django</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span>
<span class="sh">"</span><span class="s">handlers</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">console</span><span class="sh">"</span><span class="p">],</span>
<span class="sh">"</span><span class="s">level</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">WARNING</span><span class="sh">"</span><span class="p">,</span>
<span class="sh">"</span><span class="s">propagate</span><span class="sh">"</span><span class="p">:</span> <span class="bp">False</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">},</span>
<span class="p">}</span>
</code></pre></div></div>
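<p>This configures the parent root logger to send messages with the <code class="language-plaintext highlighter-rouge">WARNING</code> level and higher to the console handler, a stream handler that writes to standard error unless a <code class="language-plaintext highlighter-rouge">stream</code> such as standard out is configured. The default Django log levels can also be tuned via the <code class="language-plaintext highlighter-rouge">DJANGO_LOG_LEVEL</code> environment variable, provided the <code class="language-plaintext highlighter-rouge">django</code> logger reads its <code class="language-plaintext highlighter-rouge">level</code> from the environment rather than hard-coding it. A dynamic override like this can be extremely helpful when troubleshooting because it allows logging settings to be modified without requiring a new <em>release</em>.</p>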
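<p>One way the <code class="language-plaintext highlighter-rouge">DJANGO_LOG_LEVEL</code> override might be wired in is sketched below. The <code class="language-plaintext highlighter-rouge">WARNING</code> fallback and the explicit <code class="language-plaintext highlighter-rouge">stream</code> entry are assumptions added for illustration, and the final <code class="language-plaintext highlighter-rouge">dictConfig</code> call is only there so the snippet can be tried outside of Django, which applies <code class="language-plaintext highlighter-rouge">LOGGING</code> on its own:</p>

```python
import logging.config
import os

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            # Without "stream", StreamHandler writes to standard error.
            "stream": "ext://sys.stdout",
        },
    },
    "root": {"handlers": ["console"], "level": "WARNING"},
    "loggers": {
        "django": {
            "handlers": ["console"],
            # Assumed fallback of WARNING when the variable is unset.
            "level": os.getenv("DJANGO_LOG_LEVEL", "WARNING"),
            "propagate": False,
        },
    },
}

# Only needed when running this sketch stand-alone.
logging.config.dictConfig(LOGGING)
```

<p>With <code class="language-plaintext highlighter-rouge">DJANGO_LOG_LEVEL=DEBUG</code> set, as in the Docker Compose file above, the <code class="language-plaintext highlighter-rouge">django</code> logger would emit debug messages without any change to the image.</p>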
<h2 id="admin-processes">Admin processes</h2>
<p>Administrative tasks are essential to every application. It is important for the code associated with them to ship with the application to avoid synchronization issues, since they are invoked in the same execution environment as the application.</p>
<p>Most of Django’s supporting administrative tasks, like applying database migrations, sending test emails, and adding users, can already be executed as one-off processes. In addition, Django provides a <a href="https://docs.djangoproject.com/en/3.1/howto/custom-management-commands/" target="_blank" rel="noopener">robust framework</a> for adding more that are specific to your application (e.g., toggling feature flags, orchestrating data imports, etc.).</p>
<p>As an example, we can apply outstanding database migrations (there should be some for a newly initialized Django project) with the built-in <code class="language-plaintext highlighter-rouge">migrate</code> command:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>docker-compose run <span class="nt">--rm</span> <span class="nt">--entrypoint</span> <span class="s2">"python manage.py"</span> web migrate
<span class="go">Creating django-blog_web_run ... done
Operations to perform:
Apply all migrations: admin, auth, contenttypes, sessions
Running migrations:
Applying contenttypes.0001_initial... OK
Applying auth.0001_initial... OK
Applying admin.0001_initial... OK
Applying admin.0002_logentry_remove_auto_add... OK
Applying admin.0003_logentry_add_action_flag_choices... OK
Applying contenttypes.0002_remove_content_type_name... OK
Applying auth.0002_alter_permission_name_max_length... OK
Applying auth.0003_alter_user_email_max_length... OK
Applying auth.0004_alter_user_username_opts... OK
Applying auth.0005_alter_user_last_login_null... OK
Applying auth.0006_require_contenttypes_0002... OK
Applying auth.0007_alter_validators_add_error_messages... OK
Applying auth.0008_alter_user_username_max_length... OK
Applying auth.0009_alter_user_last_name_max_length... OK
Applying auth.0010_alter_group_name_max_length... OK
Applying auth.0011_update_proxy_permissions... OK
Applying auth.0012_alter_user_first_name_max_length... OK
Applying sessions.0001_initial... OK
</span></code></pre></div></div>
<p>Here, we dynamically override the previously referenced Docker Compose configuration with <code class="language-plaintext highlighter-rouge">--entrypoint</code> set to <code class="language-plaintext highlighter-rouge">python manage.py</code> instead of <code class="language-plaintext highlighter-rouge">gunicorn</code>. We also specify that we want the <code class="language-plaintext highlighter-rouge">migrate</code> subcommand to be run. This execution leads to a series of cross-container communications that ensure our database schema aligns with the current state of Django’s data model.</p>
<hr />
<p>That’s it! Whether you were aware of the Twelve-Factor methodology before or not, I hope that seeing it applied to a Django application enables you to more easily integrate it with whatever web framework you use. May it lead to more configurable, scalable, and reliable applications. <em>Amen</em>.</p>
<hr />
<p><small class="kudos"><em>Thanks to <a href="https://twitter.com/davekonopka" target="_blank" rel="noopener">Dave Konopka</a> for providing thoughtful feedback on my drafts of this post.</em></small></p>
Leaving Comments on My Own Pull Requests2021-02-24T00:00:00-05:00https://hector.dev/2021/02/24/leaving-comments-on-my-own-pull-requests<p>For the record, the process of leaving comments on my own pull requests isn’t something I came up with on my own. I adopted it from a previous colleague of mine, <a href="https://twitter.com/jean_cochrane" target="_blank" rel="noopener">Jean Cochrane</a>.</p>
<p>A while ago, Jean was being onboarded onto a team I was responsible for. Part of the onboarding process involved working through a <a href="https://www.azavea.com/blog/2018/10/10/engineer-onboarding-breakable-toy/" target="_blank" rel="noopener">breakable toy</a> exercise, which is a project similar in toolset to the ones we’d work on day-to-day, but different in scope. As part of going through that, I encouraged Jean to take notes on any steps of the exercise that weren’t clear. Jean took that further and annotated each associated pull request with comments containing their in-context notes.</p>
<p>As a reviewer, it was phenomenal to have those prompts front and center. With them, we could immediately begin cutting through any existing ambiguity and work toward a joint understanding of the changes. <mark>It was a refreshing experience, and I’ve been trying to reproduce it for all of my pull request reviewers ever since.</mark></p>
<h2 id="the-process">The process</h2>
<p>Over the years, I’ve refined the process I use before assigning my pull requests for review. In the beginning, it consisted of making sure my changes worked and looked acceptable. That evolved into ensuring all of my commits represented logical changes to the codebase and were as concise as possible. Later, I began placing extra emphasis on clear and reproducible testing instructions.</p>
<p>I still believe all of these pursuits are important, but leaving comments on my own pull requests is the newest addition.</p>
<p>First, I open the pull request (or a draft pull request—if that feature is available to you). Immediately after, I scan through the changes and proactively annotate important lines with comments. The comments aim to direct the reviewer’s attention to areas of the code I think would benefit from direct engagement. Some examples include:</p>
<ul>
<li>Calling attention to a tradeoff I made</li>
<li>Elaborating on a not so obvious change in the change set</li>
<li>Self-identifying an area where I wasn’t 100% certain about the approach I took</li>
<li>Explaining a concept I don’t think my reviewer has been exposed to yet</li>
</ul>
<p>While a lot of these issues can be covered in the pull request body, I find that associating the details directly with the relevant lines of code is far more inviting to reviewers. Now, instead of trying to guess the exact changes I was uncertain about, or skipping over unfamiliar parts of the change set, the reviewer receives a clear set of prompts with supporting detail from my perspective.</p>
<h2 id="real-world-examples">Real-world examples</h2>
<h3 id="1-calling-attention-to-a-tradeoff-i-made">1. Calling attention to a tradeoff I made</h3>
<p>Here, I needed a way to annotate a container image with revision relevant tags and labels. There were several different approaches to choose from, but since this process is happening in GitHub Actions, I settled on the recommended approach by <a href="https://github.com/docker/build-push-action" target="_blank" rel="noopener">docker/build-push-action</a>.</p>
<p>It also felt important to leave a comment here because if I were reviewing this pull request and saw the words “crazy max” strung together, it would have immediately triggered my spidey senses. No offense, Max. 😆</p>
<p><img src="/assets/resized/pr-comment-01-800x334.png" alt="Comment from vercel/cosmosdb-server/pull/62" srcset="/assets/resized/pr-comment-01-320x133.png 320w,/assets/resized/pr-comment-01-480x200.png 480w,/assets/resized/pr-comment-01-800x334.png 800w, /images/2021-02-23-leaving-comments-on-my-own-pull-requests/pr-comment-01.png 1616w" /></p>
<p>See: <a href="https://github.com/vercel/cosmosdb-server/pull/62#discussion_r552927093" target="_blank" rel="noopener">vercel/cosmosdb-server/pull/62</a></p>
<h3 id="2-elaborating-on-a-not-so-obvious-change-in-the-change-set">2. Elaborating on a not so obvious change in the change set</h3>
<p>In this case, I upgraded a library in an earlier troubleshooting step. That didn’t end up resolving the issue, but after I did finally resolve it, I decided to keep the library upgrade in so that the dependencies would be up-to-date.</p>
<p>All library upgrades incur some risk. Are we both willing to agree that the risk is worthwhile here?</p>
<p><img src="/assets/resized/pr-comment-02-800x359.png" alt="Comment from PublicMapping/districtbuilder/pull/410" srcset="/assets/resized/pr-comment-02-320x144.png 320w,/assets/resized/pr-comment-02-480x216.png 480w,/assets/resized/pr-comment-02-800x359.png 800w, /images/2021-02-23-leaving-comments-on-my-own-pull-requests/pr-comment-02.png 1616w" /></p>
<p>See: <a href="https://github.com/PublicMapping/districtbuilder/pull/410#pullrequestreview-487132294" target="_blank" rel="noopener">PublicMapping/districtbuilder/pull/410</a></p>
<h3 id="3-self-identifying-an-area-where-i-wasnt-100-certain-about-the-approach-i-took">3. Self-identifying an area where I wasn’t 100% certain about the approach I took</h3>
<p>Here, I decided to proactively drop Python 3.5 support from an existing library because it was approaching end-of-life. However, when you’re a library maintainer, these types of changes can have a large impact. I wanted to draw attention to the change so that the maintainers could engage with my decision from their perspective.</p>
<p><img src="/assets/resized/pr-comment-03-800x436.png" alt="Comment from stac-utils/pystac/pull/108" srcset="/assets/resized/pr-comment-03-320x174.png 320w,/assets/resized/pr-comment-03-480x261.png 480w,/assets/resized/pr-comment-03-800x436.png 800w, /images/2021-02-23-leaving-comments-on-my-own-pull-requests/pr-comment-03.png 1616w" /></p>
<p>See: <a href="https://github.com/stac-utils/pystac/pull/108#pullrequestreview-444917677" target="_blank" rel="noopener">stac-utils/pystac/pull/108</a></p>
<h3 id="4-explaining-a-concept-i-dont-think-my-reviewer-has-been-exposed-to-yet">4. Explaining a concept I don’t think my reviewer has been exposed to yet</h3>
<p>I needed a way to supply Docker Hub credentials to a GitHub Actions workflow so that release specific container images could be published. In this case, I wasn’t the repository owner, so I couldn’t set up the credentials myself. I left this comment to provide the repository owners with as much detail as possible to help make credential set up easy.</p>
<p><img src="/assets/resized/pr-comment-04-800x435.png" alt="Comment from vercel/cosmosdb-server/pull/62" srcset="/assets/resized/pr-comment-04-320x174.png 320w,/assets/resized/pr-comment-04-480x261.png 480w,/assets/resized/pr-comment-04-800x435.png 800w, /images/2021-02-23-leaving-comments-on-my-own-pull-requests/pr-comment-04.png 1616w" /></p>
<p>See: <a href="https://github.com/vercel/cosmosdb-server/pull/62#discussion_r552927990" target="_blank" rel="noopener">vercel/cosmosdb-server/pull/62</a></p>
<h2 id="code-vs-pull-request-comments">Code vs. pull request comments</h2>
<p>Differentiating between code and pull request-level comments is a question I get asked often when discussing this technique. While it is important to strike a good balance between the two, I find myself encouraging people to worry less about answering this question and to focus more on creating pull request comments in a thoughtful way. If a reviewer reads a comment and thinks it is important enough to persist in the codebase, that’s an easy suggestion and change. Asking reviewers to request the removal of existing code comments is a heavier ask.</p>
<p>That said, here are a few loose guidelines for navigating the decision-making process:</p>
<ul>
<li>If your comment carries relevance beyond the lifecycle of a pull request, consider that it may benefit from being a code comment.</li>
<li>If you’re making an <em>architecturally significant</em> decision in a pull request, then it probably warrants a separate write-up in an <a href="https://www.cognitect.com/blog/2011/11/15/documenting-architecture-decisions" target="_blank" rel="noopener">Architecture Decision Record</a>.</li>
<li>If you find yourself leaving lots of pull request comments, reflect on whether your pull request is too large, whether your comments are truly beneficial, or whether the code itself would benefit from more clarity.</li>
</ul>
<hr />
<p><small class="kudos"><em>Special thanks to <a href="https://twitter.com/jean_cochrane" target="_blank" rel="noopener">Jean Cochrane</a> for exposing me to this technique. Also, thanks to both Jean and <a href="https://twitter.com/davekonopka" target="_blank" rel="noopener">Dave Konopka</a> for reviewing my writing.</em></small></p>
How I Make Slack Work for Me2021-02-13T00:00:00-05:00https://hector.dev/2021/02/13/how-i-make-slack-work-for-me<p>As a daily Slack user for the last seven years, I’ve spent a lot of time exploring ways to get the most out of it as a collaboration tool. While I have mixed feelings about its impact on productivity, I figure Slack isn’t going away any time soon, so I may as well learn how to make it work for me.</p>
<p>The sections below capture some Slack features and general tactics I’ve employed to make the most out of Slack as a tech lead and engineering leader in a software development focused organization. I hope you find some of them useful.</p>
<ul>
<li><a href="#1-all-unread-entrypoint">All unread entrypoint</a></li>
<li><a href="#2-follow-thread">Follow thread</a></li>
<li><a href="#3-reminders-on-messages-to-track-commitments">Reminders on messages to track commitments</a></li>
<li><a href="#4-strategic-keyword-notifications">Strategic keyword notifications</a></li>
<li><a href="#5-saved-items-as-source-for-feedback">Saved items as source for feedback</a></li>
<li><a href="#6-quiet-hours">Quiet hours</a></li>
<li><a href="#7-team-changelog-channel">Team <code class="language-plaintext highlighter-rouge">CHANGELOG</code> channel</a></li>
</ul>
<hr />
<h2 id="1-all-unread-entrypoint">1. All unread entrypoint</h2>
<p>By default, the <a href="https://slack.com/help/articles/226410907-View-all-your-unread-messages" target="_blank" rel="noopener">all unread</a> feature of Slack is disabled. When enabled, it adds a new top level entry to the left-hand Slack navigation that allows you to browse all of your unread messages, grouped by channel, in a single view. While in this mode, you can scan messages, but also access all of the individual message shortcuts.</p>
<p>I use this feature as the entrypoint for digesting all Slack messages because it enables the move <a href="https://twitter.com/deniseyu21" target="_blank" rel="noopener">Denise Yu</a> so eloquently summarized as, <em>The Art of the Rollup</em>.</p>
<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet" align="center"><p lang="en" dir="ltr">enter what i've been mentally noting as "the art of the rollup". read the backscroll, take a deep breath, wait 5 mins, and write to entire channel:<br /><br />"To summarize: the problem is X. Possible paths forward are A, B, C. Sounds like we're leaning towards A. have I missed anything?"</p>— @deniseyu@mastodon.social (@deniseyu21) <a href="https://twitter.com/deniseyu21/status/1357832806932561920?ref_src=twsrc%5Etfw">February 5, 2021</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</div>
<p>The ability to digest top level channel discussion as it develops, while still leaving it all marked as unread, allows me to bookend the context necessary to assemble an effective rollup.</p>
<h2 id="2-follow-thread">2. Follow thread</h2>
<p>The <a href="https://slack.com/help/articles/115000769927-Use-threads-to-organize-discussions-#manage-the-threads-you-follow" target="_blank" rel="noopener">follow thread</a> feature is easily the Slack feature I use most on this list. It allows you to subscribe to messages published in a thread <em>without</em> contributing any messages to the thread. I use it pretty liberally on any interesting message that pops up in a channel. Then, I mark the channel as read via the All unread view referenced above.</p>
<p>It is worth cautioning that heavy use of this feature can quickly escalate into behavior that becomes indistinguishable from micromanagement, especially if you use it to inject yourself into lots of conversations where people are trying to develop problem-solving skills.</p>
<h2 id="3-reminders-on-messages-to-track-commitments">3. Reminders on messages to track commitments</h2>
<p>The Khan Academy <a href="https://docs.google.com/document/d/1qr0d05X5-AsyDYqKRCfgGGcWSshTMd_vfTggfhDpbls/edit" target="_blank" rel="noopener">career development guide</a> emphasizes a top-level attribute called <em>Maturity</em>. They cite the ability to follow through on your commitments as a sign of maturity (e.g., doing what you say you are going to do).</p>
<p>As a typical work day progresses, tons of micro commitments come up and many occur in chat. <a href="https://slack.com/help/articles/208423427-Set-a-reminder#set-a-reminder-for-a-message" target="_blank" rel="noopener">Setting a reminder on a message</a> provides an effective in-context way to track, snooze, and reschedule commitments so that they don’t get lost in the shuffle.</p>
<h2 id="4-strategic-keyword-notifications">4. Strategic keyword notifications</h2>
<p>Most folks are familiar with (and possibly loathe) Slack notifications. Notifications happen when you get a direct message, when someone mentions you, or when someone mentions a group alias you’re a member of. But, Slack also provides a way to <a href="https://slack.com/help/articles/201355156-Configure-your-Slack-notifications#keyword-notifications" target="_blank" rel="noopener">set up an open-ended list of keywords</a> that trigger notifications.</p>
<p>In the past I’ve taken advantage of this feature to target certain keywords that have a tendency to lead to <em>architecturally significant</em> events:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">bad idea</code></li>
<li><code class="language-plaintext highlighter-rouge">cache</code></li>
<li><code class="language-plaintext highlighter-rouge">lock</code></li>
<li><code class="language-plaintext highlighter-rouge">redis</code></li>
<li><code class="language-plaintext highlighter-rouge">should work</code></li>
<li><code class="language-plaintext highlighter-rouge">trivial</code></li>
</ul>
<h2 id="5-saved-items-as-source-for-feedback">5. Saved items as source for feedback</h2>
<p>It has been said that feedback is a gift. But, as with any great gift, feedback can be difficult to identify and deliver.</p>
<p>One way to make the feedback more effective is to connect it to specific events. For example, telling someone that they did a really good job at disambiguating a complex topic <em>in a meeting last week</em>. Or, that their testing instructions <em>on a pull request from yesterday</em> were detailed and easy to follow.</p>
<p>Neither of these examples is tied to chat, but many others are. To help persist that level of specificity across different Slack channels, I repurpose the Slack <a href="https://slack.com/help/articles/360042650274-Save-messages-and-files-" target="_blank" rel="noopener">save messages and files</a> feature to track examples of both exemplary and poor communication. Any time I see a good candidate, I don’t have to think; I just click on the bookmark (used to be :star:) icon. Later, I draw upon that list to support feedback in venues like one-on-ones, performance reviews, calls for kudos, etc.</p>
<h2 id="6-quiet-hours">6. Quiet hours</h2>
<p>A couple of years back, I was exposed to the concept of Slack quiet hours by <a href="https://twitter.com/kepioo" target="_blank" rel="noopener">Nassim Kammah</a> in a talk about <a href="https://www.youtube.com/watch?v=RMsZbchAwoY" target="_blank" rel="noopener">remote-first team practices</a>. Quiet hours are periods of blocked-off time when the team does not actively engage in Slack conversations. Colleagues are encouraged to save questions, requests, and conversations for outside of these periods.</p>
<p>Reserved blocks of time off Slack aim to help enable deep work and mitigate the amount of context switching and <a href="https://en.wikipedia.org/wiki/Fear_of_missing_out" target="_blank" rel="noopener">FOMO</a> that can occur as we bounce between completing tasks and keeping up with the never-ending Slack firehose.</p>
<h2 id="7-team-changelog-channel">7. Team <code class="language-plaintext highlighter-rouge">CHANGELOG</code> channel</h2>
<p>Also sourced from Nassim’s talk above is the use of <a href="https://reacji-channeler.builtbyslack.com" target="_blank" rel="noopener">Reacji Channeler</a>. Reacji Channeler is a Slack application that routes messages annotated with specific reactions to a designated channel. It can be configured in many ways to target a wide variety of use cases, but the use case described in the talk is particularly interesting: using it to produce a team <code class="language-plaintext highlighter-rouge">CHANGELOG</code>.</p>
<p>As significant events occur throughout a team’s day-to-day, someone summarizes (or rolls up) the event into one message that includes the surrounding context. When the appropriate reaction is applied to the message, it gets routed to a team <code class="language-plaintext highlighter-rouge">CHANGELOG</code> channel (e.g., <code class="language-plaintext highlighter-rouge">#sre-team-changelog</code>).</p>
<p>The goal is to produce a channel log such that if someone goes on vacation for a week, they can come back, read just that channel’s backscroll, and be caught up.</p>
<hr />
<p><small style="opacity: 80%"><em>Special thanks to <a href="https://twitter.com/rajadain" target="_blank" rel="noopener">Terence Tuhinanshu</a> for encouraging me to write this.</em></small></p>
Creating Go Application Releases with GoReleaser2021-01-18T00:00:00-05:00https://hector.dev/2021/01/18/creating-go-application-releases-with-goreleaser<p>A few weeks ago, I set out to upgrade the version of Go (1.6 to 1.15) used to build an old command-line utility I developed, named <a href="https://github.com/hectcastro/heimdall" target="_blank" rel="noopener">Heimdall</a>. Heimdall provides a way to wrap an executable program inside of an exclusive lock provided by a central PostgreSQL instance via <code class="language-plaintext highlighter-rouge">pg_try_advisory_lock</code>.</p>
<p>Now, Heimdall is a nice little utility and all (if you’re intrigued, check out the <code class="language-plaintext highlighter-rouge">README</code>), but the most interesting part of the upgrade process came after I got everything working and started to think about how to create a new release. That’s when I came across <a href="https://goreleaser.com" target="_blank" rel="noopener">GoReleaser</a>.</p>
<h2 id="goreleaser">GoReleaser</h2>
<p>GoReleaser is a release automation tool specifically for Go projects. With a few bits of <a href="https://github.com/hectcastro/heimdall/blob/1ffc81c5457ae8692b18117a8329e3e3997e80e1/.goreleaser.yml" target="_blank" rel="noopener">YAML configuration</a>, GoReleaser provided me with:</p>
<ul>
<li>Hooks into the Go module system for managing library dependencies</li>
<li>The ability to easily produce a set of <a href="https://github.com/hectcastro/heimdall/releases/tag/1.0.0" target="_blank" rel="noopener">build artifacts</a> for multiple operating systems and computer architectures</li>
<li><a href="https://github.com/hectcastro/heimdall/releases/download/1.0.0/checksums.txt" target="_blank" rel="noopener">Checksums</a> for each of the build artifacts</li>
<li>Easy <a href="https://github.com/hectcastro/heimdall/blob/1ffc81c5457ae8692b18117a8329e3e3997e80e1/.github/workflows/goreleaser.yml" target="_blank" rel="noopener">integration with GitHub Actions</a> to automate publishing releases on tagged commits</li>
</ul>
<p>If you are responsible for Go applications that are in need of a uniform release process, I find it really hard to beat GoReleaser.</p>
Validating Data in Python with Cerberus2020-12-29T00:00:00-05:00https://hector.dev/2020/12/29/validating-data-in-python-with-cerberus<p>This year was my first participating in <a href="https://adventofcode.com" target="_blank" rel="noopener">Advent of Code</a>—and I’m glad I did, because solving one of the <a href="https://adventofcode.com/2020/day/4" target="_blank" rel="noopener">challenges</a> exposed me to an excellent data validation library for Python named <a href="https://docs.python-cerberus.org/en/stable/" target="_blank" rel="noopener">Cerberus</a>.</p>
<h2 id="whats-in-a-valid-passport">What’s in a valid passport</h2>
<p>Below are some excerpts from the challenge, along with specific field level validation rules:</p>
<blockquote>
<p>You arrive at the airport only to realize that you grabbed your North Pole Credentials instead of your passport. While these documents are extremely similar, North Pole Credentials aren’t issued by a country and therefore aren’t actually valid documentation for travel in most of the world.</p>
<p>It seems like you’re not the only one having problems, though; a very long line has formed for the automatic passport scanners, and the delay could upset your travel itinerary.</p>
<p>…</p>
<p>The line is moving more quickly now, but you overhear airport security talking about how passports with invalid data are getting through. Better add some data validation, quick!</p>
<p>You can continue to ignore the <code class="language-plaintext highlighter-rouge">cid</code> field, but each other field has strict rules about what values are valid for automatic validation:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">byr</code> (Birth Year) - four digits; at least <code class="language-plaintext highlighter-rouge">1920</code> and at most <code class="language-plaintext highlighter-rouge">2002</code>.</li>
<li><code class="language-plaintext highlighter-rouge">iyr</code> (Issue Year) - four digits; at least <code class="language-plaintext highlighter-rouge">2010</code> and at most <code class="language-plaintext highlighter-rouge">2020</code>.</li>
<li><code class="language-plaintext highlighter-rouge">eyr</code> (Expiration Year) - four digits; at least <code class="language-plaintext highlighter-rouge">2020</code> and at most <code class="language-plaintext highlighter-rouge">2030</code>.</li>
<li><code class="language-plaintext highlighter-rouge">hgt</code> (Height) - a number followed by either <code class="language-plaintext highlighter-rouge">cm</code> or <code class="language-plaintext highlighter-rouge">in</code>:
<ul>
<li>If <code class="language-plaintext highlighter-rouge">cm</code>, the number must be at least <code class="language-plaintext highlighter-rouge">150</code> and at most <code class="language-plaintext highlighter-rouge">193</code>.</li>
<li>If <code class="language-plaintext highlighter-rouge">in</code>, the number must be at least <code class="language-plaintext highlighter-rouge">59</code> and at most <code class="language-plaintext highlighter-rouge">76</code>.</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">hcl</code> (Hair Color) - a # followed by exactly six characters <code class="language-plaintext highlighter-rouge">0-9</code> or <code class="language-plaintext highlighter-rouge">a-f</code>.</li>
<li><code class="language-plaintext highlighter-rouge">ecl</code> (Eye Color) - exactly one of: <code class="language-plaintext highlighter-rouge">amb</code> <code class="language-plaintext highlighter-rouge">blu</code> <code class="language-plaintext highlighter-rouge">brn</code> <code class="language-plaintext highlighter-rouge">gry</code> <code class="language-plaintext highlighter-rouge">grn</code> <code class="language-plaintext highlighter-rouge">hzl</code> <code class="language-plaintext highlighter-rouge">oth</code>.</li>
<li><code class="language-plaintext highlighter-rouge">pid</code> (Passport ID) - a nine-digit number, including leading zeroes.</li>
<li><code class="language-plaintext highlighter-rouge">cid</code> (Country ID) - ignored, missing or not.</li>
</ul>
<p>Your job is to count the passports where all required fields are both <strong>present</strong> and <strong>valid</strong> according to the above rules.</p>
</blockquote>
<p>For completeness, here are some invalid passports (delimited by <code class="language-plaintext highlighter-rouge">\n\n</code>):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>eyr:1972 cid:100
hcl:#18171d ecl:amb hgt:170 pid:186cm iyr:2018 byr:1926
iyr:2019
hcl:#602927 eyr:1967 hgt:170cm
ecl:grn pid:012533040 byr:1946
hcl:dab227 iyr:2012
ecl:brn hgt:182cm pid:021572410 eyr:2020 byr:1992 cid:277
</code></pre></div></div>
<p>And, some valid passports:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pid:087499704 hgt:74in ecl:grn iyr:2012 eyr:2030 byr:1980
hcl:#623a2f
eyr:2029 ecl:blu cid:129 byr:1989
iyr:2014 pid:896056539 hcl:#a97842 hgt:165cm
hcl:#888785
hgt:164cm byr:2001 iyr:2015 cid:88
pid:545766238 ecl:hzl
eyr:2022
</code></pre></div></div>
<p>Most of the validation rules look straightforward in isolation, but less so when you think about composing them all together.</p>
<h2 id="validating-passports-with-cerberus">Validating passports with Cerberus</h2>
<p>Step one involved getting familiar with Cerberus <a href="https://docs.python-cerberus.org/en/stable/validation-rules.html" target="_blank" rel="noopener">validation rules</a>. The library supports rules like the following:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">contains</code> - This rule validates that a container object contains all of the defined items.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">document</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">states</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">peace</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">love</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">inity</span><span class="sh">"</span><span class="p">]}</span>
<span class="o">>>></span> <span class="n">schema</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">states</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">contains</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">peace</span><span class="sh">"</span><span class="p">}}</span>
<span class="o">>>></span> <span class="n">v</span><span class="p">.</span><span class="nf">validate</span><span class="p">(</span><span class="n">document</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>
<span class="bp">True</span>
</code></pre></div></div>
<ul>
<li><code class="language-plaintext highlighter-rouge">regex</code> - The validation will fail if the field’s value does not match the provided regular expression.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">schema</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">...</span> <span class="sh">"</span><span class="s">email</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span>
<span class="p">...</span> <span class="sh">"</span><span class="s">type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">string</span><span class="sh">"</span><span class="p">,</span>
<span class="p">...</span> <span class="sh">"</span><span class="s">regex</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$</span><span class="sh">"</span>
<span class="p">...</span> <span class="p">}</span>
<span class="p">...</span> <span class="p">}</span>
<span class="o">>>></span> <span class="n">document</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">email</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">john@example.com</span><span class="sh">"</span><span class="p">}</span>
<span class="o">>>></span> <span class="n">v</span><span class="p">.</span><span class="nf">validate</span><span class="p">(</span><span class="n">document</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>
<span class="bp">True</span>
</code></pre></div></div>
<ul>
<li><code class="language-plaintext highlighter-rouge">required</code> - If <code class="language-plaintext highlighter-rouge">True</code> the field is mandatory. Validation will fail when it is missing.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">v</span><span class="p">.</span><span class="n">schema</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">required</span><span class="sh">"</span><span class="p">:</span> <span class="bp">True</span><span class="p">,</span> <span class="sh">"</span><span class="s">type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">string</span><span class="sh">"</span><span class="p">},</span> <span class="sh">"</span><span class="s">age</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">integer</span><span class="sh">"</span><span class="p">}}</span>
<span class="o">>>></span> <span class="n">document</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">age</span><span class="sh">"</span><span class="p">:</span> <span class="mi">10</span><span class="p">}</span>
<span class="o">>>></span> <span class="n">v</span><span class="p">.</span><span class="nf">validate</span><span class="p">(</span><span class="n">document</span><span class="p">)</span>
<span class="bp">False</span>
</code></pre></div></div>
<p>Step two involved converting the passports into Cerberus documents. This was mostly an exercise in parsing uniquely assembled text into Python dictionaries.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Split the batch file records by double newline.
</span><span class="k">for</span> <span class="n">record</span> <span class="ow">in</span> <span class="n">batch_file</span><span class="p">.</span><span class="nf">read</span><span class="p">().</span><span class="nf">split</span><span class="p">(</span><span class="sh">"</span><span class="se">\n\n</span><span class="sh">"</span><span class="p">):</span>
    <span class="c1"># Split the fields within a record by a space or newline.
</span>    <span class="n">record_field_list</span> <span class="o">=</span> <span class="p">[</span>
        <span class="nf">tuple</span><span class="p">(</span><span class="n">field</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="p">))</span> <span class="k">for</span> <span class="n">field</span> <span class="ow">in</span> <span class="n">re</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="sa">r</span><span class="sh">"</span><span class="s">\s</span><span class="sh">"</span><span class="p">).</span><span class="nf">split</span><span class="p">(</span><span class="n">record</span><span class="p">.</span><span class="nf">strip</span><span class="p">())</span>
    <span class="p">]</span>
</code></pre></div></div>
<p>That leaves <code class="language-plaintext highlighter-rouge">record_field_list</code> looking like:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">record_field_list</span>
<span class="p">[(</span><span class="sh">'</span><span class="s">ecl</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">gry</span><span class="sh">'</span><span class="p">),</span>
<span class="p">(</span><span class="sh">'</span><span class="s">pid</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">860033327</span><span class="sh">'</span><span class="p">),</span>
<span class="p">(</span><span class="sh">'</span><span class="s">eyr</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">2020</span><span class="sh">'</span><span class="p">),</span>
<span class="p">(</span><span class="sh">'</span><span class="s">hcl</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">#fffffd</span><span class="sh">'</span><span class="p">),</span>
<span class="p">(</span><span class="sh">'</span><span class="s">byr</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">1937</span><span class="sh">'</span><span class="p">),</span>
<span class="p">(</span><span class="sh">'</span><span class="s">iyr</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">2017</span><span class="sh">'</span><span class="p">),</span>
<span class="p">(</span><span class="sh">'</span><span class="s">cid</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">147</span><span class="sh">'</span><span class="p">),</span>
<span class="p">(</span><span class="sh">'</span><span class="s">hgt</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">183cm</span><span class="sh">'</span><span class="p">)]</span>
</code></pre></div></div>
<p>From there, <code class="language-plaintext highlighter-rouge">dict</code> converts the list of tuples into a proper Cerberus document:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">document</span> <span class="o">=</span> <span class="nf">dict</span><span class="p">(</span><span class="n">record_field_list</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">document</span>
<span class="p">{</span><span class="sh">'</span><span class="s">byr</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">1937</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">cid</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">147</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">ecl</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">gry</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">eyr</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">2020</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">hcl</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">#fffffd</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">hgt</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">183cm</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">iyr</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">2017</span><span class="sh">'</span><span class="p">,</span>
<span class="sh">'</span><span class="s">pid</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">860033327</span><span class="sh">'</span><span class="p">}</span>
</code></pre></div></div>
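<p>As a self-contained sketch of the two steps above, the splitting and the <code class="language-plaintext highlighter-rouge">dict</code> conversion can be folded into one small function. The name <code class="language-plaintext highlighter-rouge">parse_passports</code> and the inline <code class="language-plaintext highlighter-rouge">batch</code> string are assumptions made for illustration (the snippets above read from <code class="language-plaintext highlighter-rouge">batch_file</code> instead), and the sample records are two of the valid passports listed earlier:</p>

```python
import re


def parse_passports(batch: str) -> list[dict[str, str]]:
    """Parse a batch of passport records into a list of dictionaries.

    Hypothetical helper: the name and signature are illustrative only.
    """
    documents = []
    # Records are separated by a blank line.
    for record in batch.strip().split("\n\n"):
        # Fields within a record are separated by spaces or newlines.
        fields = re.split(r"\s+", record.strip())
        # Each field looks like "key:value"; split on the first colon only.
        documents.append(dict(field.split(":", 1) for field in fields))
    return documents


# Two of the valid sample passports, separated by a blank line.
batch = """pid:087499704 hgt:74in ecl:grn iyr:2012 eyr:2030 byr:1980
hcl:#623a2f

eyr:2029 ecl:blu cid:129 byr:1989
iyr:2014 pid:896056539 hcl:#a97842 hgt:165cm
"""

documents = parse_passports(batch)
print(len(documents))
print(documents[0]["hgt"])
```

<p>Splitting each field on the first colon only is a defensive choice, in case a value ever contains a colon of its own; none of the sample data above does.</p>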
<h2 id="putting-it-all-together">Putting it all together</h2>
<p>Equipped with a better understanding of what’s possible with Cerberus, and a list of Python dictionaries representing passports, below is the schema I put together to enforce the passport validation rules of the challenge. Only one of the rules (<code class="language-plaintext highlighter-rouge">hgt</code>) required a custom function (<code class="language-plaintext highlighter-rouge">compare_hgt_with_units</code>).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">SCHEMA</span> <span class="o">=</span> <span class="p">{</span>
<span class="sh">"</span><span class="s">byr</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">min</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">1920</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">max</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">2002</span><span class="sh">"</span><span class="p">},</span>
<span class="sh">"</span><span class="s">iyr</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">min</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">2010</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">max</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">2020</span><span class="sh">"</span><span class="p">},</span>
<span class="sh">"</span><span class="s">eyr</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">min</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">2020</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">max</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">2030</span><span class="sh">"</span><span class="p">},</span>
<span class="sh">"</span><span class="s">hgt</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span>
<span class="sh">"</span><span class="s">anyof</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span><span class="sh">"</span><span class="s">allof</span><span class="sh">"</span><span class="p">:</span> <span class="p">[{</span><span class="sh">"</span><span class="s">regex</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">[0-9]+cm</span><span class="sh">"</span><span class="p">},</span> <span class="p">{</span><span class="sh">"</span><span class="s">check_with</span><span class="sh">"</span><span class="p">:</span> <span class="n">compare_hgt_with_units</span><span class="p">}]},</span>
<span class="p">{</span><span class="sh">"</span><span class="s">allof</span><span class="sh">"</span><span class="p">:</span> <span class="p">[{</span><span class="sh">"</span><span class="s">regex</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">[0-9]+in</span><span class="sh">"</span><span class="p">},</span> <span class="p">{</span><span class="sh">"</span><span class="s">check_with</span><span class="sh">"</span><span class="p">:</span> <span class="n">compare_hgt_with_units</span><span class="p">}]},</span>
<span class="p">]</span>
<span class="p">},</span>
<span class="sh">"</span><span class="s">hcl</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">regex</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">#[0-9a-f]{6}</span><span class="sh">"</span><span class="p">},</span>
<span class="sh">"</span><span class="s">ecl</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">allowed</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">amb</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">blu</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">brn</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">gry</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">grn</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">hzl</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">oth</span><span class="sh">"</span><span class="p">]},</span>
<span class="sh">"</span><span class="s">pid</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">regex</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">[0-9]{9}</span><span class="sh">"</span><span class="p">},</span>
<span class="sh">"</span><span class="s">cid</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">required</span><span class="sh">"</span><span class="p">:</span> <span class="bp">False</span><span class="p">},</span>
<span class="p">}</span>
<span class="c1"># Provide a custom field validation function for a height with units.
</span><span class="k">def</span> <span class="nf">compare_hgt_with_units</span><span class="p">(</span><span class="n">field</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">value</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">error</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[...,</span> <span class="nb">str</span><span class="p">])</span> <span class="o">-></span> <span class="bp">None</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">value</span><span class="p">.</span><span class="nf">endswith</span><span class="p">(</span><span class="sh">"</span><span class="s">cm</span><span class="sh">"</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="mi">150</span> <span class="o"><=</span> <span class="nf">int</span><span class="p">(</span><span class="n">value</span><span class="p">.</span><span class="nf">rstrip</span><span class="p">(</span><span class="sh">"</span><span class="s">cm</span><span class="sh">"</span><span class="p">))</span> <span class="o"><=</span> <span class="mi">193</span><span class="p">):</span>
            <span class="nf">error</span><span class="p">(</span><span class="n">field</span><span class="p">,</span> <span class="sh">"</span><span class="s">out of range</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">value</span><span class="p">.</span><span class="nf">endswith</span><span class="p">(</span><span class="sh">"</span><span class="s">in</span><span class="sh">"</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="mi">59</span> <span class="o"><=</span> <span class="nf">int</span><span class="p">(</span><span class="n">value</span><span class="p">.</span><span class="nf">rstrip</span><span class="p">(</span><span class="sh">"</span><span class="s">in</span><span class="sh">"</span><span class="p">))</span> <span class="o"><=</span> <span class="mi">76</span><span class="p">):</span>
            <span class="nf">error</span><span class="p">(</span><span class="n">field</span><span class="p">,</span> <span class="sh">"</span><span class="s">out of range</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="nf">error</span><span class="p">(</span><span class="n">field</span><span class="p">,</span> <span class="sh">"</span><span class="s">missing units</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>
<p>With a schema in place, all that’s left to do is instantiate a <code class="language-plaintext highlighter-rouge">Validator</code> and validate each document:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">v</span> <span class="o">=</span> <span class="nc">Validator</span><span class="p">(</span><span class="n">SCHEMA</span><span class="p">,</span> <span class="n">require_all</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">v</span><span class="p">.</span><span class="nf">validate</span><span class="p">(</span><span class="n">document</span><span class="p">)</span>
<span class="bp">True</span>
</code></pre></div></div>
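<p>To see how the custom height check behaves on its own, here is a minimal sketch that calls it without Cerberus. The <code class="language-plaintext highlighter-rouge">check_height</code> helper and its error-collecting callback are hypothetical names introduced only for this illustration, and the sketch slices off the two-character unit suffix rather than using <code class="language-plaintext highlighter-rouge">rstrip</code>:</p>

```python
from typing import Callable, List, Tuple


def compare_hgt_with_units(field: str, value: str, error: Callable[..., None]) -> None:
    """Report an error unless value looks like '170cm' or '65in' and is in range."""
    if value.endswith("cm"):
        # Slice off the two-character unit suffix before converting.
        if not (150 <= int(value[:-2]) <= 193):
            error(field, "out of range")
    elif value.endswith("in"):
        if not (59 <= int(value[:-2]) <= 76):
            error(field, "out of range")
    else:
        error(field, "missing units")


def check_height(value: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: run the check and collect (field, message) pairs."""
    errors: List[Tuple[str, str]] = []
    compare_hgt_with_units("hgt", value, lambda field, msg: errors.append((field, msg)))
    return errors


if __name__ == "__main__":
    for sample in ("170cm", "200cm", "60in", "190"):
        print(sample, check_height(sample))
```

<p>An empty list means the value passed. In the real schema, Cerberus supplies the <code class="language-plaintext highlighter-rouge">error</code> callback itself.</p>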
<p>Thanks, Cerberus!</p>
Centralized Scala Steward with GitHub Actions2020-11-18T00:00:00-05:00https://hector.dev/2020/11/18/centralized-scala-steward-with-github-actions<p>Keeping project dependencies up-to-date is a challenging problem. Services like GitHub’s automated dependency updating system, <a href="https://github.blog/2020-06-01-keep-all-your-packages-up-to-date-with-dependabot/" target="_blank" rel="noopener">Dependabot</a>, go a long way toward making things easier, but only if your package manager’s ecosystem is <a href="https://docs.github.com/en/free-pro-team@latest/github/administering-a-repository/configuration-options-for-dependency-updates#package-ecosystem" target="_blank" rel="noopener">supported</a>. For Scala-based projects, it is not.</p>
<p>Enter <a href="https://github.com/scala-steward-org/scala-steward" target="_blank" rel="noopener">Scala Steward</a>.</p>
<p>Scala Steward provides a similar, low-effort way to keep project dependencies up-to-date. You simply open a pull request against the Scala Steward repository and add a reference to <em>your</em> project’s GitHub repository inside of a specially designated Markdown file. After that, Scala Steward (which manifests itself as a robot user on GitHub) keeps your project dependencies up-to-date via pull requests.</p>
<p>Unfortunately, this easy-mode option requires that your repository be publicly accessible. There are <a href="https://engineering.avast.io/running-scala-steward-on-premise/" target="_blank" rel="noopener">options</a> for running Scala Steward as a service for yourself, but that path is less trodden and requires a bit more effort.</p>
<h2 id="scala-steward-and-github-actions">Scala Steward and GitHub Actions</h2>
<p>So what other options do you have if your Scala project is inside a private repository? Well, if your project is on GitHub, then you likely have access to their workflow automation service, <a href="https://docs.github.com/en/free-pro-team@latest/actions" target="_blank" rel="noopener">GitHub Actions</a>. Scala Steward’s maintainers created a <a href="https://github.com/scala-steward-org/scala-steward-action" target="_blank" rel="noopener">GitHub Action</a> that lowers the bar to adding Scala Steward support to projects via the GitHub Actions execution model.</p>
<p>By default, the Action supports dependency detection through a workflow defined inside of your project’s repository. This approach makes it easy to simulate the public instance of Scala Steward on a per-repository basis. <em>But</em> there is also a <a href="https://github.com/scala-steward-org/scala-steward-action#updating-multiple-repositories" target="_blank" rel="noopener">centralized mode</a> that allows you to mimic the way the centrally managed instance of Scala Steward works.</p>
<p>This centralized mode gives us an opportunity to have the best of both worlds: a low-effort way to keep multiple project dependencies up-to-date (similar to the public instance of Scala Steward), <em>and</em> the ability to do so across both public and private repositories!</p>
<h2 id="putting-things-together">Putting things together</h2>
<p>First, create a GitHub repository for your instance of Scala Steward and put a file in it at <code class="language-plaintext highlighter-rouge">.github/workflows/scala-steward.yml</code> with the following contents:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">name</span><span class="pi">:</span> <span class="s">Scala Steward</span>
<span class="na">on</span><span class="pi">:</span>
  <span class="na">schedule</span><span class="pi">:</span>
    <span class="c1"># Schedule to run every Sunday @ 12AM UTC (midnight). Replace this with</span>
    <span class="c1"># whatever seems appropriate to you.</span>
    <span class="pi">-</span> <span class="na">cron</span><span class="pi">:</span> <span class="s2">"</span><span class="s">0</span><span class="nv"> </span><span class="s">0</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">0"</span>
  <span class="c1"># Provide support for manually triggering the workflow via GitHub.</span>
  <span class="na">workflow_dispatch</span><span class="pi">:</span>
<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">scala-steward</span><span class="pi">:</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">scala-steward</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>
    <span class="na">steps</span><span class="pi">:</span>
      <span class="c1"># This is necessary to ensure that the most up-to-date version of</span>
      <span class="c1"># REPOSITORIES.md is used.</span>
      <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v2</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Execute Scala Steward</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">scala-steward-org/scala-steward-action@vX.Y.Z</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="c1"># A GitHub personal access token tied to a user that will create</span>
          <span class="c1"># pull requests against your projects to update dependencies. More</span>
          <span class="c1"># on this under the YAML snippet.</span>
          <span class="na">github-token</span><span class="pi">:</span> <span class="s">${{ secrets.SCALA_STEWARD_GITHUB_TOKEN }}</span>
          <span class="c1"># A Markdown file with a literal Markdown list of repositories</span>
          <span class="c1"># Scala Steward should monitor.</span>
          <span class="na">repos-file</span><span class="pi">:</span> <span class="s">REPOSITORIES.md</span>
          <span class="na">author-email</span><span class="pi">:</span> <span class="s">scala-steward@users.noreply.github.com</span>
          <span class="na">author-name</span><span class="pi">:</span> <span class="s">Scala Steward</span>
</code></pre></div></div>
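<p>The five cron fields are minute, hour, day of month, month, and day of week, so <code class="language-plaintext highlighter-rouge">0 0 * * 0</code> reads as minute 0 of hour 0 on Sundays, and GitHub Actions evaluates schedules in UTC. As a rough sketch of what that means in practice, the hypothetical helper below computes the next matching time using only the standard library. It hard-codes this one schedule and is not a general cron parser:</p>

```python
from datetime import datetime, timedelta, timezone


def next_sunday_midnight_utc(now: datetime) -> datetime:
    """Return the first Sunday 00:00 UTC strictly after `now` (cron: 0 0 * * 0)."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # datetime.weekday() numbers Monday as 0 and Sunday as 6.
    days_ahead = (6 - midnight.weekday()) % 7
    candidate = midnight + timedelta(days=days_ahead)
    if candidate <= now:
        # Already at or past this week's slot, so move to next Sunday.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_sunday_midnight_utc(datetime.now(timezone.utc)))
```

<p>Scheduled workflows can start later than the computed time when GitHub is under load, so treat the result as the earliest possible start.</p>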
<p>Hopefully, the inline comments help minimize any ambiguity in the GitHub Actions workflow configuration file. For completeness, below is an example of the Markdown file as well:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">-</span> organization/repository1
<span class="p">-</span> organization/repository2
<span class="p">-</span> organization/repository3
</code></pre></div></div>
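<p>Because a malformed entry in the repositories file is easy to miss, a small pre-flight check can help. The sketch below is not how the Action itself reads the file. It is a hypothetical <code class="language-plaintext highlighter-rouge">parse_repos</code> helper that assumes every entry is a <code class="language-plaintext highlighter-rouge">- owner/repository</code> bullet, as in the example above:</p>

```python
from typing import List


def parse_repos(markdown: str) -> List[str]:
    """Extract 'owner/repository' slugs from a Markdown bullet list."""
    repos: List[str] = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("- "):
            continue  # Skip blank lines and anything that is not a bullet.
        slug = line[2:].strip()
        owner, sep, name = slug.partition("/")
        if not (owner and sep and name) or "/" in name:
            raise ValueError(f"expected owner/repository, got {slug!r}")
        repos.append(slug)
    return repos


if __name__ == "__main__":
    sample = "- organization/repository1\n- organization/repository2\n"
    print(parse_repos(sample))
```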
<p>The last step is to ensure that any private repositories add the user associated with the GitHub personal access token as a collaborator with the <strong>Write</strong> role permissions. Also, to slightly improve usability and maintainability, consider the following suggestions:</p>
<ul>
<li>Add <a href="https://docs.github.com/en/free-pro-team@latest/github/administering-a-repository/keeping-your-actions-up-to-date-with-dependabot">Dependabot support</a> to your Scala Steward repository to keep the Scala Steward GitHub Action up-to-date.</li>
<li>Avoid tying Scala Steward to an individual user GitHub account. Consider creating a <a href="https://docs.github.com/en/free-pro-team@latest/github/getting-started-with-github/types-of-github-accounts#personal-user-accounts" target="_blank" rel="noopener">bot account</a> first, then create a personal access token with it to use with Scala Steward.</li>
<li>Create a custom Scala Steward team (e.g., <strong>@organization/scala-steward</strong>) and add the bot account above to it. Now, instead of remembering to add the bot account to your Scala project repository as a collaborator, you can add the more intuitive Scala Steward team.</li>
</ul>
A Useful Framework for Interpreting Success Stories2020-02-15T00:00:00-05:00https://hector.dev/2020/02/15/a-useful-framework-for-interpreting-success-stories<p>Recently, I had the pleasure of reading <a href="https://codahale.com/work-is-work/" target="_blank" rel="noopener">Work Is Work</a>, an essay by Coda Hale on organizational design. Aside from providing a thought-provoking perspective on scaling organizational efforts, the post makes reference to two terms from anthropological field research that were new to me: <a href="https://en.wikipedia.org/wiki/Emic_and_etic" target="_blank" rel="noopener">emic and etic</a>. Below, I’ll describe how these terms provide a useful framework for interpreting success stories.</p>
<h2 id="emic-and-etic">Emic and Etic</h2>
<p>When we read success stories, we often do so to help narrow down the solution space for a problem <em>we’re</em> facing. During that process, it can sometimes be easy to lose track of how important details of the story (its plot, setting, actors, etc.) are different from ours.</p>
<p>Emic and etic distinguish behaviors or beliefs described from the actor’s perspective (emic) from those observed by an outsider (etic). Continuing with the success story example, my writing about how I had great success with a new JavaScript framework is an <em>emic</em> account. Your reading of my story, as research for selecting a JavaScript framework for your project, is an <em>etic</em> account.</p>
<p>This framework has been valuable to me in two ways. It:</p>
<ol>
<li>Heightens my awareness, prompting an additional level of scrutiny toward the solutions I consider (e.g., you had success, but the project you used the JavaScript framework on was small and mine is large).</li>
<li>Provides shorthand terms for what are otherwise relatively difficult concepts to communicate.</li>
</ol>
Scheduling Lambda Functions with AWS SAM2018-06-14T00:00:00-04:00https://hector.dev/2018/06/14/scheduling-lambda-functions-with-aws-sam<p>A few days ago, I spent some time learning how to use Amazon’s <a href="https://github.com/awslabs/serverless-application-model" target="_blank" rel="noopener">Serverless Application Model (SAM)</a> to schedule the recurring execution of Lambda functions. To help better cement my understanding, I assembled an overview of all the SAM template components necessary to schedule the periodic execution of a Go-based Lambda function. I also made note of how I used the <a href="https://github.com/awslabs/aws-sam-cli" target="_blank" rel="noopener">SAM CLI</a> to package and deploy everything to AWS.</p>
<h2 id="serverless-application-model">Serverless Application Model</h2>
<p>Amazon’s Serverless Application Model is a specification for translating SAM templates into <a href="https://aws.amazon.com/cloudformation/" target="_blank" rel="noopener">CloudFormation</a> templates. Much like macro expansion, it works through a textual transformation of the input SAM template into a template the CloudFormation engine can make sense of.</p>
<p>There are several components that make up a SAM template, but in this example we only use four: <a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/format-version-structure.html" target="_blank" rel="noopener">Format Version</a>, <a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-description-structure.html" target="_blank" rel="noopener">Description</a>, <a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/transform-section-structure.html" target="_blank" rel="noopener">Transform</a>, and <a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/resources-section-structure.html" target="_blank" rel="noopener">Resources</a>.</p>
<ul>
<li><strong>Format Version</strong> equates to <code class="language-plaintext highlighter-rouge">AWSTemplateFormatVersion</code> in the template, which identifies its capabilities</li>
<li><strong>Description</strong> is optional, but provides a way to give the template a high-level description</li>
<li><strong>Transform</strong> can map to multiple things, but here it maps to the <code class="language-plaintext highlighter-rouge">AWS::Serverless-2016-10-31</code> transform, which is a version of the SAM specification</li>
</ul>
<p>As far as <strong>Resources</strong> go, this template defines two: <code class="language-plaintext highlighter-rouge">TestFunction</code> and <code class="language-plaintext highlighter-rouge">TestRole</code>.</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">AWSTemplateFormatVersion</span><span class="pi">:</span> <span class="s1">'</span><span class="s">2010-09-09'</span>
<span class="na">Description</span><span class="pi">:</span> <span class="s">A scheduled Amazon Lambda function.</span>
<span class="na">Resources</span><span class="pi">:</span>
  <span class="na">TestFunction</span><span class="pi">:</span>
    <span class="na">Properties</span><span class="pi">:</span>
      <span class="na">CodeUri</span><span class="pi">:</span> <span class="s">.</span>
      <span class="na">Events</span><span class="pi">:</span>
        <span class="na">Testy</span><span class="pi">:</span>
          <span class="na">Properties</span><span class="pi">:</span>
            <span class="na">Schedule</span><span class="pi">:</span> <span class="s">rate(1 hour)</span>
          <span class="na">Type</span><span class="pi">:</span> <span class="s">Schedule</span>
      <span class="na">Handler</span><span class="pi">:</span> <span class="s">main</span>
      <span class="na">Role</span><span class="pi">:</span> <span class="kt">!GetAtt</span> <span class="s">TestRole.Arn</span>
      <span class="na">Runtime</span><span class="pi">:</span> <span class="s">go1.x</span>
    <span class="na">Type</span><span class="pi">:</span> <span class="s">AWS::Serverless::Function</span>
  <span class="na">TestRole</span><span class="pi">:</span>
    <span class="na">Properties</span><span class="pi">:</span>
      <span class="na">AssumeRolePolicyDocument</span><span class="pi">:</span>
        <span class="na">Statement</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="na">Action</span><span class="pi">:</span>
              <span class="pi">-</span> <span class="s">sts:AssumeRole</span>
            <span class="na">Effect</span><span class="pi">:</span> <span class="s">Allow</span>
            <span class="na">Principal</span><span class="pi">:</span>
              <span class="na">Service</span><span class="pi">:</span>
                <span class="pi">-</span> <span class="s">lambda.amazonaws.com</span>
        <span class="na">Version</span><span class="pi">:</span> <span class="s1">'</span><span class="s">2012-10-17'</span>
      <span class="na">ManagedPolicyArns</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="s">arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole</span>
    <span class="na">Type</span><span class="pi">:</span> <span class="s">AWS::IAM::Role</span>
<span class="na">Transform</span><span class="pi">:</span> <span class="s">AWS::Serverless-2016-10-31</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">TestRole</code> is a resource of type <code class="language-plaintext highlighter-rouge">AWS::IAM::Role</code>, which is a top-level CloudFormation resource. It creates an Identity and Access Management (IAM) role containing the permissions our Lambda function needs at runtime. In this case, it simply attaches a managed IAM policy, <code class="language-plaintext highlighter-rouge">AWSLambdaBasicExecutionRole</code>. This policy allows the Lambda function to write its output to CloudWatch Logs through the following actions:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">logs:CreateLogGroup</code></li>
<li><code class="language-plaintext highlighter-rouge">logs:CreateLogStream</code></li>
<li><code class="language-plaintext highlighter-rouge">logs:PutLogEvents</code></li>
</ul>
<p>The next resource, <code class="language-plaintext highlighter-rouge">TestFunction</code>, is of type <code class="language-plaintext highlighter-rouge">AWS::Serverless::Function</code>. This is <em>not</em> a top-level CloudFormation resource. Instead, it is a SAM resource that expands into multiple top-level CloudFormation resources. Based on our usage, it expands into three:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">AWS::Lambda::Function</code></li>
<li><code class="language-plaintext highlighter-rouge">AWS::Lambda::Permission</code></li>
<li><code class="language-plaintext highlighter-rouge">AWS::Events::Rule</code></li>
</ul>
<p><code class="language-plaintext highlighter-rouge">AWS::Lambda::Function</code> is the top-level CloudFormation resource that defines an AWS Lambda function. Because we want to schedule the function’s periodic execution, we include an <code class="language-plaintext highlighter-rouge">Events</code> property on our <code class="language-plaintext highlighter-rouge">AWS::Serverless::Function</code> resource. This allows us to define the function execution schedule <em>within</em> the context of the function’s properties. Behind the scenes, the <code class="language-plaintext highlighter-rouge">Events</code> property expands into an <code class="language-plaintext highlighter-rouge">AWS::Events::Rule</code> resource with an invocation rate of once per hour.</p>
<p>Lastly, in order for the CloudWatch Events API to invoke our function, it needs permissions to do so. <code class="language-plaintext highlighter-rouge">AWS::Lambda::Permission</code> grants CloudWatch Events the permission to invoke our function.</p>
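<p>To make the expansion more concrete, here is a deliberately simplified model of it in Python. It is not the SAM implementation: the <code class="language-plaintext highlighter-rouge">expand_scheduled_function</code> helper is hypothetical, the generated logical IDs and the target <code class="language-plaintext highlighter-rouge">Id</code> are assumptions about naming, and the function resource’s own properties are omitted. Only the three resource types and the wiring between them come from the description above:</p>

```python
from typing import Any, Dict


def expand_scheduled_function(function_id: str, event_id: str, schedule: str) -> Dict[str, Dict[str, Any]]:
    """Model the three resources a scheduled serverless function expands into."""
    rule_id = f"{function_id}{event_id}"    # assumed logical ID for the rule
    permission_id = f"{rule_id}Permission"  # assumed logical ID for the permission
    function_arn = {"Fn::GetAtt": [function_id, "Arn"]}
    return {
        # The function itself (code, handler, and role properties omitted).
        function_id: {"Type": "AWS::Lambda::Function"},
        # The rule that fires on the schedule and targets the function.
        rule_id: {
            "Type": "AWS::Events::Rule",
            "Properties": {
                "ScheduleExpression": schedule,
                "Targets": [{"Arn": function_arn, "Id": f"{event_id}Target"}],
            },
        },
        # The permission that lets CloudWatch Events invoke the function.
        permission_id: {
            "Type": "AWS::Lambda::Permission",
            "Properties": {
                "Action": "lambda:InvokeFunction",
                "FunctionName": {"Ref": function_id},
                "Principal": "events.amazonaws.com",
                "SourceArn": {"Fn::GetAtt": [rule_id, "Arn"]},
            },
        },
    }


if __name__ == "__main__":
    for name, resource in expand_scheduled_function("TestFunction", "Testy", "rate(1 hour)").items():
        print(name, resource["Type"])
```

<p>The template CloudFormation actually processes can be inspected in the AWS console after deployment, which is the reliable way to confirm the generated names.</p>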
<h2 id="package-and-ship">Package and ship</h2>
<p>The AWS SAM CLI builds on top of the SAM specification by providing a single tool to manage the packaging and deployment of serverless applications. <a href="https://github.com/awslabs/aws-sam-cli#installation" target="_blank" rel="noopener">Installation</a> is out of scope for this post, but once the <code class="language-plaintext highlighter-rouge">sam</code> tool is installed, deployment happens in three phases: compile, package, and deploy. For reference, this is the Go function being deployed:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="k">import</span> <span class="p">(</span>
	<span class="s">"context"</span>
	<span class="s">"fmt"</span>

	<span class="s">"github.com/aws/aws-lambda-go/events"</span>
	<span class="s">"github.com/aws/aws-lambda-go/lambda"</span>
<span class="p">)</span>

<span class="k">func</span> <span class="n">HandleRequest</span><span class="p">(</span><span class="n">ctx</span> <span class="n">context</span><span class="o">.</span><span class="n">Context</span><span class="p">,</span> <span class="n">e</span> <span class="n">events</span><span class="o">.</span><span class="n">CloudWatchEvent</span><span class="p">)</span> <span class="p">(</span><span class="kt">string</span><span class="p">,</span> <span class="kt">error</span><span class="p">)</span> <span class="p">{</span>
	<span class="k">return</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"Hello, world."</span><span class="p">),</span> <span class="no">nil</span>
<span class="p">}</span>

<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="n">lambda</span><span class="o">.</span><span class="n">Start</span><span class="p">(</span><span class="n">HandleRequest</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>First, compile your Go-based Lambda function into a Linux compatible binary.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">GOOS</span><span class="o">=</span>linux go build <span class="nt">-o</span> main main.go
</code></pre></div></div>
<p>Once the binary exists, use <code class="language-plaintext highlighter-rouge">sam</code> to upload the binary to S3 and reference it in a newly created <code class="language-plaintext highlighter-rouge">packaged.yaml</code> CloudFormation configuration.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>sam package <span class="nt">--s3-bucket</span> test-global-config-us-east-1 <span class="se">\</span>
<span class="nt">--template-file</span> template.yaml <span class="se">\</span>
<span class="nt">--output-template-file</span> packaged.yaml
Uploading to 7001c68762c2fcda61de373e0a30563d 29187040 / 29187040.0 <span class="o">(</span>100.00%<span class="o">)</span>
Successfully packaged artifacts and wrote output template to file packaged.yaml.
</code></pre></div></div>
<p>Before using <code class="language-plaintext highlighter-rouge">sam</code> to deploy using the contents of <code class="language-plaintext highlighter-rouge">packaged.yaml</code>, run a quick <code class="language-plaintext highlighter-rouge">diff</code> to see what changed.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>diff template.yaml packaged.yaml
< CodeUri: <span class="nb">.</span>
<span class="nt">---</span>
<span class="o">></span> CodeUri: s3://test-global-config-us-east-1/7001c68762c2fcda61de373e0a30563d
</code></pre></div></div>
<p>Lastly, use <code class="language-plaintext highlighter-rouge">sam</code> again to deploy the template through a CloudFormation stack named <code class="language-plaintext highlighter-rouge">Test</code>.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>sam deploy <span class="nt">--template-file</span> packaged.yaml <span class="se">\</span>
<span class="nt">--stack-name</span> Test <span class="se">\</span>
<span class="nt">--capabilities</span> CAPABILITY_IAM
Waiting <span class="k">for </span>changeset to be created..
Waiting <span class="k">for </span>stack create/update to <span class="nb">complete
</span>Successfully created/updated stack - Test
</code></pre></div></div>
<p>Within an hour or so (it only takes a few minutes to deploy—the wait is for the function schedule to trigger), you should see something like the following in your function’s CloudWatch Logs log stream.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>START RequestId: 5886a0f4-50a1-1cca-10b2-67f512fd83b1 Version: $LATEST
"Hello, world."
END RequestId: 5886a0f4-50a1-1cca-10b2-67f512fd83b1
REPORT RequestId: 5886a0f4-50a1-1cca-10b2-67f512fd83b1
Duration: 1.59 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 5 MB
</code></pre></div></div>
Haskell Code Katas: Counting Duplicates2017-12-17T00:00:00-05:00https://hector.dev/2017/12/17/haskell-code-katas-counting-duplicates<p>For the past few weeks, I’ve been starting off my days with Haskell-flavored code katas from <a href="https://www.codewars.com/" target="_blank" rel="noopener">Codewars</a>. Today I started with the kata below and figured it would be a good exercise to walk through my solution.</p>
<blockquote>
<p>Write a function that will return the count of <em>distinct</em> case-insensitive alphabetic characters and numeric digits that occur more than once in the input string. The input string can be assumed to contain only alphabets (both uppercase and lowercase) and numeric digits.</p>
</blockquote>
<p>To help clarify the specifications for this kata, the <a href="https://hspec.github.io/" target="_blank" rel="noopener">Hspec</a> test suite is below:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">module</span> <span class="nn">Codwars.Kata.Duplicates.Test</span> <span class="kr">where</span>
<span class="kr">import</span> <span class="nn">Codwars.Kata.Duplicates</span> <span class="p">(</span><span class="nf">duplicateCount</span><span class="p">)</span>
<span class="kr">import</span> <span class="nn">Data.List</span> <span class="p">(</span><span class="nf">nub</span><span class="p">)</span>
<span class="kr">import</span> <span class="nn">Test.Hspec</span>
<span class="kr">import</span> <span class="nn">Test.QuickCheck</span>
<span class="n">main</span> <span class="o">=</span> <span class="n">hspec</span> <span class="o">$</span> <span class="kr">do</span>
  <span class="n">describe</span> <span class="s">"duplicateCount"</span> <span class="o">$</span> <span class="kr">do</span>
    <span class="n">it</span> <span class="s">"should work for some small tests"</span> <span class="o">$</span> <span class="kr">do</span>
      <span class="n">duplicateCount</span> <span class="s">""</span> <span class="o">=?=</span> <span class="mi">0</span>
      <span class="n">duplicateCount</span> <span class="s">"abcde"</span> <span class="o">=?=</span> <span class="mi">0</span>
      <span class="n">duplicateCount</span> <span class="s">"aabbcde"</span> <span class="o">=?=</span> <span class="mi">2</span>
      <span class="n">duplicateCount</span> <span class="s">"aaBbcde"</span> <span class="o">=?=</span> <span class="mi">2</span>
      <span class="n">duplicateCount</span> <span class="s">"Indivisibility"</span> <span class="o">=?=</span> <span class="mi">1</span>
      <span class="n">duplicateCount</span> <span class="s">"Indivisibilities"</span> <span class="o">=?=</span> <span class="mi">2</span>
      <span class="n">duplicateCount</span> <span class="p">[</span><span class="sc">'a'</span><span class="o">..</span><span class="sc">'z'</span><span class="p">]</span> <span class="o">=?=</span> <span class="mi">0</span>
      <span class="n">duplicateCount</span> <span class="p">([</span><span class="sc">'a'</span><span class="o">..</span><span class="sc">'z'</span><span class="p">]</span> <span class="o">++</span> <span class="p">[</span><span class="sc">'A'</span><span class="o">..</span><span class="sc">'Z'</span><span class="p">])</span> <span class="o">=?=</span> <span class="mi">26</span>
    <span class="n">it</span> <span class="s">"should work for some random lists"</span> <span class="o">$</span> <span class="kr">do</span>
      <span class="n">property</span> <span class="o">$</span> <span class="n">forAll</span> <span class="p">(</span><span class="n">listOf</span> <span class="o">$</span> <span class="n">elements</span> <span class="p">[</span><span class="sc">'a'</span><span class="o">..</span><span class="sc">'z'</span><span class="p">])</span> <span class="o">$</span> <span class="nf">\</span><span class="n">x</span> <span class="o">-></span>
        <span class="kr">let</span> <span class="n">xs</span> <span class="o">=</span> <span class="n">nub</span> <span class="n">x</span>
        <span class="kr">in</span> <span class="n">duplicateCount</span> <span class="p">(</span><span class="n">concatMap</span> <span class="p">(</span><span class="n">replicate</span> <span class="mi">2</span><span class="p">)</span> <span class="n">xs</span><span class="p">)</span> <span class="o">=?=</span> <span class="n">length</span> <span class="n">xs</span>
  <span class="kr">where</span> <span class="p">(</span><span class="o">=?=</span><span class="p">)</span> <span class="o">=</span> <span class="n">shouldBe</span>
</code></pre></div></div>
<h2 id="sorting--grouping">Sorting & Grouping</h2>
<p>To start things off, we are given the following snippet:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">module</span> <span class="nn">Codwars.Kata.Duplicates</span> <span class="kr">where</span>
<span class="n">duplicateCount</span> <span class="o">::</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Int</span>
<span class="n">duplicateCount</span> <span class="o">=</span> <span class="n">undefined</span>
</code></pre></div></div>
<p>My first step is to figure out how to deal with case-insensitivity. <code class="language-plaintext highlighter-rouge">Data.Char</code> provides <code class="language-plaintext highlighter-rouge">toLower</code>, which can be mapped over each character in the input <code class="language-plaintext highlighter-rouge">String</code>.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">Prelude</span><span class="o">></span> <span class="n">x</span> <span class="o">=</span> <span class="s">"aaBbcde"</span>
<span class="kt">Prelude</span><span class="o">></span> <span class="n">x</span>
<span class="s">"aaBbcde"</span>
<span class="kt">Prelude</span><span class="o">></span> <span class="kr">import</span> <span class="nn">Data.Char</span>
<span class="kt">Prelude</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">Char</span><span class="o">></span> <span class="o">:</span><span class="n">t</span> <span class="n">toLower</span>
<span class="n">toLower</span> <span class="o">::</span> <span class="kt">Char</span> <span class="o">-></span> <span class="kt">Char</span>
<span class="kt">Prelude</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">Char</span><span class="o">></span> <span class="n">map</span> <span class="n">toLower</span> <span class="n">x</span>
<span class="s">"aabbcde"</span>
</code></pre></div></div>
<p>Next, I want to group like characters together. To do that, I need to <code class="language-plaintext highlighter-rouge">sort</code> and then <code class="language-plaintext highlighter-rouge">group</code> the characters together.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">Prelude</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">Char</span><span class="o">></span> <span class="kr">import</span> <span class="nn">Data.List</span>
<span class="kt">Prelude</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">Char</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">List</span><span class="o">></span> <span class="o">:</span><span class="n">t</span> <span class="n">sort</span>
<span class="n">sort</span> <span class="o">::</span> <span class="kt">Ord</span> <span class="n">a</span> <span class="o">=></span> <span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o">-></span> <span class="p">[</span><span class="n">a</span><span class="p">]</span>
<span class="kt">Prelude</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">Char</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">List</span><span class="o">></span> <span class="n">sort</span> <span class="o">.</span> <span class="n">map</span> <span class="n">toLower</span> <span class="o">$</span> <span class="n">x</span>
<span class="s">"aabbcde"</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">sort</code> doesn’t do very much in this case because the input string was already sorted. Either way, now we can work on grouping like characters with <code class="language-plaintext highlighter-rouge">group</code>:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">Prelude</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">Char</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">List</span><span class="o">></span> <span class="o">:</span><span class="n">t</span> <span class="n">group</span>
<span class="n">group</span> <span class="o">::</span> <span class="kt">Eq</span> <span class="n">a</span> <span class="o">=></span> <span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o">-></span> <span class="p">[[</span><span class="n">a</span><span class="p">]]</span>
<span class="kt">Prelude</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">Char</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">List</span><span class="o">></span> <span class="n">group</span> <span class="o">.</span> <span class="n">sort</span> <span class="o">.</span> <span class="n">map</span> <span class="n">toLower</span> <span class="o">$</span> <span class="n">x</span>
<span class="p">[</span><span class="s">"aa"</span><span class="p">,</span><span class="s">"bb"</span><span class="p">,</span><span class="s">"c"</span><span class="p">,</span><span class="s">"d"</span><span class="p">,</span><span class="s">"e"</span><span class="p">]</span>
</code></pre></div></div>
<h2 id="home-stretch">Home Stretch</h2>
<p>Now, how do we go from a list of <code class="language-plaintext highlighter-rouge">[Char]</code> to an <code class="language-plaintext highlighter-rouge">Int</code> length that can be used to filter out characters that only occur once? <code class="language-plaintext highlighter-rouge">filter</code>, with a <code class="language-plaintext highlighter-rouge">>1</code> condition applied to the <code class="language-plaintext highlighter-rouge">length</code>, should get us there.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">Prelude</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">Char</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">List</span><span class="o">></span> <span class="n">z</span> <span class="o">=</span> <span class="n">group</span> <span class="o">.</span> <span class="n">sort</span> <span class="o">.</span> <span class="n">map</span> <span class="n">toLower</span> <span class="o">$</span> <span class="n">x</span>
<span class="kt">Prelude</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">Char</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">List</span><span class="o">></span> <span class="n">filter</span> <span class="p">((</span><span class="o">></span><span class="mi">1</span><span class="p">)</span> <span class="o">.</span> <span class="n">length</span><span class="p">)</span> <span class="n">z</span>
<span class="p">[</span><span class="s">"aa"</span><span class="p">,</span><span class="s">"bb"</span><span class="p">]</span>
</code></pre></div></div>
<p>Here, the <code class="language-plaintext highlighter-rouge">.</code> allows us to compose <code class="language-plaintext highlighter-rouge">length</code> and <code class="language-plaintext highlighter-rouge">>1</code> together so that both can be applied to the <code class="language-plaintext highlighter-rouge">[Char]</code> provided to <code class="language-plaintext highlighter-rouge">filter</code>. The result rids the list of any characters that only occur once in the original input.</p>
<p>Lastly, we need the count of distinct characters from the input <code class="language-plaintext highlighter-rouge">String</code> that occur more than once, which is as simple as getting the <code class="language-plaintext highlighter-rouge">length</code> of the filtered list.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">Prelude</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">Char</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">List</span><span class="o">></span> <span class="n">f</span> <span class="o">=</span> <span class="n">filter</span> <span class="p">((</span><span class="o">></span><span class="mi">1</span><span class="p">)</span> <span class="o">.</span> <span class="n">length</span><span class="p">)</span> <span class="n">z</span>
<span class="kt">Prelude</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">Char</span> <span class="kt">Data</span><span class="o">.</span><span class="kt">List</span><span class="o">></span> <span class="n">length</span> <span class="n">f</span>
<span class="mi">2</span>
</code></pre></div></div>
<p>Putting it all together, and breaking out some of the pipelined functions into a variable in the <code class="language-plaintext highlighter-rouge">where</code> clause, we get the <code class="language-plaintext highlighter-rouge">duplicateCount</code> function below.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">module</span> <span class="nn">Codwars.Kata.Duplicates</span> <span class="kr">where</span>
<span class="kr">import</span> <span class="nn">Data.List</span> <span class="p">(</span><span class="nf">group</span><span class="p">,</span> <span class="nf">sort</span><span class="p">)</span>
<span class="kr">import</span> <span class="nn">Data.Char</span> <span class="p">(</span><span class="nf">toLower</span><span class="p">)</span>
<span class="n">duplicateCount</span> <span class="o">::</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Int</span>
<span class="n">duplicateCount</span> <span class="o">=</span> <span class="n">length</span> <span class="o">.</span> <span class="n">filter</span> <span class="p">((</span><span class="o">></span><span class="mi">1</span><span class="p">)</span> <span class="o">.</span> <span class="n">length</span><span class="p">)</span> <span class="o">.</span> <span class="n">grouped</span>
  <span class="kr">where</span>
    <span class="n">grouped</span> <span class="o">=</span> <span class="n">group</span> <span class="o">.</span> <span class="n">sort</span> <span class="o">.</span> <span class="n">map</span> <span class="n">toLower</span>
</code></pre></div></div>
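<p>For readers who don’t have GHCi handy, the same pipeline can be sketched in Python. This is only a rough translation, not part of the kata solution: <code class="language-plaintext highlighter-rouge">duplicate_count</code> is a name I made up, and <code class="language-plaintext highlighter-rouge">itertools.groupby</code> stands in for <code class="language-plaintext highlighter-rouge">group</code> because it also only groups adjacent equal items (hence the sort).</p>

```python
from itertools import groupby


def duplicate_count(text: str) -> int:
    """Count distinct characters (case-insensitive) that occur more than once."""
    # Mirror `group . sort . map toLower`: lowercase, sort, then group runs.
    grouped = [list(run) for _, run in groupby(sorted(text.lower()))]
    # Mirror `length . filter ((>1) . length)`: keep groups longer than one.
    return len([run for run in grouped if len(run) > 1])


print(duplicate_count("aaBbcde"))  # prints 2
```
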
Installing Tor on FreeBSD 112016-11-12T12:00:00-05:00https://hector.dev/2016/11/12/installing-tor-on-freebsd-11<p>Tor is a piece of free software and an open network that enables anonymous communication. Combined, these two components help defend against various forms of traffic analysis and network surveillance. Trying to re-explain Tor in a comprehensive way is outside the scope of this post, but please read about it via the literature provided by the <a href="https://www.torproject.org/" target="_blank" rel="noopener">project site</a> and <a href="https://www.eff.org/pages/tor-and-https" target="_blank" rel="noopener">The Electronic Frontier Foundation (EFF)</a> before installing.</p>
<h2 id="installation">Installation</h2>
<p>The first step toward installing Tor on FreeBSD is deciding whether you want to install the precompiled package with <code class="language-plaintext highlighter-rouge">pkg</code>, or you want to compile it yourself from the <a href="https://www.freebsd.org/ports/index.html" target="_blank" rel="noopener">FreeBSD Ports Collection</a>. The tradeoffs between these two approaches are well-explained within the <a href="https://www.freebsd.org/doc/handbook/ports-overview.html" target="_blank" rel="noopener">FreeBSD Handbook</a>. I chose the package because customizing the installation configuration beyond the defaults didn’t seem necessary.</p>
<p>With all of that said, from inside a <code class="language-plaintext highlighter-rouge">root</code> shell install the Tor package with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># pkg install tor
</code></pre></div></div>
<h2 id="configuration">Configuration</h2>
<p>From there, copy the sample Tor configuration file into its default location and open it inside your editor:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># cp /usr/local/etc/tor/torrc.sample /usr/local/etc/tor/torrc
# vim /usr/local/etc/tor/torrc
</code></pre></div></div>
<p>Once inside the file, there are three settings that we want to make explicit. All should be commented out by default (<code class="language-plaintext highlighter-rouge">SOCKSPort</code>, <code class="language-plaintext highlighter-rouge">Log</code>, and <code class="language-plaintext highlighter-rouge">Log</code> again), so we simply need to uncomment them. Below is a <code class="language-plaintext highlighter-rouge">diff</code> between our desired configuration file (lines prefixed with <code class="language-plaintext highlighter-rouge">&lt;</code>) and the sample (lines prefixed with <code class="language-plaintext highlighter-rouge">&gt;</code>):</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">18c18
</span><span class="gd">< SOCKSPort 9050
</span><span class="p">---
</span><span class="gi">> #SOCKSPort 9050 # Default: Bind to localhost:9050 for local connections.
</span><span class="p">38c38
</span><span class="gd">< Log notice file /var/log/tor/notices.log
</span><span class="p">---
</span><span class="gi">> #Log notice file /var/log/tor/notices.log
</span><span class="p">42c42
</span><span class="gd">< Log notice syslog
</span><span class="p">---
</span><span class="gi">> #Log notice syslog
</span></code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">SOCKSPort</code> setting ensures that we’re binding Tor to <code class="language-plaintext highlighter-rouge">127.0.0.1</code> on its default port of <code class="language-plaintext highlighter-rouge">9050</code>. The two <code class="language-plaintext highlighter-rouge">Log</code> settings ensure that <code class="language-plaintext highlighter-rouge">notice</code> level log messages are written to a specific log file, as well as <code class="language-plaintext highlighter-rouge">syslog</code>.</p>
<p>Now, we can launch Tor using the <code class="language-plaintext highlighter-rouge">tor</code> command to see if things are working properly:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% tor
<span class="o">[</span>notice] Tor v0.2.8.9 running on FreeBSD with Libevent 2.0.22-stable, OpenSSL 1.0.2j-freebsd and Zlib 1.2.8.
<span class="o">[</span>notice] Tor can't <span class="nb">help </span>you <span class="k">if </span>you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
<span class="o">[</span>notice] Read configuration file <span class="s2">"/usr/local/etc/tor/torrc"</span><span class="nb">.</span>
<span class="o">[</span>notice] Opening Socks listener on 127.0.0.1:9050
<span class="o">[</span>notice] Parsing GEOIP IPv4 file /usr/local/share/tor/geoip.
<span class="o">[</span>notice] Parsing GEOIP IPv6 file /usr/local/share/tor/geoip6.
<span class="o">[</span>notice] Bootstrapped 0%: Starting
<span class="o">[</span>notice] Bootstrapped 80%: Connecting to the Tor network
<span class="o">[</span>notice] Bootstrapped 85%: Finishing handshake with first hop
<span class="o">[</span>notice] Bootstrapped 90%: Establishing a Tor circuit
<span class="o">[</span>notice] Tor has successfully opened a circuit. Looks like client functionality is working.
<span class="o">[</span>notice] Bootstrapped 100%: Done
</code></pre></div></div>
<p>Once satisfied, <code class="language-plaintext highlighter-rouge">CTRL+C</code> the process so that control is returned to your shell.</p>
<p>Lastly, let’s enable the Tor service so that it starts on its own after the system boots. To achieve that, all we have to do is ensure that <code class="language-plaintext highlighter-rouge">/etc/rc.conf</code> contains the following line:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">tor_enable</span><span class="o">=</span><span class="s2">"YES"</span>
</code></pre></div></div>
<p>Afterwards, launch the Tor service through the service manager if you want it running prior to the next boot cycle:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># service tor start
</code></pre></div></div>
<p>That’s it. You should now have a fully functional installation of Tor running on FreeBSD.</p>
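<p>If you want a quick check that the SOCKS listener is actually answering, one option is to perform just the first step of a SOCKS5 handshake against it. The sketch below is my own illustration, not something shipped with Tor: the function names are made up, and it assumes the listener is on <code class="language-plaintext highlighter-rouge">127.0.0.1:9050</code> and offers the “no authentication” method, as in the configuration above.</p>

```python
import socket

SOCKS_VERSION = 5
NO_AUTH = 0


def socks5_greeting() -> bytes:
    """Client greeting: version 5, one method offered, 'no authentication'."""
    return bytes([SOCKS_VERSION, 1, NO_AUTH])


def accepts_no_auth(reply: bytes) -> bool:
    """True if the server reply selects version 5 with 'no authentication'."""
    return reply == bytes([SOCKS_VERSION, NO_AUTH])


def socks_port_is_up(host: str = "127.0.0.1", port: int = 9050,
                     timeout: float = 5.0) -> bool:
    """Connect to the SOCKS listener and perform only the method negotiation."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(socks5_greeting())
            return accepts_no_auth(conn.recv(2))
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False


if __name__ == "__main__":
    print("SOCKS listener reachable:", socks_port_is_up())
```

<p>This only confirms that something speaking SOCKS5 is bound to the port. It does not prove that traffic is leaving through a Tor circuit.</p>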
Raft Leader Election in Consul2015-08-13T13:00:00-04:00https://hector.dev/2015/08/13/raft-leader-election-in-consul<p>A small paper reading group has assembled at work. We give ourselves two to three weeks to read a paper, meet up after hours, eat pizza, and discuss it. Our last paper focused on the <a href="https://raftconsensus.github.io/" target="_blank" rel="noopener">Raft</a> consensus algorithm, and I was chosen to lead the discussion.</p>
<p>To help Raft hit closer to home, I put together a <a href="https://github.com/hectcastro/raft-consul-demo" target="_blank" rel="noopener">small demo</a> of Raft’s leader election process using <a href="https://www.consul.io/" target="_blank" rel="noopener">Consul</a>. The demo spins up a three-node Consul cluster using containers, then interleaves all of the debug log output filtered with <code class="language-plaintext highlighter-rouge">grep</code> for <code class="language-plaintext highlighter-rouge">raft</code>. Reading through parts of the Raft paper, you can see how the logging output of <a href="https://github.com/hashicorp/raft" target="_blank" rel="noopener">HashiCorp’s implementation</a> lines up.</p>
<h2 id="reading-along">Reading Along</h2>
<p>Section 5.2 of the Raft paper focuses on leader election, and starts off with:</p>
<blockquote>
<p>When servers start up, they begin as followers.</p>
</blockquote>
<p>Sure enough, the first <code class="language-plaintext highlighter-rouge">raft</code>-filtered log lines start with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>docker-compose up | <span class="nb">grep </span>raft
consul1 | <span class="o">[</span>INFO] raft: Node at 172.17.0.45:8300 <span class="o">[</span>Follower] entering Follower state
consul2 | <span class="o">[</span>INFO] raft: Node at 172.17.0.44:8300 <span class="o">[</span>Follower] entering Follower state
consul3 | <span class="o">[</span>INFO] raft: Node at 172.17.0.43:8300 <span class="o">[</span>Follower] entering Follower state
</code></pre></div></div>
<p>Next is the beginning of an election:</p>
<blockquote>
<p>If a follower receives no communication over a period of time called the election timeout, then it assumes there is no viable leader and begins an election to choose a new leader.</p>
</blockquote>
<p>That corresponds with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>consul1 | <span class="o">[</span>WARN] raft: Heartbeat <span class="nb">timeout </span>reached, starting election
</code></pre></div></div>
<p>Now that the election started, there needs to be a winner:</p>
<blockquote>
<p>A candidate wins an election if it receives votes from a majority of the servers in the full cluster for the same term.</p>
</blockquote>
<p>Which goes with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>consul1 | <span class="o">[</span>DEBUG] raft: Votes needed: 2
consul1 | <span class="o">[</span>DEBUG] raft: Vote granted. Tally: 1
consul1 | <span class="o">[</span>DEBUG] raft: Vote granted. Tally: 2
consul1 | <span class="o">[</span>INFO] raft: Election won. Tally: 2
consul1 | <span class="o">[</span>INFO] raft: Node at 172.17.0.45:8300 <span class="o">[</span>Leader] entering Leader state
</code></pre></div></div>
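<p>The <code class="language-plaintext highlighter-rouge">Votes needed: 2</code> line is just the majority rule applied to a three-server cluster. Here is a minimal sketch of that arithmetic. The function names are mine and are not taken from HashiCorp’s implementation.</p>

```python
def votes_needed(cluster_size: int) -> int:
    """Smallest number of votes that is a strict majority of the cluster."""
    return cluster_size // 2 + 1


def election_won(tally: int, cluster_size: int) -> bool:
    """A candidate wins once its tally reaches the majority threshold."""
    return tally >= votes_needed(cluster_size)


for size in (3, 5, 7):
    print(size, "servers need", votes_needed(size), "votes")
```

<p>For the demo cluster, the candidate’s own vote brings the tally to 1, and a single vote from either peer is enough to reach the threshold of 2.</p>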
<p>Lastly, <code class="language-plaintext highlighter-rouge">AppendEntries</code> is used to communicate the new leader to all other candidates:</p>
<blockquote>
<p>While waiting for votes, a candidate may receive an AppendEntries RPC from another server claiming to be leader.</p>
</blockquote>
<p>Logs from <code class="language-plaintext highlighter-rouge">consul1</code> show that it is replicating to <code class="language-plaintext highlighter-rouge">consul2</code> and <code class="language-plaintext highlighter-rouge">consul3</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>consul1 | <span class="o">[</span>INFO] raft: pipelining replication to peer 172.17.0.44:8300
consul1 | <span class="o">[</span>INFO] raft: pipelining replication to peer 172.17.0.43:8300
</code></pre></div></div>
Updating the Amazon RDS Certificate Bundle2015-03-14T13:00:00-04:00https://hector.dev/2015/03/14/updating-the-amazon-rds-certificate-bundle<p>On March 23rd, 2015 20:00 UTC, Amazon plans to update the SSL certificate for <a href="https://aws.amazon.com/rds/" target="_blank" rel="noopener">RDS</a> instances. This means that applications attempting to establish secure connections to Amazon RDS databases from servers without an updated RDS certificate bundle may begin to fail. In order to prevent connection failures to Amazon RDS databases, an updated certificate bundle can be installed on client servers in advance.</p>
<h2 id="test-connections-to-amazon-rds">Test Connections to Amazon RDS</h2>
<p>First, I recommend starting a new Amazon RDS database with the <code class="language-plaintext highlighter-rouge">rds-ca-2015</code> certificate authority configured. For this example, I’m going to use a PostgreSQL Amazon RDS database.</p>
<p>Using the <code class="language-plaintext highlighter-rouge">psql</code> command, execute the following steps from a server intended to communicate securely with Amazon RDS:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">PGSSLROOTCERT</span><span class="o">=</span><span class="s2">"/etc/ssl/certs/ca-certificates.crt"</span>
<span class="nb">export </span><span class="nv">PGSSLMODE</span><span class="o">=</span><span class="s2">"verify-full"</span>
psql <span class="nt">-h</span> test.cvg4pxyrtpes.us-east-1.rds.amazonaws.com <span class="nt">-U</span> <span class="nb">test</span>
</code></pre></div></div>
<p>If you are met with the following message, then you need to install the updated certificate bundle:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>psql: SSL error: certificate verify failed
</code></pre></div></div>
<h2 id="updating-the-certificate-bundle">Updating the Certificate Bundle</h2>
<p>On an Ubuntu server, the <code class="language-plaintext highlighter-rouge">update-ca-certificates</code> command can be used to update the local CA certificates. First, we need to download the updated Amazon RDS combined CA bundle, then we need to put it in a place where <code class="language-plaintext highlighter-rouge">update-ca-certificates</code> knows to pick it up:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>wget http://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem
<span class="nv">$ </span><span class="nb">sudo mv </span>rds-combined-ca-bundle.pem <span class="se">\</span>
/usr/local/share/ca-certificates/rds-combined-ca-bundle.crt
<span class="nv">$ </span><span class="nb">sudo </span>update-ca-certificates
</code></pre></div></div>
<p><strong>Note</strong>: The file extension for <code class="language-plaintext highlighter-rouge">rds-combined-ca-bundle</code> changes from <code class="language-plaintext highlighter-rouge">.pem</code> to <code class="language-plaintext highlighter-rouge">.crt</code>.</p>
<p>Now, if you run the test above once more on the same machine, you should be met with a password prompt and a successfully established secure connection to the Amazon RDS PostgreSQL database.</p>
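<p>If <code class="language-plaintext highlighter-rouge">psql</code> isn’t installed on the client, roughly the same verification can be sketched with Python’s standard library. Treat this as an illustration under a few assumptions: the function names are mine, the server listens on the default port <code class="language-plaintext highlighter-rouge">5432</code>, and the check stops after the TLS handshake without logging in. PostgreSQL expects a small <code class="language-plaintext highlighter-rouge">SSLRequest</code> message before TLS starts, which is what the first function builds.</p>

```python
import socket
import ssl
import struct

SSL_REQUEST_CODE = 80877103  # fixed request code from the PostgreSQL protocol


def ssl_request_packet() -> bytes:
    """SSLRequest message: total length (8) followed by the request code."""
    return struct.pack("!II", 8, SSL_REQUEST_CODE)


def verify_rds_certificate(host: str, ca_file: str, port: int = 5432) -> str:
    """Return the negotiated TLS version if the server certificate verifies."""
    # Trust only the CAs in ca_file and check the hostname (like verify-full).
    context = ssl.create_default_context(cafile=ca_file)
    with socket.create_connection((host, port), timeout=10) as raw:
        raw.sendall(ssl_request_packet())
        if raw.recv(1) != b"S":
            raise ConnectionError("server declined SSL")
        # Raises ssl.SSLCertVerificationError when the CA bundle lacks the
        # certificate authority that signed the server certificate.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

<p>Calling <code class="language-plaintext highlighter-rouge">verify_rds_certificate</code> with the database hostname and <code class="language-plaintext highlighter-rouge">/etc/ssl/certs/ca-certificates.crt</code> should raise a verification error before the bundle is updated and return a TLS version string afterwards.</p>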
<p>Lastly, if you use Ansible for configuration management, take a look at the <a href="https://github.com/azavea/ansible-rds-ca-bundle" target="_blank" rel="noopener">azavea.rds-ca-bundle</a> role to help automate updating the Amazon RDS certificate bundle on client servers.</p>
Preparing EC2 Instance Store with cloud-init2015-01-24T12:00:00-05:00https://hector.dev/2015/01/24/preparing-ec2-instance-store-with-cloud-init<p>Most Amazon Machine Images (AMIs) are backed by an Elastic Block Store (EBS) volume. This volume houses the operating system and any additional software added to the machine image. When you launch an instance of an EBS-backed AMI, the resulting EC2 instance <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#StorageOnInstanceTypes" target="_blank" rel="noopener">usually</a> includes some amount of <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html" target="_blank" rel="noopener">instance store</a> storage as well. Instance store is fast (relative to EBS), but also temporary, and physically attached to the virtual machine host.</p>
<h2 id="unprepared-instance-store">Unprepared Instance Store</h2>
<p>Instance store is associated with an EC2 instance via a <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html" target="_blank" rel="noopener">block device mapping</a>. Usually, instance store mappings carry a virtual device name of <code class="language-plaintext highlighter-rouge">ephemeral0</code> to <code class="language-plaintext highlighter-rouge">ephemeralN</code> and are pre-formatted as <code class="language-plaintext highlighter-rouge">ext3</code>. Unfortunately, no formatted <code class="language-plaintext highlighter-rouge">ext3</code> file system exists if you’re using SSD-based instance store with <a href="https://en.wikipedia.org/wiki/Trim_(computing)" target="_blank" rel="noopener">TRIM</a> support (only <code class="language-plaintext highlighter-rouge">r3.*</code> and <code class="language-plaintext highlighter-rouge">i2.*</code> instances right now).</p>
<p>If you’re dealing with instance store that’s not pre-formatted, or you want to use a filesystem other than <code class="language-plaintext highlighter-rouge">ext3</code>, how do you remedy that elegantly inside of EC2? One possible answer is a set of <code class="language-plaintext highlighter-rouge">cloud-init</code> directives via EC2 user data.</p>
<h2 id="user-data-and-cloud-init">User Data and <code class="language-plaintext highlighter-rouge">cloud-init</code></h2>
<p>Before launching an EC2 instance, you can provide it with a bit of <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html" target="_blank" rel="noopener">user data</a>. User data can either be a shell script or a set of <code class="language-plaintext highlighter-rouge">cloud-init</code> directives.</p>
<p>Using the <code class="language-plaintext highlighter-rouge">fs_setup</code> <code class="language-plaintext highlighter-rouge">cloud-init</code> module, formatting a pair of SSD volumes looks something like:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">fs_setup</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">label</span><span class="pi">:</span> <span class="s">ephemeral0</span>
    <span class="na">filesystem</span><span class="pi">:</span> <span class="s">ext3</span>
    <span class="na">extra_opts</span><span class="pi">:</span> <span class="pi">[</span> <span class="s2">"</span><span class="s">-E"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">nodiscard"</span> <span class="pi">]</span>
    <span class="na">device</span><span class="pi">:</span> <span class="s">ephemeral0</span>
    <span class="na">partition</span><span class="pi">:</span> <span class="s">auto</span>
  <span class="pi">-</span> <span class="na">label</span><span class="pi">:</span> <span class="s">ephemeral1</span>
    <span class="na">filesystem</span><span class="pi">:</span> <span class="s">ext3</span>
    <span class="na">extra_opts</span><span class="pi">:</span> <span class="pi">[</span> <span class="s2">"</span><span class="s">-E"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">nodiscard"</span> <span class="pi">]</span>
    <span class="na">device</span><span class="pi">:</span> <span class="s">ephemeral1</span>
    <span class="na">partition</span><span class="pi">:</span> <span class="s">auto</span>
</code></pre></div></div>
<p>After the volumes are formatted, you probably also want to mount them somewhere. The <code class="language-plaintext highlighter-rouge">mounts</code> module can handle that:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">mounts</span><span class="pi">:</span>
<span class="pi">-</span> <span class="pi">[</span> <span class="nv">ephemeral0</span><span class="pi">,</span> <span class="nv">null</span> <span class="pi">]</span> <span class="c1"># Override any default EC2 mounting behavior</span>
<span class="pi">-</span> <span class="pi">[</span> <span class="nv">ephemeral1</span><span class="pi">,</span> <span class="nv">null</span> <span class="pi">]</span> <span class="c1"># Override any default EC2 mounting behavior</span>
<span class="pi">-</span> <span class="pi">[</span> <span class="nv">ephemeral0</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/media/ephemeral0"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">ext3"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">defaults,nobootwait,discard"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">0"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">2"</span> <span class="pi">]</span>
<span class="pi">-</span> <span class="pi">[</span> <span class="nv">ephemeral1</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/media/ephemeral1"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">ext3"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">defaults,nobootwait,discard"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">0"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">2"</span> <span class="pi">]</span>
</code></pre></div></div>
<p>Lastly, we can change the user and group for these mounts with <code class="language-plaintext highlighter-rouge">runcmd</code> so that users other than <code class="language-plaintext highlighter-rouge">root</code> (here I’m using <code class="language-plaintext highlighter-rouge">hdfs</code>) can read and write to them:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">runcmd</span><span class="pi">:</span>
<span class="pi">-</span> <span class="pi">[</span> <span class="nv">chown</span><span class="pi">,</span> <span class="nv">hdfs</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/media/ephemeral0"</span> <span class="pi">]</span>
<span class="pi">-</span> <span class="pi">[</span> <span class="nv">chgrp</span><span class="pi">,</span> <span class="nv">hdfs</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/media/ephemeral0"</span> <span class="pi">]</span>
<span class="pi">-</span> <span class="pi">[</span> <span class="nv">chown</span><span class="pi">,</span> <span class="nv">hdfs</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/media/ephemeral1"</span> <span class="pi">]</span>
<span class="pi">-</span> <span class="pi">[</span> <span class="nv">chgrp</span><span class="pi">,</span> <span class="nv">hdfs</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/media/ephemeral1"</span> <span class="pi">]</span>
</code></pre></div></div>
<p>After putting all of these snippets together inside of a <code class="language-plaintext highlighter-rouge">.yml</code> file with <code class="language-plaintext highlighter-rouge">#cloud-config</code> at the top, it’s ready to be fed through the launch process of new EC2 instances via user data. The end result, hopefully, is a few nicely formatted and mounted volumes of instance store.</p>
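<p>Since the directives repeat for every ephemeral device, the user data can also be generated rather than written by hand. The sketch below is one hypothetical way to do that with only the Python standard library. The function name is mine, and it emits the body as JSON on the assumption that the YAML parser behind <code class="language-plaintext highlighter-rouge">cloud-init</code> accepts JSON-style documents, which is worth verifying on your image before relying on it.</p>

```python
import json


def ephemeral_cloud_config(count: int, filesystem: str = "ext3",
                           owner: str = "hdfs") -> str:
    """Build a #cloud-config document covering ephemeral0..ephemeral(count-1)."""
    names = [f"ephemeral{i}" for i in range(count)]
    mount_opts = "defaults,nobootwait,discard"
    config = {
        "fs_setup": [
            {
                "label": name,
                "filesystem": filesystem,
                "extra_opts": ["-E", "nodiscard"],
                "device": name,
                "partition": "auto",
            }
            for name in names
        ],
        # A null mount point first, to override any default mounting behavior.
        "mounts": [[name, None] for name in names]
        + [[name, f"/media/{name}", filesystem, mount_opts, "0", "2"]
           for name in names],
        "runcmd": [
            [cmd, owner, f"/media/{name}"]
            for name in names
            for cmd in ("chown", "chgrp")
        ],
    }
    # Assumption: JSON output is acceptable to the YAML parser reading it.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


print(ephemeral_cloud_config(2))
```
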
Sending E-Mail via Amazon SES over SMTP with IAM Roles2015-01-17T12:00:00-05:00https://hector.dev/2015/01/17/sending-e-mail-via-amazon-ses-over-smtp-with-iam-roles<p><strong>TL;DR</strong>: As of the date this post was published, sending e-mail via <a href="https://aws.amazon.com/ses/" target="_blank" rel="noopener">Amazon Simple E-mail Service (SES)</a> over SMTP with IAM role credentials does not seem to work.</p>
<hr />
<p>Earlier this week, I set out to wire up a Django application with Amazon SES for sending e-mail. Because the application is going to live in Amazon Elastic Compute Cloud (EC2), I decided to make use of <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html" target="_blank" rel="noopener">IAM roles</a> to provide the application with the credentials it needs to authenticate with SES. Unfortunately, the SMTP endpoint does not seem to accept the IAM role credentials.</p>
<h2 id="iam-roles">IAM Roles</h2>
<p>IAM roles are an elegant way to set up an EC2 instance for API access to other Amazon Web Services. All API requests must be signed with an access key and secret key, so it is usually up to you to populate the EC2 instance with the proper credentials. However, if you make use of IAM roles, an automatically rotated set of keys is provided to the instance via its <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html#instancedata-data-retrieval" target="_blank" rel="noopener">metadata service</a>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>curl http://169.254.169.254/latest/meta-data/iam/security-credentials/s3access
<span class="o">{</span>
<span class="s2">"Code"</span> : <span class="s2">"Success"</span>,
<span class="s2">"LastUpdated"</span> : <span class="s2">"2012-04-26T16:39:16Z"</span>,
<span class="s2">"Type"</span> : <span class="s2">"AWS-HMAC"</span>,
<span class="s2">"AccessKeyId"</span> : <span class="s2">"AKIAIOSFODNN7EXAMPLE"</span>,
<span class="s2">"SecretAccessKey"</span> : <span class="s2">"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"</span>,
<span class="s2">"Token"</span> : <span class="s2">"token"</span>,
<span class="s2">"Expiration"</span> : <span class="s2">"2012-04-27T22:39:16Z"</span>
<span class="o">}</span>
</code></pre></div></div>
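<p>For an application that wants to read those keys itself, a minimal Python sketch might look like the following. The function names are hypothetical, the URL is the one from the <code class="language-plaintext highlighter-rouge">curl</code> example, and the sketch assumes the plain request style shown above (instances that require a session token for metadata requests would need an extra step that isn’t covered here).</p>

```python
import json
from urllib.request import urlopen

# Link-local address of the EC2 metadata service, as used in the curl example.
METADATA_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def parse_role_credentials(body):
    """Extract the signing fields from a metadata service JSON response."""
    document = json.loads(body)
    if document.get("Code") != "Success":
        raise ValueError("metadata service did not return credentials")
    return {
        "access_key_id": document["AccessKeyId"],
        "secret_access_key": document["SecretAccessKey"],
        "token": document["Token"],
        "expiration": document["Expiration"],
    }


def fetch_role_credentials(role_name, timeout=2):
    """Fetch credentials for a role; this only works from inside EC2."""
    with urlopen(METADATA_URL + role_name, timeout=timeout) as response:
        return parse_role_credentials(response.read().decode("utf-8"))
```

<p>Because the keys are rotated automatically, a caller would likely want to re-fetch them once the <code class="language-plaintext highlighter-rouge">Expiration</code> timestamp approaches rather than caching them indefinitely.</p>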
<h2 id="deriving-ses-smtp-credentials">Deriving SES SMTP Credentials</h2>
<p>Once your application retrieves a set of keys from the metadata service, <code class="language-plaintext highlighter-rouge">SecretAccessKey</code> needs to go through a <a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/smtp-credentials.html#smtp-credentials-convert" target="_blank" rel="noopener">little bit of a transformation</a> before it can be used with the SES SMTP endpoint. Amazon’s pseudocode for the transformation algorithm follows:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">key</span> <span class="o">=</span> <span class="no">AWS</span> <span class="nc">Secret</span> <span class="nc">Access</span> <span class="nc">Key</span><span class="o">;</span>
<span class="n">message</span> <span class="o">=</span> <span class="s">"SendRawEmail"</span><span class="o">;</span>
<span class="n">versionInBytes</span> <span class="o">=</span> <span class="mh">0x02</span><span class="o">;</span>
<span class="n">signatureInBytes</span> <span class="o">=</span> <span class="nc">HmacSha256</span><span class="o">(</span><span class="n">message</span><span class="o">,</span> <span class="n">key</span><span class="o">);</span>
<span class="n">signatureAndVer</span> <span class="o">=</span> <span class="nc">Concatenate</span><span class="o">(</span><span class="n">versionInBytes</span><span class="o">,</span> <span class="n">signatureInBytes</span><span class="o">);</span>
<span class="n">smtpPassword</span> <span class="o">=</span> <span class="nc">Base64</span><span class="o">(</span><span class="n">signatureAndVer</span><span class="o">);</span>
</code></pre></div></div>
<p>And for good measure, a translation of that into Python (for use with Django):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">base64</span>
<span class="kn">import</span> <span class="n">hashlib</span>
<span class="kn">import</span> <span class="n">hmac</span>

<span class="n">SES_SMTP_CONVERSION_HMAC_MESSAGE</span> <span class="o">=</span> <span class="sa">b</span><span class="sh">'</span><span class="s">SendRawEmail</span><span class="sh">'</span>
<span class="n">SES_SMTP_CONVERSION_VERSION</span> <span class="o">=</span> <span class="sa">b</span><span class="sh">'</span><span class="se">\x02</span><span class="sh">'</span>
<span class="k">def</span> <span class="nf">hash_smtp_pass_from_secret_key</span><span class="p">(</span><span class="n">key</span><span class="p">):</span>
    <span class="c1"># Sign the fixed message with the secret access key (HMAC-SHA256)</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hmac</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">key</span><span class="p">.</span><span class="nf">encode</span><span class="p">(</span><span class="sh">'</span><span class="s">utf-8</span><span class="sh">'</span><span class="p">),</span>
                 <span class="n">SES_SMTP_CONVERSION_HMAC_MESSAGE</span><span class="p">,</span>
                 <span class="n">digestmod</span><span class="o">=</span><span class="n">hashlib</span><span class="p">.</span><span class="n">sha256</span><span class="p">)</span>
    <span class="c1"># Prepend the version byte, then Base64-encode the result</span>
    <span class="k">return</span> <span class="n">base64</span><span class="p">.</span><span class="nf">b64encode</span><span class="p">(</span><span class="n">SES_SMTP_CONVERSION_VERSION</span> <span class="o">+</span>
                            <span class="n">h</span><span class="p">.</span><span class="nf">digest</span><span class="p">()).</span><span class="nf">decode</span><span class="p">(</span><span class="sh">'</span><span class="s">ascii</span><span class="sh">'</span><span class="p">)</span>
</code></pre></div></div>
<p>(Credit: <a href="https://github.com/clavery/" target="_blank" rel="noopener">Charles Lavery</a>, <a href="https://github.com/steventlamb" target="_blank" rel="noopener">Steve Lamb</a>)</p>
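<p>To show where the derived password would end up, here is a hedged sketch that builds Django’s SMTP settings from an access key pair. The function names are hypothetical, the default host assumes the <code class="language-plaintext highlighter-rouge">us-east-1</code> SES SMTP endpoint, and port <code class="language-plaintext highlighter-rouge">587</code> with TLS is an assumption about the desired connection mode. The transformation is repeated so the block runs on its own under Python 3.</p>

```python
import base64
import hashlib
import hmac


def smtp_password_from_secret_key(secret_access_key):
    """Base64(0x02 + HMAC-SHA256(secret key, "SendRawEmail"))."""
    signature = hmac.new(secret_access_key.encode("utf-8"),
                         b"SendRawEmail",
                         digestmod=hashlib.sha256).digest()
    return base64.b64encode(b"\x02" + signature).decode("ascii")


def ses_email_settings(access_key_id, secret_access_key,
                       host="email-smtp.us-east-1.amazonaws.com"):
    """Build Django e-mail settings; the default host is an assumed region."""
    return {
        "EMAIL_BACKEND": "django.core.mail.backends.smtp.EmailBackend",
        "EMAIL_HOST": host,
        "EMAIL_PORT": 587,
        "EMAIL_USE_TLS": True,
        # The access key ID is used as the SMTP username.
        "EMAIL_HOST_USER": access_key_id,
        "EMAIL_HOST_PASSWORD": smtp_password_from_secret_key(secret_access_key),
    }
```

<p>The resulting dictionary could be merged into a Django settings module. The decoded password is always 33 bytes long: one version byte followed by the 32-byte SHA-256 digest.</p>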
<h2 id="authentication-credentials-invalid">Authentication Credentials Invalid</h2>
<p>After launching an EC2 instance associated with an IAM role that allows <code class="language-plaintext highlighter-rouge">ses:SendEmail</code>, pulling credentials via the metadata service, and transforming the provided <code class="language-plaintext highlighter-rouge">SecretAccessKey</code>, you’ll notice that the SMTP endpoint still returns <code class="language-plaintext highlighter-rouge">535 Authentication Credentials Invalid</code>.</p>
<p>I tried several approaches to make things work, but always without success. I even compiled the <a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/smtp-credentials.html#smtp-credentials-convert" target="_blank" rel="noopener">Java implementation</a> of the transformation algorithm provided by AWS to compare inputs and outputs. Alas, I simply don’t think IAM role credentials work with the SES SMTP endpoint.</p>
Using Docker to Manage Erlang Environments for Riak2014-07-11T13:00:00-04:00https://hector.dev/2014/07/11/using-docker-to-manage-erlang-environments-for-riak<p><a href="https://riak.com">Basho</a> packages their own fork of Erlang/OTP along with
<a href="https://docs.riak.com/riak/latest/" target="_blank" rel="noopener">Riak</a> and
<a href="https://docs.riak.com/riakcs/latest/" target="_blank" rel="noopener">Riak CS</a>. The forks are typically an
older version of a stable Erlang/OTP release with a few patches. Eventually,
all patches included in the Basho fork are merged into later versions of an
official Erlang/OTP release.</p>
<p>If you’re installing Riak and Riak CS from a package, then all of the hard
work that surrounds bundling a custom version of Erlang/OTP has been taken
care of for you. On the other hand, if you are installing Riak or Riak CS from
source, then you may want to install the forked version of Erlang/OTP as well.</p>
<h2 id="docker">Docker</h2>
<p>Docker gives us a nice way to set up an isolated environment for installing
Erlang/OTP and Riak. More specifically, the
<a href="https://registry.hub.docker.com/u/hectcastro/basho-otp/" target="_blank" rel="noopener">docker-basho-otp</a>
image makes the whole process one step simpler by starting you off with an
already built Basho fork of Erlang/OTP. As of this post, the latest custom
build of Erlang/OTP is <code class="language-plaintext highlighter-rouge">R16B02_basho5</code>. This version is meant to be paired
with Riak 2.0+.</p>
<p>First, we need to pull down the image that contains <code class="language-plaintext highlighter-rouge">R16B02_basho5</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker pull hectcastro/basho-otp
</code></pre></div></div>
<p>Next, we need to start a container and invoke <code class="language-plaintext highlighter-rouge">/bin/bash</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">-t</span> <span class="nt">-i</span> <span class="nt">--rm</span> hectcastro/basho-otp /bin/bash
</code></pre></div></div>
<p>Now, let’s test to make sure that the correct version of Erlang/OTP is
available:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>erl
Erlang R16B02-basho5 <span class="o">(</span>erts-5.10.3<span class="o">)</span> <span class="o">[</span><span class="nb">source</span><span class="o">]</span> <span class="o">[</span>64-bit] <span class="o">[</span>smp:4:4] <span class="o">[</span>async-threads:10] ...
Eshell V5.10.3 <span class="o">(</span>abort with ^G<span class="o">)</span>
1>
</code></pre></div></div>
<p>(<code class="language-plaintext highlighter-rouge">Control + C</code> and then <code class="language-plaintext highlighter-rouge">a</code> for abort gets you out of this shell.)</p>
<h2 id="riak">Riak</h2>
<p>Solid Erlang/OTP environment? Check.</p>
<p>Now we need to pull down the Riak 2.0 source code to build what’s referred to
as a <code class="language-plaintext highlighter-rouge">devrel</code>. A <code class="language-plaintext highlighter-rouge">devrel</code> (or development release) automates the creation of
<code class="language-plaintext highlighter-rouge">5</code> separate copies of Riak. After the <code class="language-plaintext highlighter-rouge">devrel</code> process is complete, you can
start each copy of Riak and join all of the instances into a cluster.</p>
<p>First, let’s clone the Riak repository and check out the latest Riak 2.0 tag
(as of this post, the most recent tag is <code class="language-plaintext highlighter-rouge">riak-2.0.0rc1</code>):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>git clone https://github.com/basho/riak.git
Cloning into <span class="s1">'riak'</span>...
remote: Reusing existing pack: 16251, <span class="k">done</span><span class="nb">.</span>
remote: Counting objects: 6, <span class="k">done</span><span class="nb">.</span>
remote: Compressing objects: 100% <span class="o">(</span>6/6<span class="o">)</span>, <span class="k">done</span><span class="nb">.</span>
remote: Total 16257 <span class="o">(</span>delta 0<span class="o">)</span>, reused 0 <span class="o">(</span>delta 0<span class="o">)</span>
Receiving objects: 100% <span class="o">(</span>16257/16257<span class="o">)</span>, 11.90 MiB | 40.00 KiB/s, <span class="k">done</span><span class="nb">.</span>
Resolving deltas: 100% <span class="o">(</span>10241/10241<span class="o">)</span>, <span class="k">done</span><span class="nb">.</span>
Checking connectivity... <span class="k">done</span><span class="nb">.</span>
<span class="nv">$ </span><span class="nb">cd </span>riak
<span class="nv">$ </span>git checkout riak-2.0.0rc1
Note: checking out <span class="s1">'riak-2.0.0rc1'</span><span class="nb">.</span>
HEAD is now at 87b8934... Bump riak to 2.0.0rc1 <span class="k">for </span>internal smoke testing
</code></pre></div></div>
<p>Next, let’s create the <code class="language-plaintext highlighter-rouge">devrel</code> (this step will take a few minutes):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make devrel <span class="nv">DEVNODES</span><span class="o">=</span>5
</code></pre></div></div>
<p>Almost there. The following steps will start all <code class="language-plaintext highlighter-rouge">5</code> Riak nodes and join them
into a cluster:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cd </span>dev
<span class="nv">$ </span><span class="k">for </span>node <span class="k">in</span> <span class="sb">`</span><span class="nb">ls</span><span class="sb">`</span><span class="p">;</span> <span class="k">do</span> <span class="nv">$node</span>/bin/riak start<span class="p">;</span> <span class="k">done</span> <span class="o">&&</span> <span class="se">\</span>
<span class="k">for </span>n <span class="k">in</span> <span class="o">{</span>2..5<span class="o">}</span><span class="p">;</span> <span class="k">do </span>dev<span class="nv">$n</span>/bin/riak-admin cluster <span class="nb">join </span>dev1@127.0.0.1<span class="p">;</span> <span class="k">done
</span>Success: staged <span class="nb">join </span>request <span class="k">for</span> <span class="s1">'dev2@127.0.0.1'</span> to <span class="s1">'dev1@127.0.0.1'</span>
Success: staged <span class="nb">join </span>request <span class="k">for</span> <span class="s1">'dev3@127.0.0.1'</span> to <span class="s1">'dev1@127.0.0.1'</span>
Success: staged <span class="nb">join </span>request <span class="k">for</span> <span class="s1">'dev4@127.0.0.1'</span> to <span class="s1">'dev1@127.0.0.1'</span>
Success: staged <span class="nb">join </span>request <span class="k">for</span> <span class="s1">'dev5@127.0.0.1'</span> to <span class="s1">'dev1@127.0.0.1'</span>
<span class="nv">$ </span>dev1/bin/riak-admin cluster plan
<span class="o">===============================</span> Staged Changes <span class="o">================================</span>
Action Details<span class="o">(</span>s<span class="o">)</span>
<span class="nt">-------------------------------------------------------------------------------</span>
<span class="nb">join</span> <span class="s1">'dev2@127.0.0.1'</span>
<span class="nb">join</span> <span class="s1">'dev3@127.0.0.1'</span>
<span class="nb">join</span> <span class="s1">'dev4@127.0.0.1'</span>
<span class="nb">join</span> <span class="s1">'dev5@127.0.0.1'</span>
<span class="nt">-------------------------------------------------------------------------------</span>
NOTE: Applying these changes will result <span class="k">in </span>1 cluster transition
<span class="c">###############################################################################</span>
After cluster transition 1/1
<span class="c">###############################################################################</span>
<span class="o">=================================</span> Membership <span class="o">==================================</span>
Status Ring Pending Node
<span class="nt">-------------------------------------------------------------------------------</span>
valid 100.0% 20.3% <span class="s1">'dev1@127.0.0.1'</span>
valid 0.0% 20.3% <span class="s1">'dev2@127.0.0.1'</span>
valid 0.0% 20.3% <span class="s1">'dev3@127.0.0.1'</span>
valid 0.0% 20.3% <span class="s1">'dev4@127.0.0.1'</span>
valid 0.0% 18.8% <span class="s1">'dev5@127.0.0.1'</span>
<span class="nt">-------------------------------------------------------------------------------</span>
Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
Transfers resulting from cluster changes: 51
12 transfers from <span class="s1">'dev1@127.0.0.1'</span> to <span class="s1">'dev5@127.0.0.1'</span>
13 transfers from <span class="s1">'dev1@127.0.0.1'</span> to <span class="s1">'dev4@127.0.0.1'</span>
13 transfers from <span class="s1">'dev1@127.0.0.1'</span> to <span class="s1">'dev3@127.0.0.1'</span>
13 transfers from <span class="s1">'dev1@127.0.0.1'</span> to <span class="s1">'dev2@127.0.0.1'</span>
<span class="nv">$ </span>dev1/bin/riak-admin cluster commit
Cluster changes committed
</code></pre></div></div>
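<p>If you would rather drive these steps from a script, the two shell loops can be sketched in Python. The function below is hypothetical and only builds the argument lists; it assumes the node directories are named <code class="language-plaintext highlighter-rouge">dev1</code> through <code class="language-plaintext highlighter-rouge">devN</code> and that the commands are run from inside the <code class="language-plaintext highlighter-rouge">dev</code> directory.</p>

```python
def cluster_join_commands(node_count, target="dev1@127.0.0.1"):
    """Return the commands the shell loops run, without executing them."""
    nodes = ["dev{}".format(n) for n in range(1, node_count + 1)]
    # Start every node first.
    start = [[node + "/bin/riak", "start"] for node in nodes]
    # Every node except the first stages a join to the first node.
    join = [[node + "/bin/riak-admin", "cluster", "join", target]
            for node in nodes[1:]]
    return start + join
```

<p>Each returned list could be handed to <code class="language-plaintext highlighter-rouge">subprocess.run(command, check=True)</code>. The <code class="language-plaintext highlighter-rouge">cluster plan</code> and <code class="language-plaintext highlighter-rouge">cluster commit</code> steps are left out on purpose, since the plan is worth reviewing by hand before committing.</p>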
<p>And…we’re done. Say hello to your very own Riak 2.0 cluster, built on
<code class="language-plaintext highlighter-rouge">R16B02_basho5</code>.</p>
Bootstrapping Private Subnet Instances In A VPC with Knife2012-12-25T12:00:00-05:00https://hector.dev/2012/12/25/bootstrapping-private-subnet-instances-in-amazon-vpc-with-knife<h2 id="amazon-vpc">Amazon VPC</h2>
<p><a href="https://aws.amazon.com/vpc/" target="_blank" rel="noopener">Amazon Virtual Private Cloud</a> (VPC) is a service
that allows you to define an isolated virtual network within EC2. A
<a href="https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Scenario2.html" target="_blank" rel="noopener">common scenario</a>
involves a VPC with both public and private subnets. Instances within
<em>public</em> subnets can send and receive traffic directly to/from the Internet.
On the other hand, instances within <em>private</em> subnets cannot receive traffic
directly from the Internet and can only send outbound traffic via a
<a href="https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_NAT_Instance.html" target="_blank" rel="noopener">NAT instance</a>.</p>
<h2 id="bastion-host">Bastion Host</h2>
<p>Given a VPC setup with both public and private subnets, you’ll want at least
one SSH bastion host in the public subnet. This host is needed to communicate
with instances in the private subnet from your local machine. The diagram
below, taken from Amazon’s documentation, helps illustrate:</p>
<p><img src="/assets/resized/bastion-320x276.png" alt="SSH Bastion with VPC" srcset="/assets/resized/bastion-320x276.png 320w, /images/2012-12-25-bootstrapping-private-subnet-instances-in-amazon-vpc-with-knife/bastion.png 469w" /></p>
<h2 id="knife-ec2-example">Knife EC2 Example</h2>
<p>Using a combination of <a href="https://docs.chef.io/knife.html" target="_blank" rel="noopener">Knife</a> and
the <a href="https://github.com/opscode/knife-ec2" target="_blank" rel="noopener">Knife EC2</a> plug-in, the following
command connects directly to the bastion host specified by the <code class="language-plaintext highlighter-rouge">--ssh-gateway</code>
option. From there another connection is made to the private subnet instance
via its <code class="language-plaintext highlighter-rouge">private_ip_address</code> in order to bootstrap Chef:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>knife ec2 server create <span class="nt">--flavor</span> hi1.4xlarge <span class="nt">--image</span> ami-08249861 <span class="se">\</span>
<span class="nt">--security-group-ids</span> <span class="o">[</span>SECURITY_GROUP_ID] <span class="nt">--tags</span> <span class="nv">Name</span><span class="o">=</span>node1-dev <span class="se">\</span>
<span class="nt">--availability-zone</span> us-east-1d <span class="nt">--subnet</span> <span class="o">[</span>SUBNET_ID] <span class="se">\</span>
<span class="nt">--node-name</span> node1-dev <span class="nt">--ssh-key</span> orgname <span class="nt">--ssh-gateway</span> bastion-dev <span class="se">\</span>
<span class="nt">--server-connect-attribute</span> private_ip_address <span class="se">\</span>
<span class="nt">--ssh-user</span> ec2-user <span class="nt">--identity-file</span> ~/.ec2/orgname.pem <span class="se">\</span>
<span class="nt">--environment</span> development <span class="nt">--ephemeral</span> <span class="s1">'/dev/sdb,/dev/sdc'</span> <span class="se">\</span>
<span class="nt">--run-list</span> <span class="s1">'role[base],role[solr_ssd_slave]'</span>
</code></pre></div></div>
<p>Depending on how long it takes your run list to converge on a bare operating
system, you should have Chef bootstrapped on an instance within the private
subnet of a VPC after running only one command!</p>
Preseeding Ubuntu Server and Static IP Addresses2011-11-18T12:00:00-05:00https://hector.dev/2011/11/18/preseeding-ubuntu-server-and-static-ip-addresses<p>Setting up a cluster of computers for any purpose usually requires installing an operating system. The installation process typically consists of several questions and identical answers for each node in the cluster. Automating the submission of answers to these questions is desirable — not only to prevent inconsistencies, but for general convenience.</p>
<h2 id="preseeding">Preseeding</h2>
<p>I spent the last few days working to stand up a proof-of-concept Riak cluster. The first step involved installing Ubuntu Oneiric Ocelot (<code class="language-plaintext highlighter-rouge">11.10</code>) on four virtual machines. Luckily, Ubuntu/Debian has a process called <a href="https://wiki.debian.org/DebianInstaller/Preseed" target="_blank" rel="noopener">preseeding</a> to facilitate automated installations. Surprisingly, it also has limited support for Red Hat’s <a href="https://pykickstart.readthedocs.io/en/latest/" target="_blank" rel="noopener">Kickstart</a>. Playing it safe, I went with preseeding.</p>
<p>There are three methods that can be used for preseeding: <code class="language-plaintext highlighter-rouge">initrd</code>, <code class="language-plaintext highlighter-rouge">file</code>, and <code class="language-plaintext highlighter-rouge">network</code>. I wasn’t interested in re-authoring ISOs or setting up a TFTP server, so I went with a web-accessible preseed file. The advantage of this approach is that the configuration file is easily modifiable, yet still accessible. The disadvantage is that it isn’t available to the installer until the network is configured.</p>
<h2 id="assigning-a-static-ip-problem">Assigning a Static IP Problem</h2>
<p>Because web-accessible preseed files aren’t available until the network is configured, the step to assign a static IP address gets missed. Below are several approaches I found to assign a static IP address with preseeding.</p>
<h3 id="boot-parameters">Boot Parameters</h3>
<p>The boot prompt is where you tell the installer how to locate your preseed file. It is also where you can pass a fixed number of preseed directives. In our example of assigning a static IP address, you’d pass things like IP address, hostname, domain, and netmask. Ultimately, I wasn’t too interested in this approach because it required a lot of typing without clipboard access.</p>
<p><img src="/assets/resized/ubuntu-boot-prompt-480x406.png" alt="Ubuntu Boot Prompt" srcset="/assets/resized/ubuntu-boot-prompt-320x270.png 320w,/assets/resized/ubuntu-boot-prompt-480x406.png 480w, /images/2011-11-18-preseeding-ubuntu-server-and-static-ip-addresses/ubuntu-boot-prompt.png 754w" /></p>
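<p>To give a sense of how much typing is involved, here is a hedged sketch that assembles such a boot parameter string. The function is hypothetical, the <code class="language-plaintext highlighter-rouge">netcfg/*</code> keys are the ones I understand the Debian installer’s example preseed file to use, and the values passed in are placeholders.</p>

```python
def boot_parameters(preseed_url, address, netmask, gateway, nameserver,
                    hostname, domain):
    """Assemble boot parameters that preseed a static network configuration."""
    parameters = [
        ("url", preseed_url),            # assumed alias for preseed/url
        ("hostname", hostname),          # assumed alias for netcfg/get_hostname
        ("domain", domain),              # assumed alias for netcfg/get_domain
        ("netcfg/disable_dhcp", "true"),
        ("netcfg/get_ipaddress", address),
        ("netcfg/get_netmask", netmask),
        ("netcfg/get_gateway", gateway),
        ("netcfg/get_nameservers", nameserver),
        ("netcfg/confirm_static", "true"),
    ]
    return " ".join("{}={}".format(key, value) for key, value in parameters)
```

<p>Even for a single node the resulting line runs well past a hundred characters, which is why typing it at the boot prompt for every machine gets old quickly.</p>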
<h3 id="re-evaluating-network-configuration">Re-evaluating Network Configuration</h3>
<p>The Ubuntu Help wiki has a suggested <a href="https://help.ubuntu.com/16.04/installation-guide/example-preseed.txt" target="_blank" rel="noopener">hack</a> to trigger re-evaluation of preseeded network configuration settings by executing commands via <code class="language-plaintext highlighter-rouge">preseed/run</code>. Unfortunately, I was unable to get this to work successfully. In every combination I tried, it resulted in the installer failing. This related <a href="https://ubuntuforums.org/showthread.php?t=1494309" target="_blank" rel="noopener">Ubuntu Forums post</a> outlines the suggested steps pretty well.</p>
<h3 id="overwriting-network-configuration">Overwriting Network Configuration</h3>
<p>This is the solution I eventually used to assign a static IP address. It’s a hack, but in my eyes it was the least of three evils. Alongside each node’s preseed configuration file, I created a corresponding shell script. The shell script gets executed before the installer triggers a reboot and overwrites <code class="language-plaintext highlighter-rouge">/etc/network/interfaces</code> with a static IP configuration:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo</span> <span class="s2">"auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 192.168.1.10
netmask 255.255.255.0
gateway 192.168.1.1
"</span> <span class="o">></span> /etc/network/interfaces
</code></pre></div></div>
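<p>Since each node needs its own copy of that file, generating the contents is easy to automate. The sketch below is one hypothetical way to do it in Python; it assumes the four nodes get sequential addresses starting at <code class="language-plaintext highlighter-rouge">192.168.1.10</code> and share the netmask and gateway shown above.</p>

```python
INTERFACES_TEMPLATE = """auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address {address}
    netmask {netmask}
    gateway {gateway}
"""


def render_interfaces(address, netmask="255.255.255.0", gateway="192.168.1.1"):
    """Render /etc/network/interfaces for one node with a static address."""
    return INTERFACES_TEMPLATE.format(address=address, netmask=netmask,
                                      gateway=gateway)


def render_cluster(count=4, first_host=10, prefix="192.168.1."):
    """Map each node's (assumed sequential) address to its interfaces file."""
    addresses = [prefix + str(first_host + offset) for offset in range(count)]
    return {address: render_interfaces(address) for address in addresses}
```

<p>Each rendered string could then be embedded in that node’s shell script in place of the hand-written <code class="language-plaintext highlighter-rouge">echo</code> body.</p>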
<p>If anyone has a better approach to setting a static IP address via preseeding or Kickstart, let me know!</p>
Testing Command-line Applications with Aruba2011-10-25T13:00:00-04:00https://hector.dev/2011/10/25/testing-command-line-applications-with-aruba<p><a href="https://cucumber.io" target="_blank" rel="noopener">Cucumber</a> is often used to test web applications. Many developers hook it into their Rails projects to integration test site features. Wouldn’t it be great if there were a way to test command-line applications in a similar fashion? You can with <a href="https://github.com/cucumber/aruba" target="_blank" rel="noopener">Aruba</a>.</p>
<h2 id="aruba">Aruba</h2>
<p>Aruba is a Cucumber extension for testing command-line applications written in any language. Passing arguments, interacting with the file system, capturing exit codes, and mimicking interactive usage are all features provided out of the box. Below is a basic test for the <code class="language-plaintext highlighter-rouge">mv</code> command that passes:</p>
<div class="language-cucumber highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">Scenario</span><span class="p">:</span> Backing up test.conf
<span class="nf">When </span>I run `mv test.conf test.conf.bak`
<span class="err">Then the output should contain</span><span class="p">:</span>
<span class="s">"""
mv: rename test.conf to test.conf.bak: No such file or directory
"""</span>
</code></pre></div></div>
<p>Now let’s show off a few of Aruba’s built-in steps to prevent the command from failing:</p>
<div class="language-cucumber highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">Scenario</span><span class="p">:</span> Backing up test.conf
<span class="nf">Given </span>an empty file named <span class="s">"test.conf"</span>
<span class="nf">When </span>I run `mv test.conf test.conf.bak`
<span class="nf">Then </span>the exit status should be 0
<span class="err">And the following files should exist</span><span class="p">:</span>
<span class="p">|</span> <span class="nv">test.conf.bak</span> <span class="p">|</span>
<span class="n">And</span> <span class="n">the</span> <span class="n">following</span> <span class="n">files</span> <span class="n">should</span> <span class="n">not</span> <span class="n">exist:</span>
<span class="p">|</span> <span class="n">test.conf</span> <span class="p">|</span>
</code></pre></div></div>
<p>The first step creates an empty file inside of Aruba’s sandbox directory, and the second executes <code class="language-plaintext highlighter-rouge">mv</code> there. After the <code class="language-plaintext highlighter-rouge">mv</code> command is executed, its exit status is compared to <code class="language-plaintext highlighter-rouge">0</code> and the existence of <code class="language-plaintext highlighter-rouge">test.conf.bak</code> (and non-existence of <code class="language-plaintext highlighter-rouge">test.conf</code>) is confirmed.</p>
<p>It’s also worth noting that after each scenario Aruba clears out its sandbox — a temporary directory that becomes the current working directory for your command-line tool — unless you explicitly tag the scenario with <code class="language-plaintext highlighter-rouge">@no-clobber</code>. This tag preserves the previous scenario’s final state. Tying this back to the example above, the next scenario would begin with only <code class="language-plaintext highlighter-rouge">test.conf.bak</code> in the sandbox. Additional Aruba-specific tags can be found in the <a href="https://github.com/cucumber/aruba#readme" target="_blank" rel="noopener">README</a>.</p>
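<p>The sandbox idea is easy to picture outside of Cucumber as well. The following Python sketch is only an analogy, not how Aruba is implemented: a hypothetical helper creates a throwaway directory, seeds it with empty files, runs a command with that directory as its working directory, and reports the exit status along with whatever files remain.</p>

```python
import os
import subprocess
import tempfile


def run_in_sandbox(command, files=()):
    """Run a command in a throwaway directory; return (status, leftover files)."""
    with tempfile.TemporaryDirectory() as sandbox:
        for name in files:
            # Rough equivalent of "Given an empty file named ..."
            open(os.path.join(sandbox, name), "w").close()
        result = subprocess.run(command, cwd=sandbox,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        # The directory is deleted when the with block exits, so list it now.
        return result.returncode, sorted(os.listdir(sandbox))
```

<p>Calling <code class="language-plaintext highlighter-rouge">run_in_sandbox(["mv", "test.conf", "test.conf.bak"], files=["test.conf"])</code> mirrors the scenario above. Because the directory is removed afterwards, each call starts clean, much like a scenario without <code class="language-plaintext highlighter-rouge">@no-clobber</code>.</p>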
<h2 id="extending-the-aruba-api">Extending the Aruba API</h2>
<p>As a command-line application evolves, other conditions not available in Aruba’s built-in API will require testing. For example, say you need to assert a file’s user and group attributes. Because Aruba’s API was built using Ruby modules, it can be reopened inside of Cucumber’s <code class="language-plaintext highlighter-rouge">env.rb</code>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">Aruba</span>
<span class="k">module</span> <span class="nn">Api</span>
<span class="k">def</span> <span class="nf">check_file_owner_and_group</span><span class="p">(</span><span class="n">paths_and_users_and_groups</span><span class="p">)</span>
<span class="n">prep_for_fs_check</span> <span class="k">do</span> <span class="c1"># Lower-level function provided by Aruba</span>
<span class="n">paths_and_users_and_groups</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">path</span><span class="p">,</span> <span class="n">user</span><span class="p">,</span> <span class="n">group</span><span class="o">|</span>
<span class="n">stat</span> <span class="o">=</span> <span class="no">File</span><span class="p">.</span><span class="nf">stat</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
<span class="no">Etc</span><span class="p">.</span><span class="nf">getpwuid</span><span class="p">(</span><span class="n">stat</span><span class="p">.</span><span class="nf">uid</span><span class="p">).</span><span class="nf">name</span><span class="p">.</span><span class="nf">should</span> <span class="o">==</span> <span class="n">user</span>
<span class="no">Etc</span><span class="p">.</span><span class="nf">getgrgid</span><span class="p">(</span><span class="n">stat</span><span class="p">.</span><span class="nf">gid</span><span class="p">).</span><span class="nf">name</span><span class="p">.</span><span class="nf">should</span> <span class="o">==</span> <span class="n">group</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Then create a matcher:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Then</span> <span class="sr">/^the following files should have username "([^"]*)" and group "([^"]*)":$/</span> <span class="k">do</span> <span class="o">|</span><span class="n">user</span><span class="p">,</span> <span class="n">group</span><span class="p">,</span> <span class="n">files</span><span class="o">|</span>
<span class="n">check_file_owner_and_group</span><span class="p">(</span><span class="n">files</span><span class="p">.</span><span class="nf">raw</span><span class="p">.</span><span class="nf">map</span> <span class="p">{</span> <span class="o">|</span><span class="n">file_row</span><span class="o">|</span> <span class="p">(</span><span class="n">file_row</span> <span class="o"><<</span> <span class="n">user</span><span class="p">)</span> <span class="o"><<</span> <span class="n">group</span> <span class="p">})</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Now that step can be included to test the user and group attributes of files:</p>
<div class="language-cucumber highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">Scenario</span><span class="p">:</span> Backing up test.conf
<span class="nf">Given </span>an empty file named <span class="s">"test.conf"</span>
<span class="nf">When </span>I run `mv test.conf test.conf.bak`
<span class="nf">Then </span>the exit status should be 0
<span class="err">And the following files should exist</span><span class="p">:</span>
<span class="p">|</span> <span class="nv">test.conf.bak</span> <span class="p">|</span>
<span class="n">And</span> <span class="n">the</span> <span class="n">following</span> <span class="n">files</span> <span class="n">should</span> <span class="n">not</span> <span class="n">exist:</span>
<span class="p">|</span> <span class="n">test.conf</span> <span class="p">|</span>
<span class="err">And the following files should have username "hector" and group "staff"</span><span class="p">:</span>
<span class="p">|</span> <span class="nv">test.conf.bak</span> <span class="p">|</span>
</code></pre></div></div>
<h2 id="conclusion">Conclusion</h2>
<p>Using a behavior-driven development approach for building command-line applications with Cucumber and Aruba was a pleasure. Aruba’s API covers a decent amount of ground and is easy to extend. The source code is straightforward, and after skimming its internals I was able to extend the API to meet my needs. Hopefully reading this will help you do the same.</p>