<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Technical Journal]]></title>
  <link href="http://davidad.github.io/atom.xml" rel="self"/>
  <link href="http://davidad.github.io/"/>
  <updated>2014-04-25T04:25:09-04:00</updated>
  <id>http://davidad.github.io/</id>
  <author>
    <name><![CDATA[davidad (David A. Dalrymple)]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[An OSI layer model for the 21st century]]></title>
    <link href="http://davidad.github.io/blog/2014/04/24/an-osi-layer-model-for-the-21st-century/"/>
    <updated>2014-04-24T17:48:03-04:00</updated>
    <id>http://davidad.github.io/blog/2014/04/24/an-osi-layer-model-for-the-21st-century</id>
    <content type="html"><![CDATA[<p>The Internet protocol suite is wonderful, but it was designed before the advent of modern cryptography and without the benefit of hindsight. On the modern Internet, cryptography is typically squeezed into a single, incredibly complex layer, Transport Layer Security (TLS; formerly known as Secure Sockets Layer, or SSL). Over the last few months, 3 entirely unrelated (but equally catastrophic) bugs have been uncovered in 3 independent TLS implementations (<a href="https://www.imperialviolet.org/2014/02/22/applebug.html">Apple SSL/TLS</a>, <a href="http://arstechnica.com/security/2014/03/critical-crypto-bug-leaves-linux-hundreds-of-apps-open-to-eavesdropping/">GnuTLS</a>, and most recently <a href="http://heartbleed.com">OpenSSL</a>, which powers most “secure” servers on the Internet), making the TLS system difficult to trust in practice.</p>
<p>What if cryptographic functions were spread out into more layers? Would the stack of layers become too tall, inefficient, and hard to debug, making the problem worse instead of better? On the contrary, I propose that appropriate cryptographic protocols could replace most existing layers, improving security as well as other functions generally not thought of as cryptographic, such as concurrency control of complex data structures, lookup or discovery of services and data, and decentralized passwordless login. Perhaps most importantly, the new architecture would enable individuals to internetwork as peers rather than as tenants of the telecommunications oligopoly, putting net neutrality directly in the hands of citizens and potentially enabling a drastically more competitive bandwidth market.</p>
<style>
td, th {
text-align: center;
}
b {
font-weight: bold;
}
table tr td i {
font-style: italic;
}
thead {
border-bottom: 1px black solid;
}
td.common {
background-color: #e8f87e;
}
td.practice {
background-color: #ffda88;
}
td.phy {
background-color: #d8f0fe;
}
td.new {
background-color: #d0ee9a;
font-weight: bold;
}
td {
border-bottom: 1px solid rgba(150,150,150,0.2);
}
</style>
<table>
  <thead>
  <tr>
  <th width=40></th><th width=200>
Current <a href="http://en.wikipedia.org/wiki/OSI_model">OSI model</a>
</th><th width=180>
In practice
</th> <th width=200>
Proposed update
</th>
  </tr>
  </thead>
  <tbody>
  <tr><td>
8
</td><td>
<i>(none)</i>
</td><td class="common">
Application
</td><td class="common">
<a href="http://davidad.github.io/blog/2014/04/24/an-osi-layer-model-for-the-21st-century/#Application">Application</a>
</td></tr>
  <tr><td>
7
</td><td class="common">
“<a href="http://en.wikipedia.org/wiki/Application_layer">Application</a>”
</td><td class="practice">
<a href="http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol">HTTP</a>
</td><td class="new">
<a href="http://davidad.github.io/blog/2014/04/24/an-osi-layer-model-for-the-21st-century/#Transactions">Transactions</a>
</td></tr>
  <tr><td>
6
</td><td class="common">
<a href="http://en.wikipedia.org/wiki/Presentation_layer">Presentation</a>
</td><td class="practice">
<a href="http://en.wikipedia.org/wiki/Transport_Layer_Security">SSL/TLS</a>
</td><td class="new">
<a href="http://davidad.github.io/blog/2014/04/24/an-osi-layer-model-for-the-21st-century/#Non-Repudiation">(Non-)Repudiation</a>
</td></tr>
  <tr><td>
5
</td><td class="common">
<a href="http://en.wikipedia.org/wiki/Session_layer">Session</a>
</td><td class="practice" rowspan="2">
<a href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol">TCP</a>
</td><td class="new">
<a href="http://davidad.github.io/blog/2014/04/24/an-osi-layer-model-for-the-21st-century/#Confidentiality">Confidentiality</a>
</td></tr>
  <tr><td>
4
</td><td class="common">
<a href="http://en.wikipedia.org/wiki/Transport_layer">Transport</a>
</td><td class="new">
<a href="http://davidad.github.io/blog/2014/04/24/an-osi-layer-model-for-the-21st-century/#Availability">Availability</a>
</td></tr>
  <tr><td>
3
</td><td class="common">
<a href="http://en.wikipedia.org/wiki/Network_layer">Network</a>
</td><td class="practice">
<a href="http://en.wikipedia.org/wiki/Internet_Protocol">IP</a>
</td><td class="new">
<a href="http://davidad.github.io/blog/2014/04/24/an-osi-layer-model-for-the-21st-century/#Integrity">Integrity</a>
</td></tr>
  <tr><td>
2
</td><td class="common">
<a href="http://en.wikipedia.org/wiki/Data_link_layer">Data Link</a>
</td>
  <td class="phy" rowspan="2">
<a href="http://en.wikipedia.org/wiki/E-UTRA">e-UTRA</a> (LTE), <a href="http://en.wikipedia.org/wiki/IEEE_802.11">802.11</a> (WiFi), <a href="http://en.wikipedia.org/wiki/IEEE_802.3">802.3</a> (Ethernet), <i>etc.</i>
</td>
  <td class="common">
Data Link
</td></tr>
  <tr><td>
1
</td><td class="common">
<a href="http://en.wikipedia.org/wiki/Physical_layer">Physical</a>
</td><td class="common">
Physical
</td></tr>
  </tbody>
</table>
<p><br/></p>
<p>Of course, the layers I propose will doubtless introduce new problems of their own, but I’d like to start this conversation with some concrete ideas, even if I don’t have a final answer. (Please feel free to <a
 href="http://mailhide.recaptcha.net/d?k=01A3Grt9OhKg2-MSZSi6YDVA==&c=YXdAjPYO-xwh0WDnMu37kmOqfzUGcLhwkXoLkHdM6NA=">email</a> me your comments or tweet <a href="http://twitter.com/davidad"><span class="citation" data-cites="davidad">@davidad</span></a>.)</p>
<p>Descriptions follow for each of the five new layers I suggest, four of which are named after common <a href="http://en.wikipedia.org/wiki/Security_testing">information security requirements</a>, and one of which (<a href="http://davidad.github.io/blog/2014/04/24/an-osi-layer-model-for-the-21st-century/#Transactions">Transactions</a>) is borrowed from <a href="http://en.wikipedia.org/wiki/ACID">database requirements</a> (and also vaguely suggestive of cryptocurrency).</p>
<!-- more -->

<hr />
<p><strong>General disclaimer for InfoSec articles:</strong> <em>Reading this article does not qualify you to design secure systems. Writing this article does not qualify </em>me <em>to design secure systems. In fact, </em>nobody is qualified to design secure systems<em>. A system should not be considered secure unless it has been reviewed by multiple security experts </em>and <em>resisted multiple serious attempts to violate its security claims in practice. The information contained in this article is offered “as is” and without warranties of any kind (express, implied, and statutory), all of which the author expressly disclaims to the fullest extent permitted by law.</em></p>
<h2 id="data-link-and-physical-layers">Data Link and Physical layers</h2>
<p>For our purposes today, the Data Link and Physical layers are a black box (perhaps literally), to which we have an interface (the “network interface”) which looks like a transmit queue and a receive queue. These queues can store “payloads” of anywhere from 1 to 1280<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a> <a href="http://en.wikipedia.org/wiki/Octet_(computing)">octets</a> (bytes). The next layer in the stack can push a payload onto the Data Link transmit queue (and possibly get an error if it’s full) and can pop a payload from the Data Link receive queue (and possibly get an error if it’s empty). The Data Link layer is responsible for (eventually) flushing the transmit queue, and any payload which leaves the transmit queue must appear on the receive queues of all <i>other</i> devices connected to the same <a href="http://en.wikipedia.org/wiki/Channel_(communications)">channel</a> (a technical term, which may refer to a radio channel in the case of cellular devices, or simply to a particular length of cable in a point-to-point wired connection).</p>
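<p>The interface described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class names, the exception types, and the <code>tx_capacity</code> limit are hypothetical, and the shared channel is modelled as a plain in-memory list of attached links.</p>

```python
from collections import deque

MAX_PAYLOAD = 1280  # octets, the upper bound stated in the text


class QueueFull(Exception):
    pass


class QueueEmpty(Exception):
    pass


class Channel:
    """Hypothetical shared medium: a payload reaches every *other* attached link."""

    def __init__(self):
        self.links = []

    def attach(self, link):
        self.links.append(link)

    def broadcast(self, sender, payload):
        for link in self.links:
            if link is not sender:
                link.rx.append(payload)


class DataLink:
    def __init__(self, channel, tx_capacity=8):  # capacity is an arbitrary assumption
        self.tx = deque()
        self.rx = deque()
        self.tx_capacity = tx_capacity
        self.channel = channel
        channel.attach(self)

    def push(self, payload: bytes):
        """Called by the next layer up; may fail if the transmit queue is full."""
        if not 1 <= len(payload) <= MAX_PAYLOAD:
            raise ValueError("payload must be 1 to 1280 octets")
        if len(self.tx) >= self.tx_capacity:
            raise QueueFull()
        self.tx.append(payload)

    def pop(self) -> bytes:
        """Called by the next layer up; may fail if the receive queue is empty."""
        if not self.rx:
            raise QueueEmpty()
        return self.rx.popleft()

    def flush(self):
        """The Data Link layer's own duty: eventually drain the transmit queue."""
        while self.tx:
            self.channel.broadcast(self, self.tx.popleft())
```

<p>With three links attached to one <code>Channel</code>, a payload pushed and flushed by the first appears on the receive queues of the other two and not on its own.</p>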
<p><a name="Integrity"></a></p>
<h2 id="integrity-layer">Integrity layer</h2>
<p>We would like a received payload to self-evidently be the same payload which was sent. Although the Data Link layer is supposed to provide such an assurance, various kinds of attacks on the system might invalidate this assumption. Integrity protocols mitigate these attacks:</p>
<p>
<table>
<thead>
  <tr><th width=60>
Paranoia Level
</th><th width=240>
Attacks
</th><th width=180>
Mitigation
</th><th width=150>
Common Implementation
</th><th width=180>
My Preferred Implementation
</th></tr>
</thead>
<tbody>
  <tr><td>
1
</td><td>
Thermal noise, cosmic rays
</td><td>
checksum hash
</td><td>
<a href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Checksum_computation">TCP Checksum</a>
</td><td>
<a href="http://www.strchr.com/crc32_popcnt">CRC-32C</a>
</td></tr>
  <tr><td>
2
</td><td>
Deliberate corruption
</td><td>
cryptographic hash
</td><td>
<a href="http://en.wikipedia.org/wiki/SHA-1">SHA-1</a>
</td><td>
<a href="https://blake2.net/">BLAKE2b</a>
</td></tr>
  <tr><td>
3
</td><td>
Spoofing of trusted contacts
</td><td>
keyed hash
</td><td>
<a href="http://en.wikipedia.org/wiki/Hash-based_message_authentication_code">HMAC-SHA1</a>
</td><td>
<a href="https://131002.net/siphash/siphash.pdf">SipHash</a>
</td></tr>
  <tr><td>
4
</td><td>
Spoofing of strangers
</td><td>
public-key signature of cryptographic hash
</td><td>
<a href="http://en.wikipedia.org/wiki/SHA-1">SHA-1</a> + <a href="http://en.wikipedia.org/wiki/RSA_(cryptosystem)">RSA</a>
</td><td>
<a href="https://blake2.net/">BLAKE2b</a> + <a href="http://ed25519.cr.yp.to/index.html">Ed25519</a>
</td></tr>
</tbody>
</table>
</p>

<p>Integrity protocols are fairly simple: the appropriate verification material is placed at the beginning of every Data Link payload. The Integrity layer exposes the same kind of “transmit queue and receive queue” interface as the Data Link layer, but the payload which can be passed to the Integrity layer must be somewhat smaller, so that the verification material and the Integrity payload together fit into 1280 octets. Overhead ranges from 4 octets for a CRC-32C checksum to 96 octets for an Ed25519 signature (64 octets) accompanied by the signer’s public key (32 octets).</p>
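<p>A minimal sketch of this framing, using only the Python standard library, might look as follows. Several substitutions are assumptions made for illustration: <code>zlib.crc32</code> is plain CRC-32 rather than CRC-32C, keyed BLAKE2b stands in for SipHash, and the public-key signature case is left out because the standard library has no Ed25519. The function names and tag lengths are hypothetical.</p>

```python
import hashlib
import hmac
import zlib

MAX_PAYLOAD = 1280  # Data Link payload limit from the text


def crc_tag(body: bytes) -> bytes:
    # plain CRC-32 (not CRC-32C), 4 octets of overhead
    return zlib.crc32(body).to_bytes(4, "big")


def hash_tag(body: bytes) -> bytes:
    # unkeyed BLAKE2b; the 32-octet digest size is chosen for illustration
    return hashlib.blake2b(body, digest_size=32).digest()


def keyed_tag(key: bytes):
    # keyed BLAKE2b as a stand-in for SipHash (an assumption, see lead-in)
    def tag(body: bytes) -> bytes:
        return hashlib.blake2b(body, key=key, digest_size=16).digest()
    return tag


def seal(body: bytes, tag_fn) -> bytes:
    """Prepend the verification material and enforce the 1280-octet budget."""
    frame = tag_fn(body) + body
    if len(frame) > MAX_PAYLOAD:
        raise ValueError("body too large once verification material is added")
    return frame


def open_frame(frame: bytes, tag_fn, tag_len: int) -> bytes:
    """Split off the tag, recompute it, and return the body only if they match."""
    tag, body = frame[:tag_len], frame[tag_len:]
    if not hmac.compare_digest(tag, tag_fn(body)):
        raise ValueError("integrity check failed")
    return body
```

<p>Because each paranoia level only changes <code>tag_fn</code> and the tag length, the layer above sees the same queue interface with a slightly smaller maximum body size.</p>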
<p>In the keyed hash case, some state is necessary at the Integrity protocol level: each API customer must be able to add “trusted contacts” to its “address book” by specifying a symmetric key corresponding to a given endpoint name (which may have been negotiated at a higher protocol level, or simply out-of-band entirely). Since some advanced higher-level protocols may define symmetric authentication keys that are only good for a single use (e.g. <a href="https://github.com/trevp/axolotl/wiki">Axolotl ratcheting</a> after the handshake phase), “address book entries” should be single-use by default, with renewal explicitly required after each payload received from a given contact.</p>
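<p>The single-use rule can be sketched as a small piece of state. The class and method names are hypothetical, keyed BLAKE2b again stands in for the keyed hash, and one design choice is my own assumption: an entry is consumed only by a payload that verifies, so that forged payloads cannot burn a contact’s key.</p>

```python
import hashlib
import hmac


class AddressBook:
    """Hypothetical per-customer store of symmetric keys for trusted contacts."""

    def __init__(self):
        self._keys = {}

    def add(self, endpoint: str, key: bytes) -> None:
        # adding again after a successful receive is the explicit renewal step
        self._keys[endpoint] = key

    def accept(self, endpoint: str, tag: bytes, body: bytes) -> bool:
        key = self._keys.get(endpoint)
        if key is None:
            return False  # unknown contact, or entry already used
        expected = hashlib.blake2b(body, key=key, digest_size=16).digest()
        if not hmac.compare_digest(tag, expected):
            return False  # a forged payload does not consume the entry
        del self._keys[endpoint]  # single use: valid for one received payload
        return True
```

<p>A higher-level protocol that ratchets its keys would call <code>add</code> with the next key after every accepted payload; one that uses a long-lived key would simply re-add the same key.</p>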
<p><a name="Availability"></a></p>
<h2 id="availability-layer">Availability layer</h2>
<p>We would like networked endpoints to be available to receive packets from other endpoints in a way that is robust to unannounced changes in network topology. This layer conceptually takes the place of the <a href="http://en.wikipedia.org/wiki/Network_layer">Network</a> layer in the original model, as it will be responsible for routing packets. <b>Significantly, in this proposal, there are no “hosts” or “ports”: only “endpoints”, identified by public keys.</b> This is simply taking the <a href="http://en.wikipedia.org/wiki/End-to-end_principle">end-to-end principle</a> one step further, by considering the “host” merely part of the network infrastructure which makes applications available.</p>
<p>A fully implemented Availability layer should provide <a href="http://en.wikipedia.org/wiki/Unicast">unicast</a> (deliver to a unique endpoint authenticated by a given public key, wherever it may be), <a href="http://en.wikipedia.org/wiki/Anycast">anycast</a> (deliver to the nearest endpoint authenticated by a given public key), and <a href="http://en.wikipedia.org/wiki/Multicast">multicast</a> (<i>a.k.a.</i> <a href="http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern">pub/sub</a>: route to all endpoints that have asked to subscribe to a given ID, and provide a subscription method).</p>
<p>
<table>
<thead>
  <tr><th width=80 rowspan="2">
Routing Semantics
</th><th width=170 rowspan="2">
Current Reliability
</th><th colspan="2" style="border-bottom: 1px solid rgba(150,150,150,0.4)">
New Implementation
</th></tr>
  <tr><th width=260>
Overlay on existing Internet
</th><th width=280>
Native Mesh
</th></tr>
</thead>
<tbody>
  <tr><td>
Multicast
</td><td>
awful
</td><td>
<a href="http://www.researchgate.net/publication/4319659_SKademlia_A_practicable_approach_towards_secure_key-based_routing/file/72e7e524ad3e97d67d.pdf">S/Kademlia</a> message broker
</td><td>
Straightforward extension of unicast
</td></tr>
  <tr><td>
Anycast
</td><td>
decent
</td><td>
No advantage over load balancers
</td><td>
Possible extension of unicast
</td></tr>
  <tr><td>
Unicast
</td><td>
excellent
</td><td>
Special case of multicast
</td><td>
<a href="http://arxiv.org/pdf/0909.2859v1.pdf">Electric Routing</a>
</td></tr>
</tbody>
</table>
</p>

<p>I believe the <a href="http://arxiv.org/pdf/0909.2859v1.pdf">Electric Routing</a> algorithm<a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a> is up to the challenge of replacing unicast<a href="#fn3" class="footnoteRef" id="fnref3"><sup>3</sup></a>, and that it could be extended to provide multicast and even anycast, but other algorithms could be developed at this protocol layer as well. The first real-world implementation of the system I’m describing will very likely be developed as an overlay network on top of <a href="http://en.wikipedia.org/wiki/Internet_Protocol">IP</a>, in which case multicast can be implemented simply atop <a href="http://www.researchgate.net/publication/4319659_SKademlia_A_practicable_approach_towards_secure_key-based_routing/file/72e7e524ad3e97d67d.pdf">S/Kademlia</a>, with unicast as a special case, and anycast can be emulated with standard load-balancing techniques.</p>
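<p>For the overlay case, the core of a Kademlia-style lookup is the XOR distance between identifiers. The sketch below shows only that metric, assuming overlay IDs are obtained by hashing public keys with SHA-256 (the hash choice and all function names are assumptions). It omits the S/Kademlia countermeasures and is not Electric Routing.</p>

```python
import hashlib


def node_id(public_key: bytes) -> int:
    # derive a 256-bit overlay ID from a public key (hash choice is an assumption)
    return int.from_bytes(hashlib.sha256(public_key).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    # Kademlia's metric: symmetric, and zero only when the IDs are equal
    return a ^ b


def closest(known_ids, target: int, k: int = 3):
    # the k known IDs nearest to the target under the XOR metric
    return sorted(known_ids, key=lambda n: xor_distance(n, target))[:k]
```

<p>In a pub/sub overlay of this kind, the nodes closest to the hash of a subscription ID could act as its message brokers; unicast is then the special case of an ID with a single subscriber.</p>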
<p>The tradeoff here is that routers have a lot more work to do, since there are no “addresses” corresponding directly to geographic location. But it also means that every node on the network can participate as a router, so there is a lot more capacity to do that work. In addition, the endpoints-only scheme has many potentially desirable properties with respect to features like pseudonymity, NAT transparency, redundancy, and decentralization of the telecommunications market (especially in densely settled areas).</p>
<p><a name="Confidentiality"></a></p>
<h2 id="confidentiality-layer">Confidentiality layer</h2>
<p>Ideally, we would like to not transmit any information to anything other than the destination endpoint(s). This ideal is not in general achievable on a public network, but some types of mitigation are possible:</p>
<p>
<table>
<thead>
  <tr><th width=60>
Paranoia Level
</th><th width=240>
Attacks
</th><th width=180>
Mitigation
</th><th width=150>
Common Implementation
</th><th width=180>
My Preferred Implementation
</th></tr>
</thead>
<tbody>
  <tr><td>
1
</td><td>
Sniffing payloads to trusted contacts
</td><td>
<a href="http://en.wikipedia.org/wiki/Symmetric-key_algorithm">symmetric</a> encryption
</td><td>
<a href="http://en.wikipedia.org/wiki/Advanced_Encryption_Standard">AES</a>
</td><td>
<a href="http://cr.yp.to/chacha.html">ChaCha</a>
</td></tr>
  <tr><td>
2
</td><td>
Sniffing payloads to strangers
</td><td>
<a href="http://en.wikipedia.org/wiki/Public-key_cryptography">public-key</a> encryption
</td><td>
<a href="http://en.wikipedia.org/wiki/RSA_(algorithm)">RSA</a>
</td><td>
<a href="http://en.wikipedia.org/wiki/RSA_(algorithm)">RSA</a>
</td></tr>
  <tr><td>
3
</td><td>
Chosen plaintext attacks
</td><td>
<a href="http://en.wikipedia.org/wiki/Key-agreement_protocol">key agreement</a> + symmetric encryption
</td><td>
<a href="http://en.wikipedia.org/wiki/Elliptic_Curve_Diffie%E2%80%93Hellman">ECDH</a> + <a href="http://en.wikipedia.org/wiki/Advanced_Encryption_Standard">AES</a>
</td><td>
<a href="http://cr.yp.to/ecdh.html">Curve25519</a> + <a href="http://cr.yp.to/chacha.html">ChaCha</a>
</td></tr>
  <tr><td>
4
</td><td>
Key compromise
</td><td>
ephemeral key agreement + symmetric encryption
</td><td>
<a href="http://vincent.bernat.im/en/blog/2011-ssl-perfect-forward-secrecy.html">ECDHE</a> + <a href="http://en.wikipedia.org/wiki/Advanced_Encryption_Standard">AES</a>
</td><td>
<a href="https://github.com/trevp/axolotl/wiki">Axolotl ratchet</a> with <a href="http://cr.yp.to/ecdh.html">Curve25519</a>, <a href="https://131002.net/siphash/siphash.pdf">SipHash</a>, <a href="http://en.wikipedia.org/wiki/PBKDF2">PBKDF2</a>, <a href="http://cr.yp.to/chacha.html">ChaCha</a>
</td></tr>
</tbody>
</table>
</p>

<p>In cases 3 and 4, this layer has to maintain some state, holding session keys or message keys, and the Axolotl ratchet is a little complicated; but this layer does not have to worry about the verification of identity (which will be provided on a higher layer, by services such as <a href="https://keybase.io">keybase.io</a> or using pronounceable hash fingerprints) or integrity (which will be provided by a lower layer).</p>
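<p>To give a feel for the state this layer holds, here is a sketch of a symmetric hash ratchet that derives a fresh message key for every payload. It is deliberately much simpler than the Axolotl ratchet, which also mixes in new Diffie-Hellman outputs. The class name, the HMAC-SHA256 construction, and the one-byte labels are assumptions chosen for illustration.</p>

```python
import hashlib
import hmac


def ratchet(chain_key: bytes):
    """One step of a hash ratchet: returns (next_chain_key, message_key)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


class Session:
    def __init__(self, shared_secret: bytes):
        # shared_secret would come from a key agreement such as Curve25519 (not shown)
        self.chain_key = hashlib.sha256(shared_secret).digest()

    def next_message_key(self) -> bytes:
        # the old chain key is overwritten, so earlier message keys
        # cannot be recomputed from the stored state
        self.chain_key, message_key = ratchet(self.chain_key)
        return message_key
```

<p>Two sessions seeded with the same secret produce the same sequence of message keys, and each key would be handed to the symmetric cipher for exactly one payload.</p>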
<p><a name="Non-Repudiation"></a></p>
<h2 id="non-repudiation-andor-repudiation-layer">Non-Repudiation and/or Repudiation layer</h2>
<p>We would like for a receiver to be sure that a message they receive was sent by a given sender, and we would like for a sender to be sure that a given message was successfully received. Sometimes, we would also like for a receiver to be unaware of the location a message was sent from. The result is three related but orthogonal protocol types, which may be nested:</p>
<p>
<table>
<thead>
  <tr><th width=180>
Repudiation Property
</th><th width=200>
Meaning
</th><th width=300>
Protocol
</th></tr>
</thead>
<tbody>
  <tr><td>
Non-Repudiation of Sending
</td><td>
Recipient knows immediate sender
</td><td>
Sender includes a hash of their public key in the message. To understand why this is necessary given the Integrity layer, read <a href="http://world.std.com/~dtd/sign_encrypt/sign_encrypt7.html">this excellent article</a>
</td></tr>
  <tr><td>
Non-Repudiation of Receipt
</td><td>
Sender knows message was received
</td><td>
Recipient must send a signed acknowledgement for every message. This also implements “reliable delivery”
</td></tr>
  <tr><td>
Repudiation of Origin
</td><td>
Message is difficult to trace
</td><td>
<a href="http://en.wikipedia.org/wiki/Onion_routing">Onion Routing</a>
</td></tr>
</tbody>
</table>
</p>


<p><a name="Transactions"></a></p>
<h2 id="transactions-layer">Transactions layer</h2>
<p>We would like for sets of nodes which wish to maintain common mutable state variables to be able to do so, even in the presence of various types of adversaries. This is a common abstraction for the requirements of <code>git</code>, cryptocurrencies, and distributed databases (i.e. <a href="http://en.wikipedia.org/wiki/ACID">ACID</a> <a href="http://en.wikipedia.org/wiki/Multiversion_concurrency_control">MVCC</a>). I propose that (borrowing most directly from <code>git</code>, but also from Clojure’s concurrent data structures) changes in large or complex mutable states be represented as changes to the root of a <a href="http://en.wikipedia.org/wiki/Merkle_tree">Merkle tree</a>, thus reducing the state subject to transactional semantics to single-packet size<a href="#fn4" class="footnoteRef" id="fnref4"><sup>4</sup></a>.</p>
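<p>As a rough sketch of that reduction, the function below computes the root of a binary Merkle tree over a list of byte strings. The hash choice (BLAKE2b truncated to 32 octets), the one-byte domain-separation prefixes, and the rule of duplicating the last node on odd levels are all assumptions for illustration, not part of the proposal.</p>

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()


def merkle_root(leaves) -> bytes:
    """Root hash of a binary Merkle tree over a non-empty list of byte strings."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    # domain-separate leaves (0x00) from interior nodes (0x01)
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

<p>Changing any leaf changes the 32-octet root, so a transaction only needs to agree on moving from one root to another. Note that the duplication rule makes <code>[a, b, c]</code> and <code>[a, b, c, c]</code> hash to the same root, so a real design would also commit to the leaf count.</p>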
<p>To make concrete what I mean by “mutable state”: the owner of a particular “domain name” or a particular “coin” (or, generally, any cryptographically controlled resource) is one example. But so are, for instance, the contents of any social media profile, email inbox, hypertext page, or source code repository. These things could all be managed without reference to central authorities or single points of failure.</p>
<p>
<table>
<thead>
  <tr><th width=60>
Paranoia Level
</th><th width=340>
Attacks
</th><th width=380>
Mitigation
</th></tr>
</thead>
<tbody>
  <tr><td>
1
</td><td>
Asynchrony; node failure/disconnection
</td><td>
<a href="http://www.cos.ufrj.br/~monnerat/papers/Monnerat_et_Amorim_D1HT_2006.pdf">D1HT</a> tracker
</td></tr>
  <tr><td>
2
</td><td>
Sybil attacks; eclipse attacks; churn attacks
</td><td>
<a href="http://www.researchgate.net/publication/4319659_SKademlia_A_practicable_approach_towards_secure_key-based_routing/file/72e7e524ad3e97d67d.pdf">S/Kademlia</a> tracker
</td></tr>
  <tr><td>
3
</td><td>
Malicious trackers
</td><td>
<a href="http://research.microsoft.com/en-us/um/people/lamport/pubs/disc-leaderless-web.pdf">Leaderless Byzantine Paxos</a> or <a href="http://link.springer.com/article/10.1007%2Fs10796-013-9460-7#page-1">Byzantine gossip</a>
</td></tr>
  <tr><td>
4
</td><td>
Any attack that Bitcoin can survive
</td><td>
<a href="https://en.bitcoin.it/wiki/Block_chain">Block-chain</a> protocol
</td></tr>
</tbody>
</table>
</p>

<p>Many (including myself) have claimed that the core contribution of Bitcoin, the block-chain protocol, is a novel solution to the <a href="http://research.microsoft.com/en-us/um/people/lamport/pubs/byz.pdf">Byzantine Generals Problem</a>, but it turns out this is somewhat misleading. Although the block-chain protocol is Byzantine-fault-tolerant in a novel way, there has been plenty of research on Byzantine protocols over the years, and it is probably unnecessary to constantly “mine,” i.e. solve cryptopuzzles, to achieve Byzantine fault tolerance. The main reason to introduce cryptopuzzles is to reduce the efficacy of <a href="http://en.wikipedia.org/wiki/Sybil_attack">Sybil attacks</a>, in which one malicious actor fabricates arbitrarily many identities in order to exceed the Byzantine fault tolerance threshold and control the system. However, these attacks can also be mitigated by requiring cryptopuzzles only for joining the network (as in <a href="http://www.researchgate.net/publication/4319659_SKademlia_A_practicable_approach_towards_secure_key-based_routing/file/72e7e524ad3e97d67d.pdf">S/Kademlia</a>), and by blacklisting nodes which behave suspiciously (the latter being how most attacks on Bitcoin are stopped in practice).</p>
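<p>A join-time puzzle of this general kind can be sketched as a search for a nonce whose hash, together with the joining node’s public key, starts with a required number of zero bits. This is a generic hash puzzle written for illustration, not the exact construction used by S/Kademlia or Bitcoin; the function names, SHA-256, and the 8-octet nonce encoding are assumptions.</p>

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def solve_puzzle(public_key: bytes, difficulty: int) -> int:
    # expected work doubles with every extra bit of difficulty
    for nonce in count():
        digest = hashlib.sha256(public_key + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def check_puzzle(public_key: bytes, nonce: int, difficulty: int) -> bool:
    # verification costs a single hash, however hard the puzzle was to solve
    digest = hashlib.sha256(public_key + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty
```

<p>Because the puzzle is bound to the public key, every fabricated identity costs the attacker a fresh search, while honest nodes pay the price once when they join.</p>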
<p><a name="Application"></a></p>
<h2 id="application-layer">Application layer</h2>
<p>In such an environment, applications (or application components!) are essentially just maps from one mutable state to another, in <a href="http://elm-lang.org/learn/What-is-FRP.elm">functional reactive programming</a> style. In the same way that you might encode packet filters into a kernel’s TCP/IP stack today, you might encode entire applications into a kernel’s “mesh” stack in the future. Various search functions, including full-text search, could be provided using the <a href="http://www.michaelpiatek.com/papers/oneswarm_SIGCOMM.pdf">OneSwarm</a> approach or potentially by distributed <a href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</a> implemented atop this platform (an idea due to <a href="https://twitter.com/AndreeMonette">Andrée Monette</a>). Resource control and access control can be provided by means of <a href="http://www.erights.org/elib/capability/ode/ode-protocol.html">cryptographic capabilities</a>.</p>
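<p>To illustrate why Bloom filters suit a distributed search index, here is a small sketch. Filters built by different nodes with the same parameters can be merged with a bitwise OR, so summaries of what each node holds are cheap to combine. The class, its default sizes, and the use of salted BLAKE2b for the hash functions are assumptions for illustration.</p>

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: bytes):
        # one salted hash per position
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(item, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: bytes) -> bool:
        # may report false positives, never false negatives
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # filters with equal parameters merge by bitwise OR
        merged = BloomFilter(self.size, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged
```

<p>A node answering a search could consult the merged filter first and forward the query only towards peers whose filters report a possible match.</p>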
<p>But, in general, this layer is completely open for all sorts of applications. Essentially, any end-user service that runs on a network (and what doesn’t, these days?) would fit here.</p>
<h1 id="conclusion">Conclusion</h1>
<p>I’ve outlined some radical ideas for how to re-build the Internet protocol stack in a way that is ultimately more coherent with Internet cultural values (freedom of expression, pseudonymity, reduced potential for abuses of power). This outline still needs quite a bit of work and thought before being turned into implementations, but I feel like I’ve reached a turning point in making my <a href="http://mesh.is/confusing,ok?">ideas</a> about next-generation architectures concrete, and at a timely moment with respect to conversations about TLS and net neutrality. If you would like to see these concepts made into working code, please <a
href="http://mailhide.recaptcha.net/d?k=01A3Grt9OhKg2-MSZSi6YDVA==&c=YXdAjPYO-xwh0WDnMu37kmOqfzUGcLhwkXoLkHdM6NA=">reach out</a> and let me know.</p>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>This number is cribbed from the <a href="http://tools.ietf.org/html/rfc2460">IPv6 RFC</a>.<a href="#fnref1">↩</a></p></li>
<li id="fn2"><p>coauthored by <a href="http://www.maymounkov.org/">Petar Maymounkov</a>, who also coauthored <a href="http://en.wikipedia.org/wiki/Kademlia">Kademlia</a>, the <a href="http://en.wikipedia.org/wiki/Distributed_hash_table">DHT</a> powering <a href="http://en.wikipedia.org/wiki/BitTorrent_(protocol)">BitTorrent</a><a href="#fnref2">↩</a></p></li>
<li id="fn3"><p>Electric routing does need some extensions to mitigate various attacks, but I believe the countermeasures from <a href="http://www.researchgate.net/publication/4319659_SKademlia_A_practicable_approach_towards_secure_key-based_routing/file/72e7e524ad3e97d67d.pdf">S/Kademlia</a> are readily adapted to meet these needs.<a href="#fnref3">↩</a></p></li>
<li id="fn4"><p>This is similar in principle to the trick used by most practical public-key cryptosystems, which use the actual public-key algorithm only to encrypt a key from some symmetric cryptosystem, and then encrypt arbitrarily large content using a stream cipher. The common principle is that you can do the hard security algorithm on a small piece of data, and use easier security algorithms to apply those hard security properties to large chunks of data.<a href="#fnref4">↩</a></p></li>
</ol>
</section>]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[All Boolean functions are polynomials]]></title>
    <link href="http://davidad.github.io/blog/2014/04/14/all-boolean-functions-are-polynomials/"/>
    <updated>2014-04-14T10:47:37-04:00</updated>
    <id>http://davidad.github.io/blog/2014/04/14/all-boolean-functions-are-polynomials</id>
    <content type="html"><![CDATA[<p>…in the integers mod 2 (a.k.a. the finite field of order 2). Multiplication mod 2 is <code>AND</code>:</p>
<table>
<thead>
<tr class="header">
<th style="text-align: center;">A</th>
<th style="text-align: center;">B</th>
<th style="text-align: center;">(AB)</th>
<th style="text-align: center;">A B <code>AND</code></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">0</td>
<td style="text-align: center;">0</td>
<td style="text-align: center;">0</td>
<td style="text-align: center;">0</td>
</tr>
<tr class="even">
<td style="text-align: center;">0</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">0</td>
<td style="text-align: center;">0</td>
</tr>
<tr class="odd">
<td style="text-align: center;">1</td>
<td style="text-align: center;">0</td>
<td style="text-align: center;">0</td>
<td style="text-align: center;">0</td>
</tr>
<tr class="even">
<td style="text-align: center;">1</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">1</td>
</tr>
</tbody>
</table>
<p><br> Adding one mod 2 is <code>NOT</code>:</p>
<table>
<thead>
<tr class="header">
<th style="text-align: center;">A</th>
<th style="text-align: center;">(A+1)</th>
<th style="text-align: center;">A <code>NOT</code></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">0</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">1</td>
</tr>
<tr class="even">
<td style="text-align: center;">1</td>
<td style="text-align: center;">0</td>
<td style="text-align: center;">0</td>
</tr>
</tbody>
</table>
<p><br> So, multiplication plus one is <code>NAND</code>:</p>
<table>
<thead>
<tr class="header">
<th style="text-align: center;">A</th>
<th style="text-align: center;">B</th>
<th style="text-align: center;">(AB+1)</th>
<th style="text-align: center;">A B <code>NAND</code></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">0</td>
<td style="text-align: center;">0</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">1</td>
</tr>
<tr class="even">
<td style="text-align: center;">0</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">1</td>
</tr>
<tr class="odd">
<td style="text-align: center;">1</td>
<td style="text-align: center;">0</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">1</td>
</tr>
<tr class="even">
<td style="text-align: center;">1</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">0</td>
<td style="text-align: center;">0</td>
</tr>
</tbody>
</table>
<p><br> Since <code>NAND</code> is universal, and any finite composition of polynomials is a polynomial, any finite boolean circuit is a polynomial. Here are all 16 two-input functions: <!-- more --></p>
<table>
<thead>
<tr class="header">
<th style="text-align: center;">Lookup table</th>
<th style="text-align: center;">Boolean function (<a href="http://en.wikipedia.org/wiki/Reverse_Polish_notation">RPN</a>)</th>
<th style="text-align: right;">Polynomial</th>
<th style="text-align: center;">Polynomial bitmap</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">0000</td>
<td style="text-align: center;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: center;">0000</td>
</tr>
<tr class="even">
<td style="text-align: center;">0001</td>
<td style="text-align: center;">A B <code>AND</code></td>
<td style="text-align: right;">AB</td>
<td style="text-align: center;">0001</td>
</tr>
<tr class="odd">
<td style="text-align: center;">0010</td>
<td style="text-align: center;">A (B <code>NOT</code>) <code>AND</code></td>
<td style="text-align: right;">AB+A</td>
<td style="text-align: center;">0101</td>
</tr>
<tr class="even">
<td style="text-align: center;">0011</td>
<td style="text-align: center;">A</td>
<td style="text-align: right;">A</td>
<td style="text-align: center;">0100</td>
</tr>
<tr class="odd">
<td style="text-align: center;">0100</td>
<td style="text-align: center;">(A <code>NOT</code>) B <code>AND</code></td>
<td style="text-align: right;">AB+B</td>
<td style="text-align: center;">0011</td>
</tr>
<tr class="even">
<td style="text-align: center;">0101</td>
<td style="text-align: center;">B</td>
<td style="text-align: right;">B</td>
<td style="text-align: center;">0010</td>
</tr>
<tr class="odd">
<td style="text-align: center;">0110</td>
<td style="text-align: center;">A B <code>XOR</code></td>
<td style="text-align: right;">A+B</td>
<td style="text-align: center;">0110</td>
</tr>
<tr class="even">
<td style="text-align: center;">0111</td>
<td style="text-align: center;">A B <code>OR</code></td>
<td style="text-align: right;">AB+A+B</td>
<td style="text-align: center;">0111</td>
</tr>
<tr class="odd">
<td style="text-align: center;">1000</td>
<td style="text-align: center;">A B <code>OR</code> <code>NOT</code></td>
<td style="text-align: right;">AB+A+B+1</td>
<td style="text-align: center;">1111</td>
</tr>
<tr class="even">
<td style="text-align: center;">1001</td>
<td style="text-align: center;">A B <code>XOR</code> <code>NOT</code></td>
<td style="text-align: right;">A+B+1</td>
<td style="text-align: center;">1110</td>
</tr>
<tr class="odd">
<td style="text-align: center;">1010</td>
<td style="text-align: center;">B <code>NOT</code></td>
<td style="text-align: right;">B+1</td>
<td style="text-align: center;">1010</td>
</tr>
<tr class="even">
<td style="text-align: center;">1011</td>
<td style="text-align: center;">A (B <code>NOT</code>) <code>OR</code></td>
<td style="text-align: right;">AB+B+1</td>
<td style="text-align: center;">1011</td>
</tr>
<tr class="odd">
<td style="text-align: center;">1100</td>
<td style="text-align: center;">A <code>NOT</code></td>
<td style="text-align: right;">A+1</td>
<td style="text-align: center;">1100</td>
</tr>
<tr class="even">
<td style="text-align: center;">1101</td>
<td style="text-align: center;">(A <code>NOT</code>) B <code>OR</code></td>
<td style="text-align: right;">AB+A+1</td>
<td style="text-align: center;">1101</td>
</tr>
<tr class="odd">
<td style="text-align: center;">1110</td>
<td style="text-align: center;">A B <code>AND</code> <code>NOT</code></td>
<td style="text-align: right;">AB+1</td>
<td style="text-align: center;">1001</td>
</tr>
<tr class="even">
<td style="text-align: center;">1111</td>
<td style="text-align: center;">1</td>
<td style="text-align: right;">1</td>
<td style="text-align: center;">1000</td>
</tr>
</tbody>
</table>
<p><br> It’s interesting that in many cases, including those corresponding to the “basic” functions of <code>AND</code>, <code>OR</code>, <code>XOR</code> and <code>NOT</code>, the polynomial bitmap is identical to the lookup table.</p>
<p>It’s also interesting that these polynomials are either multilinear (linear in each variable) or the sum of a multilinear polynomial with 1.</p>
<p>Naturally, I’m not the first person to notice this. It was first noticed <a href="http://en.wikipedia.org/wiki/Zhegalkin_polynomial">by I. I. Zhegalkin in 1927</a>. And I haven’t yet found any especially compelling uses of the representation. (If you actually want to represent boolean functions, you’re probably better served by <a href="http://crypto.stanford.edu/pbc/notes/zdd/">ZDDs</a>.) But I found it an interesting discovery which might just come in handy someday.</p>]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Getting started with nginx configuration]]></title>
    <link href="http://davidad.github.io/blog/2014/04/06/minimal-nginx-configuration/"/>
    <updated>2014-04-06T20:05:53-04:00</updated>
    <id>http://davidad.github.io/blog/2014/04/06/minimal-nginx-configuration</id>
    <content type="html"><![CDATA[<p><em>Thanks to fellow <a href="http://hackerschool.com">Hacker School</a>er <a href="http://twitter.com/leah_steinberg">Leah Steinberg</a> for inspiring this post!</em></p>
<hr />
<p>Having intermittently struggled with <code>apache2</code> configuration files for the majority of my adult life, I find <code>nginx</code> an absolute joy to set up. I’m completely sincere about that. But, for those who are just getting into Web development, <code>nginx</code> is just about as much of a struggle as Apache used to be—in fact, probably more so, because there’s less abundant learning material out there on the Internet.</p>
<p>So, here’s an attempt to make that situation just the slightest bit better.</p>
<p>If you don’t already have <code>nginx</code> installed, I encourage you to follow <a href="http://openresty.org/#Installation">these directions</a> for building <a href="http://openresty.org/">OpenResty</a>, an enhanced version of <code>nginx</code> that enables building entire Web apps within the <code>nginx</code> process using the beautiful programming language <a href="http://en.wikipedia.org/wiki/Lua_(programming_language)#Features">Lua</a>.</p>
<p>But, from here on, I’m going to assume that you already have a stock version of <code>nginx</code> installed. Verify that if you run</p>
<pre><code>$ nginx -v</code></pre>
<p>you get some kind of reasonable response, like</p>
<pre><code>nginx version: nginx/1.2.3</code></pre>
<p>Success!</p>
<p>Now, make a file called <code>hi.conf</code>:</p>
<figure class='code'><figcaption>
hi.conf
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div></pre></td><td class='main  nginx'><pre><div class='line'><code><span class="k">error_log</span> <span class="s">stderr</span><span class="p">;</span>
</code></div><div class='line'><code><span class="k">pid</span> <span class="s">nginx.pid</span><span class="p">;</span>
</code></div><div class='line'><code><span class="k">http</span> <span class="p">{</span>
</code></div><div class='line'><code>    <span class="kn">access_log</span> <span class="no">off</span><span class="p">;</span>
</code></div><div class='line'><code>    <span class="kn">server</span> <span class="p">{</span>
</code></div><div class='line'><code>        <span class="kn">listen</span> <span class="mi">4945</span><span class="p">;</span>
</code></div><div class='line'><code>        <span class="kn">location</span> <span class="s">/</span> <span class="p">{</span>
</code></div><div class='line'><code>            <span class="kn">return</span> <span class="mi">200</span><span class="p">;</span>
</code></div><div class='line'><code>        <span class="p">}</span>
</code></div><div class='line'><code>    <span class="p">}</span>
</code></div><div class='line'><code><span class="p">}</span>
</code></div><div class='line'><code><span class="k">events</span> <span class="p">{}</span>
</code></div></pre></td></tr></table></div></figure>

<!-- more -->

<p>I’ve chosen the number 4945 so as to hopefully not conflict with any services that may already be running on your machine for one reason or another. Now, let’s launch <code>nginx</code> using this configuration file and test it:</p>
<pre><code>$ nginx -p `pwd`/ -c hi.conf
nginx: [alert] could not open error log file: open() &quot;/var/log/nginx/error.log&quot; failed (13: Permission denied)
$ telnet localhost 4945
Trying 127.0.0.1...
Connected to localhost.
Escape character is &#39;^]&#39;.
GET / HTTP/1.0

HTTP/1.1 200 OK
Server: nginx/1.2.3
Date: Mon, 07 Apr 2014 01:50:28 GMT
Content-Type: text/plain
Content-Length: 0
Connection: close

Connection closed by foreign host.
$ kill -QUIT `cat nginx.pid`</code></pre>
<p>You’ll have to actually type the line <code>GET / HTTP/1.0</code> yourself, followed by an empty line to mark the end of the request. HTTP is a protocol intended for humans to be able to read and write, and you may as well take advantage of it! Of course, you could also navigate to <code>http://localhost:4945/</code> in a browser, but then all you see is a blank page, which is not quite as satisfying (to me, at least) as a <code>200 OK</code> on the terminal<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a>.</p>
<hr />
<p>What’s that? You want to actually serve data, and not just a blank page?</p>
<figure class='code'><figcaption>
hi2.conf
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div></pre></td><td class='main  nginx'><pre><div class='line'><code><span class="k">error_log</span> <span class="s">stderr</span><span class="p">;</span>
</code></div><div class='line'><code><span class="k">pid</span> <span class="s">nginx.pid</span><span class="p">;</span>
</code></div><div class='line'><code><span class="k">http</span> <span class="p">{</span>
</code></div><div class='line'><code>    <span class="kn">access_log</span> <span class="no">off</span><span class="p">;</span>
</code></div><div class='line'><code>    <span class="kn">root</span> <span class="s">.</span><span class="p">;</span>
</code></div><div class='line'><code>    <span class="kn">server</span> <span class="p">{</span>
</code></div><div class='line'><code>        <span class="kn">listen</span> <span class="mi">4945</span><span class="p">;</span>
</code></div><div class='line'><code>        <span class="kn">location</span> <span class="s">/</span> <span class="p">{</span>
</code></div><div class='line'><code>            <span class="kn">try_files</span> <span class="s">/index.html</span> <span class="p">=</span><span class="mi">404</span><span class="p">;</span>
</code></div><div class='line'><code>        <span class="p">}</span>
</code></div><div class='line'><code>    <span class="p">}</span>
</code></div><div class='line'><code><span class="p">}</span>
</code></div><div class='line'><code><span class="k">events</span> <span class="p">{}</span>
</code></div></pre></td></tr></table></div></figure>


<p>Then just drop an <code>index.html</code> into the same folder as <code>hi2.conf</code> and run</p>
<pre><code>$ nginx -p `pwd`/ -c hi2.conf</code></pre>
<p>Now you should be able to load <code>http://localhost:4945/</code> and see what you wrote in <code>index.html</code>. Exciting!</p>
<h2 id="next-steps">Next Steps</h2>
<p>If you installed OpenResty, continue with their <a href="http://openresty.org/#GettingStarted">Getting Started</a>. Otherwise, I’ll leave you to other tutorials, or to <a href="http://nginx.org/en/docs/dirindex.html">the actual <code>nginx</code> documentation</a> – this was really just an exercise in getting something to work. But, I will offer this advice: I recommend against using any of your OS’s magic, like special files and folders where things are supposed to be put, or special incantations for invoking <code>nginx</code>. Just run <code>nginx</code> on the command line. It’s a smart enough program to <strong>stay</strong> running once you’ve started it, without the help of external infrastructure, and I think you’ll be much less frustrated working with it directly, having all the relevant files in one project directory, than struggling to configure both <code>nginx</code> itself and your OS’s favorite mechanism for managing server processes. Once you’ve figured out how to disable the OS’s auto-server-starting mechanisms, you can modify the <code>listen</code> line to <code>listen 80</code> so you can stop typing that pesky <code>:4945</code> in the browser.</p>
<h3 id="reloading">Reloading</h3>
<p>Oh, and one last trick: if you want to ask <code>nginx</code> to reload its configuration file without actually bringing down the server, just</p>
<pre><code>$ kill -HUP `cat nginx.pid`</code></pre>
<p>Happy hacking!</p>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>200 is the HTTP status code meaning “OK”, the status that accompanies most successful HTTP replies on the Web. As you might guess, that’s the same 200 referred to by the line <code>return 200</code> in <code>hi.conf</code>.<a href="#fnref1">↩</a></p></li>
</ol>
</section>]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[VNC as a graphical interface medium]]></title>
    <link href="http://davidad.github.io/blog/2014/03/30/vnc-as-an-interface/"/>
    <updated>2014-03-30T19:21:34-04:00</updated>
    <id>http://davidad.github.io/blog/2014/03/30/vnc-as-an-interface</id>
    <content type="html"><![CDATA[<p>The <a href="http://en.wikipedia.org/wiki/Virtual_Network_Computing">Virtual Network Computing (VNC)</a> system for accessing the GUI environments of remote computers uses a protocol called <a href="http://en.wikipedia.org/wiki/RFB_protocol">Remote Frame Buffer (RFB)</a> to exchange data about graphics output as well as keyboard and mouse input. RFB turns out to be a very sane protocol (specification PDF <a href="http://www.realvnc.com/docs/rfbproto.pdf">here</a>) compared with X11, and infinitely more sane than Cocoa (which requires the ObjC runtime) or Win32 (no explanation needed). So, I thought, why not just expose a program’s graphical interface as a VNC server? Then we can let a VNC client deal with the vagaries of the host windowing environment, and we only need to speak a well-specified protocol on a socket.</p>
<p>So far, this is what I have to show (<a href="https://github.com/davidad/vnchacks">code on github</a>):<br /><img src="http://davidad.github.io/assets/color_rotate.gif" /></p>
<!-- more -->

<p>This also turned out to be a good exercise in both raw socket programming and the use of <a href="http://www.zlib.net"><code>zlib</code></a> (the <a href="http://en.wikipedia.org/wiki/DEFLATE">DEFLATE</a> compression library), both of which I’ve skirted around before but never actually done directly in C<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a>. Check out my <code>open_port</code> function:</p>
<figure class='code'><figcaption>
color_rotate_zrle.c<a href='https://github.com/davidad/vnchacks/blob/TJ-3/color_rotate_zrle.c#L14-23'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='14' class='line-number'></div><div data-line='15' class='line-number'></div><div data-line='16' class='line-number'></div><div data-line='17' class='line-number'></div><div data-line='18' class='line-number'></div><div data-line='19' class='line-number'></div><div data-line='20' class='line-number'></div><div data-line='21' class='line-number'></div><div data-line='22' class='line-number'></div><div data-line='23' class='line-number'></div></pre></td><td class='main  c'><pre><div class='line'><code><span class="kt">int</span> <span class="nf">open_port</span><span class="p">(</span><span class="kt">uint16_t</span> <span class="n">port</span><span class="p">)</span> <span class="p">{</span>
</code></div><div class='line'><code>  <span class="kt">int</span> <span class="n">connfd</span><span class="p">,</span> <span class="n">sockfd</span><span class="p">,</span> <span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">=</span><span class="p">{</span><span class="mi">1</span><span class="p">};</span>
</code></div><div class='line'><code>  <span class="k">struct</span> <span class="n">sockaddr_in</span> <span class="n">addr</span> <span class="o">=</span> <span class="p">{.</span><span class="n">sin_family</span><span class="o">=</span><span class="n">AF_INET</span><span class="p">,.</span><span class="n">sin_port</span><span class="o">=</span><span class="n">htons</span><span class="p">(</span><span class="n">port</span><span class="p">),.</span><span class="n">sin_addr</span><span class="o">=</span><span class="p">{.</span><span class="n">s_addr</span><span class="o">=</span><span class="n">htonl</span><span class="p">(</span><span class="n">INADDR_ANY</span><span class="p">)}};</span>
</code></div><div class='line'><code>  <span class="k">if</span><span class="p">(</span> <span class="p">(</span> <span class="n">sockfd</span> <span class="o">=</span> <span class="n">socket</span><span class="p">(</span><span class="n">PF_INET</span><span class="p">,</span> <span class="n">SOCK_STREAM</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>                         <span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span>  <span class="n">perror</span><span class="p">(</span>  <span class="s">&quot;socket&quot;</span>  <span class="p">);</span>
</code></div><div class='line'><code>  <span class="k">if</span><span class="p">(</span> <span class="p">(</span>      <span class="n">setsockopt</span><span class="p">(</span><span class="n">sockfd</span><span class="p">,</span> <span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">int</span><span class="p">)))</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span>  <span class="n">perror</span><span class="p">(</span><span class="s">&quot;setsockopt&quot;</span><span class="p">);</span>
</code></div><div class='line'><code>  <span class="k">if</span><span class="p">(</span> <span class="p">(</span>            <span class="n">bind</span><span class="p">(</span><span class="n">sockfd</span><span class="p">,</span> <span class="p">(</span><span class="k">struct</span> <span class="n">sockaddr</span><span class="o">&ast;</span><span class="p">)</span><span class="o">&amp;</span><span class="n">addr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">addr</span><span class="p">))</span>   <span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span>  <span class="n">perror</span><span class="p">(</span>   <span class="s">&quot;bind&quot;</span>   <span class="p">);</span>
</code></div><div class='line'><code>  <span class="k">if</span><span class="p">(</span> <span class="p">(</span>          <span class="n">listen</span><span class="p">(</span><span class="n">sockfd</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>                                       <span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span>  <span class="n">perror</span><span class="p">(</span>  <span class="s">&quot;listen&quot;</span>  <span class="p">);</span>
</code></div><div class='line'><code>  <span class="k">if</span><span class="p">(</span> <span class="p">(</span> <span class="n">connfd</span> <span class="o">=</span> <span class="n">accept</span><span class="p">(</span><span class="n">sockfd</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>                                 <span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span>  <span class="n">perror</span><span class="p">(</span>  <span class="s">&quot;accept&quot;</span>  <span class="p">);</span>
</code></div><div class='line'><code>  <span class="k">return</span> <span class="n">connfd</span><span class="p">;</span>
</code></div><div class='line'><code><span class="p">}</span>
</code></div></pre></td></tr></table></div></figure>


<p>Once the socket connection is established, there’s some handshaking to do (as you can see, this is pretty stubby — it doesn’t wait for any messages from the client):</p>
<figure class='code'><figcaption>
color_rotate_zrle.c<a href='https://github.com/davidad/vnchacks/blob/TJ-3/color_rotate_zrle.c#L53-59'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='53' class='line-number'></div><div data-line='54' class='line-number'></div><div data-line='55' class='line-number'></div><div data-line='56' class='line-number'></div><div data-line='57' class='line-number'></div><div data-line='58' class='line-number'></div><div data-line='59' class='line-number'></div></pre></td><td class='main  c'><pre><div class='line'><code>  <span class="kt">int</span>   <span class="n">connfd</span> <span class="o">=</span> <span class="n">open_port</span><span class="p">(</span><span class="n">PORT</span><span class="p">);</span>
</code></div><div class='line'><code>  <span class="n">write</span><span class="p">(</span><span class="n">connfd</span><span class="p">,</span> <span class="n">protover</span><span class="p">,</span>          <span class="k">sizeof</span><span class="p">(</span><span class="n">protover</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
</code></div><div class='line'><code>  <span class="n">write</span><span class="p">(</span><span class="n">connfd</span><span class="p">,</span> <span class="n">securitytype</span><span class="p">,</span>      <span class="k">sizeof</span><span class="p">(</span><span class="n">securitytype</span><span class="p">));</span>
</code></div><div class='line'><code>  <span class="n">write</span><span class="p">(</span><span class="n">connfd</span><span class="p">,</span> <span class="n">securitychallenge</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">securitychallenge</span><span class="p">));</span>
</code></div><div class='line'><code>  <span class="n">write</span><span class="p">(</span><span class="n">connfd</span><span class="p">,</span> <span class="n">securityresult</span><span class="p">,</span>    <span class="k">sizeof</span><span class="p">(</span><span class="n">securityresult</span><span class="p">));</span>
</code></div><div class='line'><code>  <span class="n">write</span><span class="p">(</span><span class="n">connfd</span><span class="p">,</span> <span class="n">serverInit</span><span class="p">,</span>        <span class="k">sizeof</span><span class="p">(</span><span class="n">serverInit</span><span class="p">));</span>
</code></div><div class='line'><code>  <span class="n">write</span><span class="p">(</span><span class="n">connfd</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span>              <span class="k">sizeof</span><span class="p">(</span><span class="n">name</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
</code></div></pre></td></tr></table></div></figure>


<p>Then, we can get down to business:</p>
<figure class='code'><figcaption>
color_rotate_zrle.c<a href='https://github.com/davidad/vnchacks/blob/TJ-3/color_rotate_zrle.c#L61-85'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='61' class='line-number'></div><div data-line='62' class='line-number'></div><div data-line='63' class='line-number'></div><div data-line='64' class='line-number'></div><div data-line='65' class='line-number'></div><div data-line='66' class='line-number'></div><div data-line='67' class='line-number'></div><div data-line='68' class='line-number'></div><div data-line='69' class='line-number'></div><div data-line='70' class='line-number'></div><div data-line='71' class='line-number'></div><div data-line='72' class='line-number'></div><div data-line='73' class='line-number'></div><div data-line='74' class='line-number'></div><div data-line='75' class='line-number'></div><div data-line='76' class='line-number'></div><div data-line='77' class='line-number'></div><div data-line='78' class='line-number'></div><div data-line='79' class='line-number'></div><div data-line='80' class='line-number'></div><div data-line='81' class='line-number'></div><div data-line='82' class='line-number'></div><div data-line='83' class='line-number'></div><div data-line='84' class='line-number'></div><div data-line='85' class='line-number'></div></pre></td><td class='main  c'><pre><div class='line'><code>  <span class="n">z_streamp</span> <span class="n">z</span> <span class="o">=</span> <span class="n">calloc</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">z_stream</span><span class="p">));</span> <span class="c1">// zeroed, so zalloc/zfree/opaque are Z_NULL for deflateInit</span>
</code></div><div class='line'><code>  <span class="n">deflateInit</span><span class="p">(</span><span class="n">z</span><span class="p">,</span><span class="mi">6</span><span class="p">);</span>
</code></div><div class='line'><code>  <span class="kt">uint8_t</span><span class="o">&ast;</span> <span class="n">buf</span><span class="o">=</span><span class="n">malloc</span><span class="p">(</span><span class="n">FBUFZ</span><span class="p">);</span>
</code></div><div class='line'><code>  <span class="kt">uint8_t</span> <span class="n">tile</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="mh">0x01</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">255</span><span class="p">};</span> <span class="c1">//solid blue</span>
</code></div><div class='line'><code>  <span class="k">const</span> <span class="kt">int</span> <span class="n">frame_size</span><span class="o">=</span><span class="k">sizeof</span><span class="p">(</span><span class="n">tile</span><span class="p">)</span><span class="o">&ast;</span><span class="p">(</span><span class="n">width</span><span class="o">/</span><span class="mi">64</span><span class="p">)</span><span class="o">&ast;</span><span class="p">(</span><span class="n">height</span><span class="o">/</span><span class="mi">64</span><span class="p">);</span>
</code></div><div class='line'><code>  <span class="kt">uint8_t</span><span class="o">&ast;</span> <span class="n">frame</span><span class="o">=</span><span class="n">malloc</span><span class="p">(</span><span class="n">frame_size</span><span class="p">);</span>
</code></div><div class='line'><code>  <span class="kt">int</span> <span class="n">t</span><span class="p">;</span>
</code></div><div class='line'><code>  <span class="kt">double</span> <span class="n">h</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">l</span><span class="o">=</span><span class="mf">0.5</span><span class="p">;</span>
</code></div><div class='line'><code>  <span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
</code></div><div class='line'><code>    <span class="n">hcl2pix</span><span class="p">(</span><span class="o">&amp;</span><span class="n">tile</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="n">h</span><span class="p">,</span><span class="n">c</span><span class="p">,</span><span class="n">l</span><span class="p">);</span>
</code></div><div class='line'><code>    <span class="n">h</span><span class="o">+=</span><span class="mf">0.01</span><span class="p">;</span>
</code></div><div class='line'><code>    <span class="k">for</span><span class="p">(</span><span class="n">t</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="n">t</span><span class="o">&lt;</span><span class="p">(</span><span class="n">width</span><span class="o">/</span><span class="mi">64</span><span class="p">)</span><span class="o">&ast;</span><span class="p">(</span><span class="n">height</span><span class="o">/</span><span class="mi">64</span><span class="p">);</span><span class="n">t</span><span class="o">++</span><span class="p">)</span>
</code></div><div class='line'><code>      <span class="n">memcpy</span><span class="p">(</span><span class="o">&amp;</span><span class="n">frame</span><span class="p">[</span><span class="n">t</span><span class="o">&ast;</span><span class="k">sizeof</span><span class="p">(</span><span class="n">tile</span><span class="p">)],</span><span class="n">tile</span><span class="p">,</span><span class="k">sizeof</span><span class="p">(</span><span class="n">tile</span><span class="p">));</span>
</code></div><div class='line'><code>    <span class="n">z</span><span class="o">-&gt;</span><span class="n">next_in</span><span class="o">=</span><span class="n">frame</span><span class="p">;</span>
</code></div><div class='line'><code>    <span class="n">z</span><span class="o">-&gt;</span><span class="n">avail_in</span><span class="o">=</span><span class="n">frame_size</span><span class="p">;</span>
</code></div><div class='line'><code>    <span class="n">z</span><span class="o">-&gt;</span><span class="n">next_out</span><span class="o">=</span><span class="n">buf</span><span class="p">;</span>
</code></div><div class='line'><code>    <span class="n">z</span><span class="o">-&gt;</span><span class="n">avail_out</span><span class="o">=</span><span class="n">FBUFZ</span><span class="p">;</span>
</code></div><div class='line'><code>    <span class="n">z</span><span class="o">-&gt;</span><span class="n">total_out</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span>
</code></div><div class='line'><code>    <span class="n">deflate</span><span class="p">(</span><span class="n">z</span><span class="p">,</span><span class="n">Z_SYNC_FLUSH</span><span class="p">);</span>
</code></div><div class='line'><code>    <span class="kt">int</span> <span class="n">length</span> <span class="o">=</span> <span class="n">htonl</span><span class="p">(</span><span class="n">z</span><span class="o">-&gt;</span><span class="n">total_out</span><span class="p">);</span>
</code></div><div class='line'><code>    <span class="n">write</span><span class="p">(</span><span class="n">connfd</span><span class="p">,</span><span class="n">fbuf_refresh</span><span class="p">,</span><span class="k">sizeof</span><span class="p">(</span><span class="n">fbuf_refresh</span><span class="p">));</span>
</code></div><div class='line'><code>    <span class="n">write</span><span class="p">(</span><span class="n">connfd</span><span class="p">,</span><span class="o">&amp;</span><span class="n">length</span><span class="p">,</span><span class="mi">4</span><span class="p">);</span>
</code></div><div class='line'><code>    <span class="n">write</span><span class="p">(</span><span class="n">connfd</span><span class="p">,</span><span class="n">buf</span><span class="p">,</span><span class="n">z</span><span class="o">-&gt;</span><span class="n">total_out</span><span class="p">);</span>
</code></div><div class='line'><code>    <span class="n">usleep</span><span class="p">(</span><span class="mf">1e6</span><span class="o">/</span><span class="mi">30</span><span class="p">);</span>
</code></div><div class='line'><code>  <span class="p">}</span>
</code></div></pre></td></tr></table></div></figure>


<p>I’ve chosen to implement the encoding scheme ZRLE here, but most VNC clients will also support streaming raw pixel data, which would remove the dependency on <code>zlib</code> and simplify the logic somewhat<a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a>. In the ZRLE encoding, the display area is split into 64x64-pixel “tiles”, each of which can be described in a variety of palettized and non-palettized encodings. The simplest, and the one we’re using here, is the one-color palette, introduced by <code>0x01</code>, and containing simply the one color (no further data is needed, since it’s implied that every pixel in the tile is that color). So, in our main display loop, we first update the tile (the <code>hcl2pix</code> function is one of my own devising, which you can find in <a href="https://github.com/davidad/vnchacks/blob/master/colorspaces.c"><code>colorspaces.c</code></a>), then copy the 4-byte description of that 64x64 tile as many times as necessary to make a complete frame, then <code>deflate</code> it, and finally write it out to the socket and wait until it’s time for the next frame. That’s the essence of the program right there.</p>
<p>You may also be interested in the details of the RFB message formats:</p>
<figure class='code'><figcaption>
color_rotate_zrle.c<a href='https://github.com/davidad/vnchacks/blob/TJ-3/color_rotate_zrle.c#L25-51'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='25' class='line-number'></div><div data-line='26' class='line-number'></div><div data-line='27' class='line-number'></div><div data-line='28' class='line-number'></div><div data-line='29' class='line-number'></div><div data-line='30' class='line-number'></div><div data-line='31' class='line-number'></div><div data-line='32' class='line-number'></div><div data-line='33' class='line-number'></div><div data-line='34' class='line-number'></div><div data-line='35' class='line-number'></div><div data-line='36' class='line-number'></div><div data-line='37' class='line-number'></div><div data-line='38' class='line-number'></div><div data-line='39' class='line-number'></div><div data-line='40' class='line-number'></div><div data-line='41' class='line-number'></div><div data-line='42' class='line-number'></div><div data-line='43' class='line-number'></div><div data-line='44' class='line-number'></div><div data-line='45' class='line-number'></div><div data-line='46' class='line-number'></div><div data-line='47' class='line-number'></div><div data-line='48' class='line-number'></div><div data-line='49' class='line-number'></div><div data-line='50' class='line-number'></div><div data-line='51' class='line-number'></div></pre></td><td class='main  c'><pre><div class='line'><code><span class="k">const</span> <span class="kt">char</span> <span class="n">protover</span><span class="p">[]</span> <span class="o">=</span> <span class="s">&quot;RFB 003.003</span><span class="se">&#92;n</span><span class="s">&quot;</span><span class="p">;</span>
</code></div><div class='line'><code><span class="k">const</span> <span class="kt">char</span> <span class="n">securitytype</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x02</span><span class="p">};</span>
</code></div><div class='line'><code><span class="k">const</span> <span class="kt">char</span> <span class="n">securitychallenge</span><span class="p">[</span><span class="mi">16</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mh">0xaa</span><span class="p">};</span>
</code></div><div class='line'><code><span class="k">const</span> <span class="kt">char</span> <span class="n">securityresult</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
</code></div><div class='line'><code><span class="k">const</span> <span class="kt">char</span> <span class="n">name</span><span class="p">[]</span> <span class="o">=</span> <span class="s">&quot;hello!&quot;</span><span class="p">;</span>
</code></div><div class='line'><code><span class="k">const</span> <span class="kt">uint16_t</span> <span class="n">width</span><span class="o">=</span><span class="mi">1024</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">1024</span><span class="p">;</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</code></div><div class='line'><code>  <span class="k">const</span> <span class="kt">char</span> <span class="n">serverInit</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;frame size&ast;/</span>   <span class="n">width</span><span class="o">&gt;&gt;</span><span class="mi">8</span><span class="p">,</span> <span class="n">width</span><span class="o">&amp;</span><span class="mh">0xff</span><span class="p">,</span> <span class="n">height</span><span class="o">&gt;&gt;</span><span class="mi">8</span><span class="p">,</span> <span class="n">height</span><span class="o">&amp;</span><span class="mh">0xff</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;bpp&ast;/</span> <span class="mi">32</span><span class="p">,</span> <span class="cm">/&ast;depth&ast;/</span> <span class="mi">24</span><span class="p">,</span> <span class="cm">/&ast;big-endian&ast;/</span> <span class="mi">0</span><span class="p">,</span> <span class="cm">/&ast;true-colour&ast;/</span> <span class="mi">1</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;red mask&ast;/</span>     <span class="mi">0</span><span class="p">,</span> <span class="mh">0xff</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;green mask&ast;/</span>   <span class="mi">0</span><span class="p">,</span> <span class="mh">0xff</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;blue mask&ast;/</span>    <span class="mi">0</span><span class="p">,</span> <span class="mh">0xff</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;red shift&ast;/</span>    <span class="mi">0</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;green shift&ast;/</span>  <span class="mi">8</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;blue shift&ast;/</span>  <span class="mi">16</span><span class="p">,</span> <span class="cm">/&ast;padding&ast;/</span> <span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;name length&ast;/</span>  <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">name</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span> <span class="p">};</span>
</code></div><div class='line'><code>  <span class="k">const</span> <span class="kt">char</span> <span class="n">fbuf_refresh</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;message-type&ast;/</span> <span class="mi">0</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;padding&ast;/</span>      <span class="mi">0</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;nrects&ast;/</span>       <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;xpos&ast;/</span>         <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;ypos&ast;/</span>         <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;width&ast;/</span>        <span class="n">width</span><span class="o">&gt;&gt;</span><span class="mi">8</span><span class="p">,</span> <span class="n">width</span><span class="o">&amp;</span><span class="mh">0xff</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;height&ast;/</span>       <span class="n">height</span><span class="o">&gt;&gt;</span><span class="mi">8</span><span class="p">,</span> <span class="n">height</span><span class="o">&amp;</span><span class="mh">0xff</span><span class="p">,</span>
</code></div><div class='line'><code>    <span class="cm">/&ast;encoding-type&ast;/</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">16</span> <span class="p">};</span>
</code></div></pre></td></tr></table></div></figure>


<p>Future work includes:</p>
<ul>
<li>Splitting out the frame encoding process to a <code>send_rect</code> function</li>
<li>Actually parsing messages from the VNC client</li>
<li>Providing user input handlers</li>
<li>Comparing to an <a href="http://www.libsdl.org">SDL</a> backend: the same <code>send_rect</code> and <code>register_handler</code> abstractions might be nearly as easy to implement there</li>
<li>Implementing a box model to route user input to interface elements</li>
<li>Implementing font rendering with <a href="http://www.freetype.org/">FreeType</a></li>
<li>Implementing <a href="http://www.ctex.org/documents/shredder/src/texbook.pdf">TeX</a>+<a href="http://ftp.math.purdue.edu/mirrors/ctan.org/graphics/pgf/base/doc/generic/pgf/pgfmanual.pdf">TikZ</a> style graphics (big job)</li>
<li>Creating useful interface elements for this platform</li>
</ul>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>Yes, I did this in C. Almost every operation in the program is a function call, following the C calling convention, so it really wouldn’t be fun to do in assembly.<a href="#fnref1">↩</a></p></li>
<li id="fn2"><p>Why did I choose ZRLE, then? Well, partly because I thought it was cool, and partly because I wanted to get some practice using <code>zlib</code>. But mostly because Apple’s “Screen Sharing” VNC client advertises ZRLE as one of the few standard RFB encodings it accepts. Yet this code, as it stands, still doesn’t work with Screen Sharing. I wound up testing it with <a href="http://sourceforge.net/projects/chicken/">Chicken</a> instead.<a href="#fnref2">↩</a></p></li>
</ol>
</section>]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Concurrency Primitives in Intel 64 Assembly]]></title>
    <link href="http://davidad.github.io/blog/2014/03/23/concurrency-primitives-in-intel-64-assembly/"/>
    <updated>2014-03-23T20:36:47-04:00</updated>
    <id>http://davidad.github.io/blog/2014/03/23/concurrency-primitives-in-intel-64-assembly</id>
    <content type="html"><![CDATA[<p>Now that nearly every computer has some form of multi-processing (that is, multiple CPUs sharing a single address space), some high-level languages are starting to get attention for their concurrency features. Many languages refer to such features as “concurrency primitives.” But since these are high-level languages, we know that these “primitives” must ultimately be implemented with hardware operations. Older high-level languages, like C, don’t have baked-in support for such operations – not because such languages are lower-level, but simply because the operations in question <em>weren’t a thing</em> when C was invented. Assembly language, being up to date with the latest CPU capabilities by definition<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a>, should provide the best window into the true nature of today’s concurrency operations.</p>
<p>In this post I’m going to walk you through a (relatively) simple concurrent assembly program which runs on OS X or Linux. Here’s the demo (<a href="https://github.com/davidad/asm_concurrency">github</a>):</p>
<pre><code>bash-3.2$ time ./concurrency-noprint-x1 foo    # single-worker version

real  0m1.458s
user  0m1.445s
sys   0m0.010s
bash-3.2$ # now run two at once
bash-3.2$ time ./concurrency-noprint-x1 foo-2 &amp; ./concurrency-noprint-x1 foo-2
[1] 71366

real  0m0.785s
user  0m0.780s
sys   0m0.001s
[1]+  Done                    time ./concurrency-noprint-x1 foo-2
bash-3.2$ time ./concurrency-noprint-x4 foo-3  # four-worker version

real  0m0.417s
user  0m0.413s
sys   0m0.003s
bash-3.2$ time ./concurrency-noprint-x7 foo-4  # seven-worker version

real  0m0.295s
user  0m0.283s
sys   0m0.001s
bash-3.2$ diff -s --from-file=foo foo-*
Files foo and foo-2 are identical
Files foo and foo-3 are identical
Files foo and foo-4 are identical</code></pre>
<!-- more -->

<p>What the program actually does is a pretty useless but computationally nontrivial and easily parallelizable task: raising the offset of each byte (counted from the start of the buffer) to the 65537th power modulo 235, and storing the result back into that byte. Since the result depends only on the offset mod 235, the output should repeat itself every 235 bytes:</p>
<pre><code>bash-3.2$ hexdump -e &#39;235/1 &quot;%4u&quot; &quot;\n&quot;&#39; -s8 foo
   1  37 158 194  35 206  32 128  54 120 146 102 233   9 125  36 122 118 184 210 121 232  33  14  50 161  72  98 164 160  11 157  38  49 180 136 162 228 154  15  76  12  88 124  10  46  47  48  84 205   6  82  18  79 175 101 167 193 149  45  56 172  83 169 165 231  22 168  44  80  61  97 208 119 145 211 207  58 204  85  96 227 183 209  40 201  62 123  59 135 171  57  93  94  95 131  17  53 129  65 126 222 148 214   5 196  92 103 219 130 216 212  43  69 215  91 127 108 144  20 166 192  23  19 105  16 132 143  39 230  21  87  13 109 170 106 182 218 104 140 141 142 178  64 100 176 112 173  34 195  26  52   8 139 150  31 177  28  24  90 116  27 138 174 155 191  67 213   4  70  66 152  63 179 190  86  42  68 134  60 156 217 153 229  30 151 187 188 189 225 111 147 223 159 220  81   7  73  99  55 186 197  78 224  75  71 137 163  74 185 221 202   3 114  25  51 117 113 199 110 226   2 133  89 115 181 107 203  29 200  41  77 198 234   0
*</code></pre>
<p>Here, I’m asking <code>hexdump</code> to display this binary file in lines of 235 bytes each, one byte at a time, giving each byte 4 characters field-width and printing it as an unsigned integer (in decimal), with a newline at the end of the line, starting from offset 8 (as the first 8 bytes of the file are used by the concurrency mechanism for bookkeeping purposes<a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a>). The <code>*</code> on the second line of <code>hexdump</code>’s output means “every line after this matches it,” so the file must repeat itself every 235 bytes until the end. We can suppress the <code>*</code> with <code>-v</code> and examine the last 4 lines, just to be sure we understand it correctly:</p>
<pre><code>bash-3.2$ hexdump -e &#39;235/1 &quot;%4u&quot; &quot;\n&quot;&#39; -s8 -v foo | tail -n4
   1  37 158 194  35 206  32 128  54 120 146 102 233   9 125  36 122 118 184 210 121 232  33  14  50 161  72  98 164 160  11 157  38  49 180 136 162 228 154  15  76  12  88 124  10  46  47  48  84 205   6  82  18  79 175 101 167 193 149  45  56 172  83 169 165 231  22 168  44  80  61  97 208 119 145 211 207  58 204  85  96 227 183 209  40 201  62 123  59 135 171  57  93  94  95 131  17  53 129  65 126 222 148 214   5 196  92 103 219 130 216 212  43  69 215  91 127 108 144  20 166 192  23  19 105  16 132 143  39 230  21  87  13 109 170 106 182 218 104 140 141 142 178  64 100 176 112 173  34 195  26  52   8 139 150  31 177  28  24  90 116  27 138 174 155 191  67 213   4  70  66 152  63 179 190  86  42  68 134  60 156 217 153 229  30 151 187 188 189 225 111 147 223 159 220  81   7  73  99  55 186 197  78 224  75  71 137 163  74 185 221 202   3 114  25  51 117 113 199 110 226   2 133  89 115 181 107 203  29 200  41  77 198 234   0
   1  37 158 194  35 206  32 128  54 120 146 102 233   9 125  36 122 118 184 210 121 232  33  14  50 161  72  98 164 160  11 157  38  49 180 136 162 228 154  15  76  12  88 124  10  46  47  48  84 205   6  82  18  79 175 101 167 193 149  45  56 172  83 169 165 231  22 168  44  80  61  97 208 119 145 211 207  58 204  85  96 227 183 209  40 201  62 123  59 135 171  57  93  94  95 131  17  53 129  65 126 222 148 214   5 196  92 103 219 130 216 212  43  69 215  91 127 108 144  20 166 192  23  19 105  16 132 143  39 230  21  87  13 109 170 106 182 218 104 140 141 142 178  64 100 176 112 173  34 195  26  52   8 139 150  31 177  28  24  90 116  27 138 174 155 191  67 213   4  70  66 152  63 179 190  86  42  68 134  60 156 217 153 229  30 151 187 188 189 225 111 147 223 159 220  81   7  73  99  55 186 197  78 224  75  71 137 163  74 185 221 202   3 114  25  51 117 113 199 110 226   2 133  89 115 181 107 203  29 200  41  77 198 234   0
   1  37 158 194  35 206  32 128  54 120 146 102 233   9 125  36 122 118 184 210 121 232  33  14  50 161  72  98 164 160  11 157  38  49 180 136 162 228 154  15  76  12  88 124  10  46  47  48  84 205   6  82  18  79 175 101 167 193 149  45  56 172  83 169 165 231  22 168  44  80  61  97 208 119 145 211 207  58 204  85  96 227 183 209  40 201  62 123  59 135 171  57  93  94  95 131  17  53 129  65 126 222 148 214   5 196  92 103 219 130 216 212  43  69 215  91 127 108 144  20 166 192  23  19 105  16 132 143  39 230  21  87  13 109 170 106 182 218 104 140 141 142 178  64 100 176 112 173  34 195  26  52   8 139 150  31 177  28  24  90 116  27 138 174 155 191  67 213   4  70  66 152  63 179 190  86  42  68 134  60 156 217 153 229  30 151 187 188 189 225 111 147 223 159 220  81   7  73  99  55 186 197  78 224  75  71 137 163  74 185 221 202   3 114  25  51 117 113 199 110 226   2 133  89 115 181 107 203  29 200  41  77 198 234   0
   1  37 158 194  35 206  32 128  54 120 146 102 233   9 125  36 122 118 184 210 121 232  33  14  50 161  72  98 164 160  11 157  38  49 180 136 162 228 154  15  76  12  88 124  10  46  47  48  84 205   6  82  18  79 175 101 167 193 149  45  56 172  83 169 165 231  22 168  44  80  61  97 208 119 145 211 207  58 204  85  96 227 183 209  40 201  62 123  59 135 171  57  93  94  95 131  17  53 129  65 126 222 148 214   5 196  92 103 219 130 216 212  43  69 215  91 127 108 144  20 166 192  23  19 105  16 132 143  39 230</code></pre>
<p>Notice that it doesn’t have an even multiple of 235 bytes – if you scroll all the way over, you’ll see that the very last line ends in the middle. That’s because this file isn’t generated by printing a particular 235-byte sequence in a loop. Rather, every <a name="task-size"></a>8-byte machine word is computed separately; the 235-byte repeating structure is built into the nature of the problem the program solves (which I chose, in part, so that it’s easy to check whether the results are sensible).</p>
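<p>If you’d like to check the numbers yourself, here’s a minimal single-threaded sketch of the same arithmetic in C. It is not how the assembly program is organized (it ignores the 8-byte task granularity entirely), and the names <code>modpow</code> and <code>fill_pattern</code> are hypothetical. It assumes, based on the <code>hexdump</code> output above, that the first data byte corresponds to the value 1.</p>
<pre><code>#include &lt;stdint.h&gt;

/* Square-and-multiply modular exponentiation: (base^exp) mod m.
 * Cannot overflow as long as m fits in 32 bits. */
uint64_t modpow(uint64_t base, uint64_t exp, uint64_t m) {
  uint64_t result = 1 % m;
  base %= m;
  while (exp &gt; 0) {
    if (exp &amp; 1) result = (result * base) % m;
    base = (base * base) % m;
    exp &gt;&gt;= 1;
  }
  return result;
}

/* Hypothetical helper: fill n bytes so that byte k holds (k+1)^65537 mod 235,
 * an assumption chosen to match the hexdump, whose first data byte is 1. */
void fill_pattern(uint8_t *out, uint64_t n) {
  for (uint64_t k = 0; k &lt; n; k++)
    out[k] = (uint8_t)modpow(k + 1, 65537, 235);
}</code></pre>
<p>The first few values this produces are 1, 37, 158 and 194, which agree with the start of each line in the dump.</p>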
<p><a name="critical-sections"></a></p>
<h2 id="critical-sections">Critical Sections <a href="#critical-sections">#</a></h2>
<p>Let’s begin with a conceptual overview of the problem concurrency primitives are supposed to address<a href="#fn3" class="footnoteRef" id="fnref3"><sup>3</sup></a>. When we have a single process operating in an address space (that is, a non-concurrent process), we can reason about the state of the entire address space at a particular point during the execution of the process. We can make statements like “this variable must be positive because we checked that it was positive four lines of code ago and we haven’t changed it since then.” In a concurrent process, a lot of this reasoning goes out the window, because our process’s sibling might have set the variable to <code>-1</code> between here and there. We just have no way of knowing – the possible state transitions are too various to justify strong claims about. Claims like “the program produces correct output” tend to be very strong indeed in this context, and we often want to make such claims (at least to ourselves).</p>
<p>So, much like <a href="http://davidad.github.io/blog/2014/03/16/infosec-the-product-design-correspondence/">security is, in a sense, all about limiting features</a>, concurrency primitives are means to restrict the possible state transitions of memory shared by multiple processors. The most common bugaboo is that a shared-memory state could have been changed by a sibling between the time that we measure it and the time that we take action based upon that measurement. So, as a general rule, the most basic concurrency operations <strong>measure a shared state</strong> and then use the data to <strong>change that shared state</strong>, while <strong>excluding siblings</strong> from accessing it throughout the whole operation. A region of code like this – where siblings are not allowed, during its execution, to access a particular memory location – is called a <strong>critical section</strong>.</p>
<p>On all Intel CPUs prior to Haswell (which started shipping last year; I don’t have one yet), “actual” critical sections are limited to <strong>single machine instructions</strong> with <strong>a <code>lock</code> prefix</strong>; larger critical sections can be emulated based on these single-instruction primitives. We’ll be doing a variation of this today.</p>
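<p>For a feel of what “measure, then change, while excluding siblings” looks like from a higher-level language, here’s a minimal sketch using C11 atomics. The function <code>decrement_if_positive</code> is a made-up example, not part of the program in this post. On x86-64, compilers typically turn the compare-exchange into a single <code>lock cmpxchg</code> instruction, which is exactly the kind of one-instruction critical section described above.</p>
<pre><code>#include &lt;stdatomic.h&gt;
#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;

/* Hypothetical example: decrement *counter only if it is currently positive.
 * Returns true if this caller performed the decrement. */
bool decrement_if_positive(_Atomic int64_t *counter) {
  int64_t seen = atomic_load(counter);  /* measure the shared state */
  while (seen &gt; 0) {
    /* Change it, but only if nobody else changed it since we measured.
     * On failure, seen is refreshed with the current value and we retry. */
    if (atomic_compare_exchange_weak(counter, &amp;seen, seen - 1))
      return true;
  }
  return false;  /* it was zero or negative when we last looked */
}</code></pre>
<p>The retry loop is how a larger “check, then act” operation gets emulated on top of a single locked instruction: the check and the update only take effect together if the value is still the one we measured.</p>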
<p><a name="tasks-and-workers"></a></p>
<h2 id="tasks-and-workers">Tasks and Workers <a href="#tasks-and-workers">#</a></h2>
<p>I don’t know of any particular argument to justify the tasks-and-workers perspective on parallel computing, but in practice, it’s the one I’ve found most useful for organizing my code, and it seems to be fairly common. The idea is this: we divide our program’s workload into <strong>tasks</strong> of some granularity, and each task is picked up and operated on by exactly one of some number of interchangeable <strong>workers</strong>, which each run concurrently. The tasks should not be too small, so that the amount of work involved in choosing a task is not too great<a href="#fn4" class="footnoteRef" id="fnref4"><sup>4</sup></a>, but they should also not be too large, so that if any worker finishes a task earlier than the others, there will likely be another task ready for it to do.</p>
<p>In this context, <strong>a task is a type of critical section</strong>, because once a task has been picked up by any single worker, the other workers are supposed to leave it alone. But critical sections carry the connotation of being very small, so they can execute and get out of the way quickly. I suppose I’ve been fortunate enough to work in domains where I’ve had the luxury to split things up into independent tasks most of the time. (These domains are sometimes referred to as <a href="http://en.wikipedia.org/wiki/Embarrassingly_parallel"><em>embarrassingly parallel</em></a>.) But in the code below, we’ll also see one example of a state variable which is operated on by short critical sections instead of tasks.</p>
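<p>One common way to make sure each task is picked up by exactly one worker is a shared counter that workers bump atomically. Here’s a minimal sketch with C11 atomics; <code>claim_task</code> and <code>NO_TASK</code> are hypothetical names, and this is an illustration of the general idea rather than a transcription of what the assembly does.</p>
<pre><code>#include &lt;stdatomic.h&gt;
#include &lt;stdint.h&gt;

#define NO_TASK UINT64_MAX

/* Hypothetical task dispenser: each call hands out the next unclaimed task
 * index in [0, ntasks), or NO_TASK once every task has been taken. */
uint64_t claim_task(_Atomic uint64_t *next, uint64_t ntasks) {
  /* fetch-and-add is one atomic step, so no two callers see the same value */
  uint64_t mine = atomic_fetch_add(next, 1);
  return mine &lt; ntasks ? mine : NO_TASK;
}</code></pre>
<p>A worker would call <code>claim_task</code> in a loop and stop at the first <code>NO_TASK</code>. The counter is the short critical section; the work done on each claimed index needs no further coordination.</p>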
<p><a name="show-me-the-code"></a></p>
<h1 id="show-me-the-code-already">Show me the code already! <a href="#show-me-the-code">#</a></h1>
<figure class='code'><figcaption>
concurrency.asm<a href='https://github.com/davidad/asm_concurrency/blob/TJ-1/concurrency.asm#L1-8'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="cp">%include &quot;os_dependent_stuff.asm&quot;</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>  <span class="c1">; Initialize constants.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r12</span><span class="p">,</span> <span class="mi">65537</span>                 <span class="c1">; Exponent to modular-exponentiate with</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rbx</span><span class="p">,</span> <span class="mi">235</span>                   <span class="c1">; Modulus to modular-exponentiate with</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r15</span><span class="p">,</span> <span class="nv">NPROCS</span>                <span class="c1">; Number of worker processes to fork.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r14</span><span class="p">,</span> <span class="p">(</span><span class="nb">SI</span><span class="nv">ZE</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="o">&ast;</span><span class="mi">8</span>            <span class="c1">; Size of shared memory; reserving first</span>
</code></div><div class='line'><code>                                 <span class="c1">; 64 bits for bookkeeping</span>
</code></div></pre></td></tr></table></div></figure>


<p>The first two constants are pretty straightforward – just the parameters of the task to compute. <code>NPROCS</code> and <code>SIZE</code> need a bit of explanation. These are constants which are actually defined in the <code>Makefile</code> and passed in to <code>nasm</code> using the <code>-D</code> option (as in <code>-DNPROCS=7</code>)<a href="#fn5" class="footnoteRef" id="fnref5"><sup>5</sup></a>. <code>SIZE</code> is actually measured in 8-byte machine words; it’s the number of <strong>tasks</strong> we want to perform. (As I briefly mentioned <a href="#task-size">earlier</a>, each task in this program is an 8-byte machine word of the output file to be computed.)</p>
<figure class='code'><figcaption>
concurrency.asm<a href='https://github.com/davidad/asm_concurrency/blob/TJ-1/concurrency.asm#L10-12'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code>  <span class="c1">; Check for command-line argument.</span>
</code></div><div class='line'><code>  <span class="nf">cmp</span> <span class="kt">qword</span> <span class="p">[</span><span class="nb">rsp</span><span class="p">],</span> <span class="mi">1</span>
</code></div><div class='line'><code>  <span class="nf">je</span> <span class="nv">map_anon</span>
</code></div></pre></td></tr></table></div></figure>


<p>When our program is entered by the OS, the command line is on the stack; <code>[rsp]</code> is the number of command-line tokens (including the name of the program itself), <code>[rsp+8]</code> will be a pointer to the name of the program, <code>[rsp+2*8]</code> a pointer to the first command-line argument (if there is one), and so on. If we don’t have any command-line arguments, then the number of tokens will be 1 (just the name of the program). In this case, we’re going to request an anonymous region of memory; otherwise, we’re going to open the file specified on the command line and map that. <em>Note:</em> if you haven’t seen <code>mmap</code> before, check out <a href="http://man7.org/linux/man-pages/man2/mmap.2.html">its man page</a>. In my opinion, it’s the “right” way to do either memory allocation (“anonymous” mappings) or file I/O.</p>
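<p>If you’re more at home in C, the choice between the two kinds of mapping can be sketched roughly as follows. The helper name <code>map_shared</code> is hypothetical, and the sketch returns <code>NULL</code> on failure instead of jumping to an error handler.</p>
<pre><code>#define _DEFAULT_SOURCE  /* for MAP_ANON and ftruncate() on glibc */
#include &lt;fcntl.h&gt;
#include &lt;stddef.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;unistd.h&gt;

/* Hypothetical helper: map size bytes of shared, read/write memory.
 * If path is NULL the mapping is anonymous; otherwise the file is created
 * if necessary, resized to size bytes, and mapped. Returns NULL on failure. */
void *map_shared(const char *path, size_t size) {
  int fd = -1;                        /* anonymous mappings use fd -1 */
  int flags = MAP_SHARED | MAP_ANON;
  if (path != NULL) {
    fd = open(path, O_RDWR | O_CREAT, 0660);
    if (fd &lt; 0) return NULL;
    if (ftruncate(fd, (off_t)size) != 0) { close(fd); return NULL; }
    flags = MAP_SHARED;
  }
  void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, fd, 0);
  if (fd &gt;= 0) close(fd);             /* the mapping stays valid after close() */
  return p == MAP_FAILED ? NULL : p;
}</code></pre>
<p>Calling <code>map_shared(NULL, size)</code> corresponds to the no-argument case, and <code>map_shared(argv[1], size)</code> to the case where a filename was given.</p>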
<p><a name="open-ftruncate-mmap"></a></p>
<h2 id="open-ftruncate-and-mmap"><code>open()</code>, <code>ftruncate()</code>, and <code>mmap()</code> <a href="#open-ftruncate-mmap">#</a></h2>
<figure class='code'><figcaption>
concurrency.asm<a href='https://github.com/davidad/asm_concurrency/blob/TJ-1/concurrency.asm#L14-43'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='14' class='line-number'></div><div data-line='15' class='line-number'></div><div data-line='16' class='line-number'></div><div data-line='17' class='line-number'></div><div data-line='18' class='line-number'></div><div data-line='19' class='line-number'></div><div data-line='20' class='line-number'></div><div data-line='21' class='line-number'></div><div data-line='22' class='line-number'></div><div data-line='23' class='line-number'></div><div data-line='24' class='line-number'></div><div data-line='25' class='line-number'></div><div data-line='26' class='line-number'></div><div data-line='27' class='line-number'></div><div data-line='28' class='line-number'></div><div data-line='29' class='line-number'></div><div data-line='30' class='line-number'></div><div data-line='31' class='line-number'></div><div data-line='32' class='line-number'></div><div data-line='33' class='line-number'></div><div data-line='34' class='line-number'></div><div data-line='35' class='line-number'></div><div data-line='36' class='line-number'></div><div data-line='37' class='line-number'></div><div data-line='38' class='line-number'></div><div data-line='39' class='line-number'></div><div data-line='40' class='line-number'></div><div data-line='41' class='line-number'></div><div data-line='42' class='line-number'></div><div data-line='43' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="nl">open_file:</span>
</code></div><div class='line'><code>  <span class="c1">; We have a file specified on the command line, so open() it.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nv">SYSCALL_OPEN</span>          <span class="c1">; set up open()</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdi</span><span class="p">,</span> <span class="p">[</span><span class="nb">rsp</span><span class="o">+</span><span class="mi">2</span><span class="o">&ast;</span><span class="mi">8</span><span class="p">]</span>               <span class="c1">; filename from command line</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nv">O_RDWR</span><span class="o">|</span><span class="nv">O_CREAT</span>          <span class="c1">; read/write mode; create if necessary</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdx</span><span class="p">,</span> <span class="mi">660</span><span class="nv">o</span>                    <span class="c1">; &#x60;chmod&#x60;-mode of file to create (octal)</span>
</code></div><div class='line'><code>  <span class="nf">syscall</span>                        <span class="c1">; do open() system call</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r13</span><span class="p">,</span> <span class="nb">rax</span>                   <span class="c1">; preserve file descriptor in r13</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nv">SYSCALL_FTRUNCATE</span>     <span class="c1">; set up ftruncate() to adjust file size</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nv">r13</span>                     <span class="c1">; file descriptor</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nv">r14</span>                     <span class="c1">; desired file size</span>
</code></div><div class='line'><code>  <span class="nf">syscall</span>                        <span class="c1">; do ftruncate() system call</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r8</span><span class="p">,</span>  <span class="nv">r13</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r10</span><span class="p">,</span> <span class="nv">MAP_SHARED</span>
</code></div><div class='line'><code>  <span class="nf">jmp</span> <span class="nv">mmap</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>  <span class="c1">; Ask the kernel for a shared memory mapping.</span>
</code></div><div class='line'><code><span class="nl">map_anon:</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r10</span><span class="p">,</span> <span class="nv">MAP_SHARED</span><span class="o">|</span><span class="nv">MAP_ANON</span>     <span class="c1">; MAP_ANON means not backed by a file</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r8</span><span class="p">,</span>  <span class="o">-</span><span class="mi">1</span>                      <span class="c1">; thus our file descriptor is -1</span>
</code></div><div class='line'><code><span class="nl">mmap:</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r9</span><span class="p">,</span>   <span class="mi">0</span>                      <span class="c1">; and there&#39;s no file offset in either case.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nv">SYSCALL_MMAP</span>          <span class="c1">; set up mmap()</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdx</span><span class="p">,</span> <span class="nv">PROT_READ</span><span class="o">|</span><span class="nv">PROT_WRITE</span>    <span class="c1">; We&#39;d like a read/write mapping</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdi</span><span class="p">,</span>  <span class="mi">0</span>                      <span class="c1">; at no pre-specified memory location.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nv">r14</span>                     <span class="c1">; Length of the mapping in bytes.</span>
</code></div><div class='line'><code>  <span class="nf">syscall</span>                        <span class="c1">; do mmap() system call.</span>
</code></div><div class='line'><code>  <span class="nf">test</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rax</span>                  <span class="c1">; Return value will be in rax.</span>
</code></div><div class='line'><code>  <span class="nf">js</span> <span class="nv">error</span>                       <span class="c1">; If it&#39;s negative, that&#39;s trouble.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rbp</span><span class="p">,</span> <span class="nb">rax</span>                   <span class="c1">; Otherwise, we have our memory region [rbp].</span>
</code></div></pre></td></tr></table></div></figure>


<figure class='code'><figcaption>
concurrency.asm<a href='https://github.com/davidad/asm_concurrency/blob/TJ-1/concurrency.asm#L141-145'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='141' class='line-number'></div><div data-line='142' class='line-number'></div><div data-line='143' class='line-number'></div><div data-line='144' class='line-number'></div><div data-line='145' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="nl">error:</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nb">rax</span>                   <span class="c1">; In case of error, return code is -errno...</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nv">SYSCALL_EXIT</span>
</code></div><div class='line'><code>  <span class="nf">neg</span> <span class="nb">rdi</span>                        <span class="c1">; ...so negate to get actual errno</span>
</code></div><div class='line'><code>  <span class="nf">syscall</span>
</code></div></pre></td></tr></table></div></figure>


<p>We actually have to make three system calls to get this set up: one to open the file (<code>SYSCALL_OPEN</code>), one to extend it to the appropriate size (<code>SYSCALL_FTRUNCATE</code>), and finally one to make the memory mapping (<code>SYSCALL_MMAP</code>).</p>
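<p>For comparison, here is a minimal C sketch of the same three-step setup. It is not the article’s code: the helper name <code>map_shared_file</code>, the <code>O_RDWR | O_CREAT</code> flags, and the <code>0644</code> mode are assumptions made for illustration.</p>

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: keep ftruncate() visible under strict -std=c11 */
#endif
#include <fcntl.h>      /* open, O_RDWR, O_CREAT */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>   /* mmap, PROT_*, MAP_SHARED, MAP_FAILED */
#include <sys/types.h>
#include <unistd.h>     /* ftruncate, close */

/* Map `size` bytes of the file at `path` as shared read/write memory.
 * Returns NULL on failure. (Hypothetical helper, not from the article.) */
static uint64_t *map_shared_file(const char *path, size_t size) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);   /* step 1: open()      */
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) != 0) {         /* step 2: ftruncate() */
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);             /* step 3: mmap()      */
    close(fd);              /* the mapping stays valid after the fd is closed */
    return p == MAP_FAILED ? NULL : (uint64_t *)p;
}
```

<p>As in the assembly, a newly created file extended by <code>ftruncate()</code> reads back as zeroes, which is what the rest of the program relies on.</p>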
<p><a name="lock-add"></a></p>
<h2 id="lock-add"><code>lock add</code> <a href="#lock-add">#</a></h2>
<figure class='code'><figcaption>
concurrency.asm<a href='https://github.com/davidad/asm_concurrency/blob/TJ-1/concurrency.asm#L45-47'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='45' class='line-number'></div><div data-line='46' class='line-number'></div><div data-line='47' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code>  <span class="nf">lock</span> <span class="nv">add</span> <span class="p">[</span><span class="nb">rbp</span><span class="p">],</span> <span class="nv">r15</span>            <span class="c1">; Add NPROCS to the file&#39;s first machine word.</span>
</code></div><div class='line'><code>                                 <span class="c1">; We&#39;ll use it to track the # of still-running</span>
</code></div><div class='line'><code>                                 <span class="c1">; worker processes.</span>
</code></div></pre></td></tr></table></div></figure>


<p>Here’s our first concurrency primitive! We’re going to add <code>NPROCS</code> to the first machine word of this file. We’re counting on the fact that when the file is first created, all bytes will appear to be zero (a fact which <a href="http://unixhelp.ed.ac.uk/CGI/man-cgi?truncate+2">is actually true on most Unix implementations</a>). Why aren’t we just <em>setting</em> the word to zero? Well, a neat feature of this program is that we can run multiple copies of it on the same file, and they’ll share the work as if by magic. So, if we’re running as the second copy of the program, we don’t want to clobber this piece of bookkeeping state – we just want to contribute <code>NPROCS</code> workers to the worker pool.</p>
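<p>In C11 terms, the closest equivalent is an atomic fetch-and-add. The sketch below uses a hypothetical helper, <code>register_workers</code>, and assumes the counter lives in memory shared by every copy of the program. Unlike <code>lock add</code>, <code>atomic_fetch_add</code> also hands back the previous value, which is used here only to report the new total.</p>

```c
#include <stdatomic.h>
#include <stdint.h>

/* Contribute `nprocs` workers to the shared worker count without clobbering
 * whatever other copies of the program have already added.
 * Returns the total after our contribution. (Hypothetical helper.) */
static uint64_t register_workers(_Atomic uint64_t *count, uint64_t nprocs) {
    /* atomic_fetch_add returns the value from *before* the addition. */
    return atomic_fetch_add(count, nprocs) + nprocs;
}
```

<p>A second copy of the program calling this with its own <code>nprocs</code> simply raises the total, which is exactly why the word is added to rather than set.</p>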
<p><a name="fork"></a></p>
<h2 id="fork"><code>fork()</code> <a href="#fork">#</a></h2>
<figure class='code'><figcaption>
concurrency.asm<a href='https://github.com/davidad/asm_concurrency/blob/TJ-1/concurrency.asm#L49-61'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='49' class='line-number'></div><div data-line='50' class='line-number'></div><div data-line='51' class='line-number'></div><div data-line='52' class='line-number'></div><div data-line='53' class='line-number'></div><div data-line='54' class='line-number'></div><div data-line='55' class='line-number'></div><div data-line='56' class='line-number'></div><div data-line='57' class='line-number'></div><div data-line='58' class='line-number'></div><div data-line='59' class='line-number'></div><div data-line='60' class='line-number'></div><div data-line='61' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code>  <span class="c1">; Next, fork NPROCS processes.</span>
</code></div><div class='line'><code><span class="nl">fork:</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="nv">SYSCALL_FORK</span>
</code></div><div class='line'><code>  <span class="nf">syscall</span>
</code></div><div class='line'><code><span class="cp">%ifidn __OUTPUT_FORMAT__,elf64     </span><span class="c1">; (This means we&#39;re running on Linux)</span>
</code></div><div class='line'><code>  <span class="nf">test</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rax</span>                  <span class="c1">; We&#39;re a child iff return value of fork()==0.</span>
</code></div><div class='line'><code>  <span class="nf">jz</span> <span class="nb">ch</span><span class="nv">ild</span>
</code></div><div class='line'><code><span class="cp">%elifidn __OUTPUT_FORMAT__,macho64 </span><span class="c1">; (This means we&#39;re running on OSX)</span>
</code></div><div class='line'><code>  <span class="nf">test</span> <span class="nb">rdx</span><span class="p">,</span> <span class="nb">rdx</span>                  <span class="c1">; Apple...you&#39;re not supposed to touch rdx here</span>
</code></div><div class='line'><code>  <span class="nf">jnz</span> <span class="nb">ch</span><span class="nv">ild</span>                      <span class="c1">; Apple, what</span>
</code></div><div class='line'><code><span class="cp">%endif</span>
</code></div><div class='line'><code>  <span class="nf">dec</span> <span class="nv">r15</span>
</code></div><div class='line'><code>  <span class="nf">jnz</span> <span class="nv">fork</span>
</code></div></pre></td></tr></table></div></figure>


<p>Apple’s implementation of <code>fork()</code> is a little messed up, so unfortunately we’re forced to put some OS-dependent logic in here. But the basic idea is, we’re going to keep calling <code>fork()</code> only if (a) we’re the parent process, and not a newly <code>fork()</code>ed worker process, and (b) the number of processes we were supposed to spawn hasn’t decremented to zero yet.</p>
<p><a name="the-parent"></a></p>
<h2 id="the-parent-process">The parent process <a href="#the-parent">#</a></h2>
<figure class='code'><figcaption>
concurrency.asm<a href='https://github.com/davidad/asm_concurrency/blob/TJ-1/concurrency.asm#L63-66'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='63' class='line-number'></div><div data-line='64' class='line-number'></div><div data-line='65' class='line-number'></div><div data-line='66' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="nl">parent:</span>
</code></div><div class='line'><code>  <span class="nf">pause</span>
</code></div><div class='line'><code>  <span class="nf">cmp</span> <span class="kt">qword</span> <span class="p">[</span><span class="nb">rbp</span><span class="p">],</span> <span class="mi">0</span>
</code></div><div class='line'><code>  <span class="nf">jnz</span> <span class="nv">parent</span>                     <span class="c1">; Wait for [rbp], the worker count, to be zero</span>
</code></div></pre></td></tr></table></div></figure>


<p>Now, the parent process simply waits until there aren’t any active workers/child processes (they’ll gracefully disappear once there’s no more work for them to do). The <code>pause</code> instruction is a hint to the CPU that this is a spin-wait loop, so it shouldn’t actually spend a lot of energy spinning in it.</p>
<figure class='code'><figcaption>
concurrency.asm<a href='https://github.com/davidad/asm_concurrency/blob/TJ-1/concurrency.asm#L84-87'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='84' class='line-number'></div><div data-line='85' class='line-number'></div><div data-line='86' class='line-number'></div><div data-line='87' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="nl">exit_success:</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="nv">SYSCALL_EXIT</span>          <span class="c1">; Normal exit</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">edi</span><span class="p">,</span> <span class="mi">0</span>
</code></div><div class='line'><code>  <span class="nf">syscall</span>
</code></div></pre></td></tr></table></div></figure>


<p>Once the number of active workers is zero, the parent bails out, returning the success code, <code>0</code>.</p>
<p><a name="the-worker"></a></p>
<h2 id="dividing-up-work">Dividing up work <a href="#the-worker">#</a></h2>
<p>Here’s where our workers divide up their tasks – the most important concurrency-related operation in the program:</p>
<figure class='code'><figcaption>
concurrency.asm<a href='https://github.com/davidad/asm_concurrency/blob/TJ-1/concurrency.asm#L89-104'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='89' class='line-number'></div><div data-line='90' class='line-number'></div><div data-line='91' class='line-number'></div><div data-line='92' class='line-number'></div><div data-line='93' class='line-number'></div><div data-line='94' class='line-number'></div><div data-line='95' class='line-number'></div><div data-line='96' class='line-number'></div><div data-line='97' class='line-number'></div><div data-line='98' class='line-number'></div><div data-line='99' class='line-number'></div><div data-line='100' class='line-number'></div><div data-line='101' class='line-number'></div><div data-line='102' class='line-number'></div><div data-line='103' class='line-number'></div><div data-line='104' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="nl">child:</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nv">r14</span>                   <span class="c1">; Restore rsi from r14 (saved earlier)</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">cl</span><span class="p">,</span> <span class="mh">0xff</span>                   <span class="c1">; Set rcx to be nonzero</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdi</span><span class="p">,</span> <span class="mi">8</span>                     <span class="c1">; Start from index 8 (past the bookkeeping)</span>
</code></div><div class='line'><code><span class="nl">find_work:</span>                       <span class="c1">; and try to find a piece of work to claim</span>
</code></div><div class='line'><code>  <span class="nf">xor</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rax</span>
</code></div><div class='line'><code>  <span class="nf">cmp</span> <span class="kt">qword</span> <span class="p">[</span><span class="nb">rbp</span><span class="o">+</span><span class="nb">rdi</span><span class="p">],</span> <span class="mi">0</span>         <span class="c1">; Check if qword [rbp+rdi] is unclaimed.</span>
</code></div><div class='line'><code>  <span class="nf">jnz</span> <span class="nv">.moveon</span>                    <span class="c1">; If not, move on - no use trying to lock.</span>
</code></div><div class='line'><code>  <span class="nf">lock</span> <span class="nv">cmpxchg</span> <span class="p">[</span><span class="nb">rbp</span><span class="o">+</span><span class="nb">rdi</span><span class="p">],</span> <span class="nb">rcx</span>    <span class="c1">; Try to &quot;claim&quot; qword [rbp+rdi] if it is still</span>
</code></div><div class='line'><code>                                 <span class="c1">; unclaimed.</span>
</code></div><div class='line'><code>  <span class="nf">jz</span> <span class="nv">found_work</span>                  <span class="c1">; If successful, zero flag is set</span>
</code></div><div class='line'><code><span class="nl">.moveon:</span>
</code></div><div class='line'><code>  <span class="nf">add</span> <span class="nb">rdi</span><span class="p">,</span> <span class="mi">8</span>                     <span class="c1">; Otherwise, try a different piece.</span>
</code></div><div class='line'><code><span class="nl">find_work.next:</span>
</code></div><div class='line'><code>  <span class="nf">cmp</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nb">rsi</span>                   <span class="c1">; Make sure we haven&#39;t hit the end.</span>
</code></div><div class='line'><code>  <span class="nf">jne</span> <span class="nv">find_work</span>
</code></div></pre></td></tr></table></div></figure>


<p>The worker is linearly scanning each 8-byte word, starting with the second one in the <code>[rbp]</code> region (since the word right at <code>[rbp]</code> just represents how many workers there are), looking for one that is zero. As we covered earlier, the file is going to start out being all zeroes. The way I imagine this setup in my head, the file starts out as a barren desert of zeroes, like the old American West, and the workers are searching for a plot of land to homestead on. The first empty plot of land they find, they put up a sign that says “RESERVED” and they start building their homestead (that’s the task). In this case, the RESERVED sign is <code>0xff</code>. Now, other workers will keep on movin’ until they find their own plot of land. The key here is to prevent two workers from putting up a RESERVED sign at the same location. That’s where <strong><code>lock cmpxchg</code></strong> comes in.</p>
<p><a name="lock-cmpxchg"></a></p>
<h3 id="lock-cmpxchg-compare-and-swap"><code>lock cmpxchg</code> (compare-and-swap) <a href="#lock-cmpxchg">#</a></h3>
<p>This is a slightly complex but beautiful operation. It takes three parameters:</p>
<ul>
<li>a <em>memory location</em> (<code>[rbp+rdi]</code> in this case), which has operand size dependent on the operand size of the next parameter (in this case, it’s an 8-byte machine word, because the next parameter is an 8-byte register)<a href="#fn6" class="footnoteRef" id="fnref6"><sup>6</sup></a>,</li>
<li>an <em>update value</em> to store (<code>rcx</code> in this case, with its low byte set to <code>0xff</code> by <code>mov cl, 0xff</code>, our sentinel for RESERVED), and</li>
<li>an <em>expected value</em> to compare against (always <code>rax</code>, an implicit parameter, and in this case zeroed out by <code>xor rax, rax</code>; zero is the value of unreserved words because it is the value freshly allocated files are filled with).</li>
</ul>
<p>The first thing <code>lock cmpxchg</code> will do is lock the memory location and compare it to the expected value. Then, depending on the result, one of two things will happen:</p>
<ul>
<li>If the comparison fails, that means the state of memory isn’t what we expected—it must have changed since we last looked. This is bad news; the update is aborted. To inform us exactly what went wrong, <code>cmpxchg</code> will overwrite <code>rax</code> with whatever is actually in memory <em>now</em> (instead of what we expected). The zero-flag <code>ZF</code> will be cleared to signal non-equality, and the memory location will be unlocked.</li>
<li>If the value in memory <em>does</em> match what we expected, then our update value replaces it in that memory location before any other CPU/core has a chance to either read or write there. That’s a “successful” compare-and-swap. The zero-flag <code>ZF</code> will be set to signal success, and the memory location will be unlocked as soon as it is updated.</li>
</ul>
<p>The upshot in our application is that it’s impossible for more than one worker to reserve the same task, because reservation always happens in an <em>atomic</em> (<code>lock</code>ed) operation, which:</p>
<ul>
<li>will be aborted if another reservation happened before it, and</li>
<li>will prevent any other atomic operation on that memory location from starting until this one is done.</li>
</ul>
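<p>The same success and failure behavior can be sketched with C11’s <code>atomic_compare_exchange_strong</code>, where a local variable <code>expected</code> plays the role of <code>rax</code>. The names <code>try_claim</code> and <code>RESERVED</code> are made up for this example, and the sentinel is assumed to be exactly <code>0xff</code>.</p>

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RESERVED UINT64_C(0xff)   /* sentinel value, assumed for this sketch */

/* Try to claim one task word.
 * Returns true if the word was 0 and is now RESERVED.
 * On failure, *seen receives the value actually found in memory,
 * just as cmpxchg overwrites rax. (Hypothetical helper.) */
static bool try_claim(_Atomic uint64_t *slot, uint64_t *seen) {
    uint64_t expected = 0;   /* the "expected value": an unclaimed word */
    bool ok = atomic_compare_exchange_strong(slot, &expected, RESERVED);
    *seen = expected;        /* still 0 on success, current contents on failure */
    return ok;
}
```

<p>Calling <code>try_claim</code> twice on the same zeroed word succeeds the first time and fails the second, with <code>*seen</code> reporting <code>0xff</code> on the failed attempt.</p>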
<p><a name="tatas"></a></p>
<h3 id="test-and-test-and-set">“test-and-test-and-set” <a href="#tatas">#</a></h3>
<p>You may notice that we do an ordinary <code>cmp</code> in advance of the <code>lock cmpxchg</code>. That’s not strictly necessary, but it speeds up this part of the program quite a bit; if a location was already claimed some time ago, we may as well notice that before putting a <code>lock</code> on it (which is a moderately expensive operation) and simply move on until we find something that <em>looks</em> empty (then <code>lock cmpxchg</code> to be <em>sure</em> it’s empty).</p>
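<p>A rough C rendering of the whole scan, with the cheap test in front of the compare-and-swap, might look like the following. It indexes 8-byte words instead of byte offsets, and <code>find_work</code> returning <code>0</code> for “nothing left” is a convention chosen for this sketch, not something taken from the assembly.</p>

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Scan words[1..nwords-1] (words[0] is the bookkeeping word) and claim the
 * first one that is still zero. Returns the claimed index, or 0 if every
 * task word is already taken. (Hypothetical helper.) */
static size_t find_work(_Atomic uint64_t *words, size_t nwords) {
    for (size_t i = 1; i < nwords; i++) {
        /* Test: an ordinary read. Skip anything that already looks claimed. */
        if (atomic_load_explicit(&words[i], memory_order_relaxed) != 0)
            continue;
        /* Test-and-set: only now pay for the atomic compare-and-swap. */
        uint64_t expected = 0;
        if (atomic_compare_exchange_strong(&words[i], &expected,
                                           UINT64_C(0xff)))
            return i;
        /* Someone else claimed it between our read and our CAS: move on. */
    }
    return 0;
}
```

<p>The plain read can be stale, which is why the compare-and-swap still has the final say on whether the claim went through.</p>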
<p><a name="doing-the-task"></a></p>
<h2 id="doing-the-task">Doing the task <a href="#doing-the-task">#</a></h2>
<figure class='code'><figcaption>
concurrency.asm<a href='https://github.com/davidad/asm_concurrency/blob/TJ-1/concurrency.asm#L110-139'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='110' class='line-number'></div><div data-line='111' class='line-number'></div><div data-line='112' class='line-number'></div><div data-line='113' class='line-number'></div><div data-line='114' class='line-number'></div><div data-line='115' class='line-number'></div><div data-line='116' class='line-number'></div><div data-line='117' class='line-number'></div><div data-line='118' class='line-number'></div><div data-line='119' class='line-number'></div><div data-line='120' class='line-number'></div><div data-line='121' class='line-number'></div><div data-line='122' class='line-number'></div><div data-line='123' class='line-number'></div><div data-line='124' class='line-number'></div><div data-line='125' class='line-number'></div><div data-line='126' class='line-number'></div><div data-line='127' class='line-number'></div><div data-line='128' class='line-number'></div><div data-line='129' class='line-number'></div><div data-line='130' class='line-number'></div><div data-line='131' class='line-number'></div><div data-line='132' class='line-number'></div><div data-line='133' class='line-number'></div><div data-line='134' class='line-number'></div><div data-line='135' class='line-number'></div><div data-line='136' class='line-number'></div><div data-line='137' class='line-number'></div><div data-line='138' class='line-number'></div><div data-line='139' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="nl">found_work:</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r8</span><span class="p">,</span> <span class="mi">8</span>                      <span class="c1">; There are 8 pieces per task.</span>
</code></div><div class='line'><code><span class="nl">do_piece:</span>                       <span class="c1">; This part does the actual work of mod-exp.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r13</span><span class="p">,</span> <span class="nv">r12</span>                   <span class="c1">; Copy exponent to r13.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rdi</span>                   <span class="c1">; The actual value to mod-exp should start</span>
</code></div><div class='line'><code>  <span class="nf">sub</span> <span class="nb">eax</span><span class="p">,</span> <span class="mh">0x7</span>                   <span class="c1">; at 1 for the first byte after the bookkeeping</span>
</code></div><div class='line'><code>  <span class="nf">xor</span> <span class="nb">rdx</span><span class="p">,</span> <span class="nb">rdx</span>                   <span class="c1">; word. This value is now in rax.</span>
</code></div><div class='line'><code>  <span class="nf">div</span> <span class="nb">rbx</span>                        <span class="c1">; Do modulo with modulus.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r11</span><span class="p">,</span> <span class="nb">rdx</span>                   <span class="c1">; Save remainder -- &quot;modded&quot; base -- to r11.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="mi">1</span>                     <span class="c1">; Initialize &quot;result&quot; to 1.</span>
</code></div><div class='line'><code><span class="nl">.modexploop:</span>
</code></div><div class='line'><code>  <span class="nf">test</span> <span class="nv">r13</span><span class="p">,</span> <span class="mi">1</span>                    <span class="c1">; Check low bit of exponent</span>
</code></div><div class='line'><code>  <span class="nf">jz</span> <span class="nv">.shift</span>
</code></div><div class='line'><code>  <span class="nf">mul</span> <span class="nv">r11</span>                        <span class="c1">; If set, multiply result by base</span>
</code></div><div class='line'><code>  <span class="nf">div</span> <span class="nb">rbx</span>                        <span class="c1">; Modulo by modulus</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rdx</span>                   <span class="c1">; result &lt;- remainder</span>
</code></div><div class='line'><code><span class="nl">.shift:</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r14</span><span class="p">,</span> <span class="nb">rax</span>                   <span class="c1">; Save result to r14</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nv">r11</span>                   <span class="c1">; and work with the base instead.</span>
</code></div><div class='line'><code>  <span class="nf">mul</span> <span class="nb">rax</span>                        <span class="c1">; Square the base.</span>
</code></div><div class='line'><code>  <span class="nf">div</span> <span class="nb">rbx</span>                        <span class="c1">; Modulo by modulus</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r11</span><span class="p">,</span> <span class="nb">rdx</span>                   <span class="c1">; base &lt;- remainder</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nv">r14</span>                   <span class="c1">; Restore result from r14</span>
</code></div><div class='line'><code>  <span class="nf">shr</span> <span class="nv">r13</span><span class="p">,</span> <span class="mi">1</span>                     <span class="c1">; Shift exponent right by one bit</span>
</code></div><div class='line'><code>  <span class="nf">jnz</span> <span class="nv">.modexploop</span>                <span class="c1">; If the exponent isn&#39;t zero, keep working</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="kt">byte</span> <span class="p">[</span><span class="nb">rbp</span><span class="o">+</span><span class="nb">rdi</span><span class="p">],</span> <span class="nb">al</span>         <span class="c1">; Else, store result byte.</span>
</code></div><div class='line'><code>  <span class="nf">inc</span> <span class="nb">rdi</span>                        <span class="c1">; Move forward</span>
</code></div><div class='line'><code>  <span class="nf">dec</span> <span class="nv">r8</span>                         <span class="c1">; Decrement piece counter</span>
</code></div><div class='line'><code>  <span class="nf">jnz</span> <span class="nv">do_piece</span>                   <span class="c1">; Do the next piece if there is one.</span>
</code></div><div class='line'><code>  <span class="nf">jmp</span> <span class="nv">find_work.next</span>             <span class="c1">; Else, find the next task.</span>
</code></div></pre></td></tr></table></div></figure>


<p>This article is long enough without a detailed prose explanation of <a href="http://en.wikipedia.org/wiki/Modular_exponentiation#Right-to-left_binary_method">binary exponentiation</a> (which isn’t what it’s about, anyway). Suffice it to say that, given an offset <code>rdi</code> into the <code>[rbp]</code> region, this chunk of code will replace each byte from <code>[rbp+rdi]</code> to <code>[rbp+rdi+7]</code> with the mod-exp of the corresponding value, <code>rdi-7</code> through <code>rdi</code> (the <code>sub eax, 0x7</code> makes the first byte after the bookkeeping word correspond to 1). The code is somewhat deliberately inefficient (lots of <code>div</code>s, which consume dozens of clock cycles each) for realism’s sake: we want tasks to take a nontrivial length of time.</p>
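<p>For readers who would rather see the right-to-left binary method in a higher-level form, here is a short C sketch of the same loop structure. It assumes the modulus is nonzero and below 2<sup>32</sup> so that every product fits in 64 bits; the assembly does not need that restriction, because <code>mul</code> and <code>div</code> work on the 128-bit <code>rdx:rax</code> pair.</p>

```c
#include <stdint.h>

/* Right-to-left binary modular exponentiation: (base ^ exp) mod m.
 * Assumption for this sketch: 0 < m < 2^32, so products cannot overflow. */
static uint64_t mod_exp(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1 % m;           /* "result" starts at 1 (0 if m == 1) */
    base %= m;                         /* the "modded" base */
    while (exp != 0) {
        if (exp & 1)                   /* low bit of exponent set? */
            result = (result * base) % m;
        base = (base * base) % m;      /* square the base */
        exp >>= 1;                     /* shift exponent right by one bit */
    }
    return result;
}
```

<p>Each pass through the loop mirrors one trip through <code>.modexploop</code>: an optional multiply, a squaring, and a shift.</p>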
<p><a name="being-done"></a></p>
<h2 id="being-done">Being done <a href="#being-done">#</a></h2>
<p><em>Note:</em> the block of code below is out-of-order and overlaps both of the previous two.</p>
<figure class='code'><figcaption>
concurrency.asm<a href='https://github.com/davidad/asm_concurrency/blob/TJ-1/concurrency.asm#L102-110'>context</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='102' class='line-number'></div><div data-line='103' class='line-number'></div><div data-line='104' class='line-number'></div><div data-line='105' class='line-number'></div><div data-line='106' class='line-number'></div><div data-line='107' class='line-number'></div><div data-line='108' class='line-number'></div><div data-line='109' class='line-number'></div><div data-line='110' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="nl">find_work.next:</span>
</code></div><div class='line'><code>  <span class="nf">cmp</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nb">rsi</span>                   <span class="c1">; Make sure we haven&#39;t hit the end.</span>
</code></div><div class='line'><code>  <span class="nf">jne</span> <span class="nv">find_work</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="nl">child_exit:</span>                      <span class="c1">; If we have hit the end, we&#39;re done.</span>
</code></div><div class='line'><code>  <span class="nf">lock</span> <span class="nv">dec</span> <span class="kt">qword</span> <span class="p">[</span><span class="nb">rbp</span><span class="p">]</span>           <span class="c1">; Atomic-decrement the # of active processes.</span>
</code></div><div class='line'><code>  <span class="nf">jmp</span> <span class="nv">exit_success</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="nl">found_work:</span>
</code></div></pre></td></tr></table></div></figure>

<p>Once there are no more unclaimed tasks to claim, we’re going to successfully terminate the worker. But first, we need to decrement the number of active workers.</p>
<p><a name="lock dec"></a></p>
<h3 id="lock-dec"><code>lock dec</code> <a href="#lock-dec">#</a></h3>
<p>By now you can probably guess that <code>lock dec</code> is a version of the <code>dec</code> (decrement) operation which ensures that no other worker can modify the number-of-active-workers variable in the middle of our decrement. Without the <code>lock</code>, the last two workers could both read the value <code>2</code>, both decrement it, and both write back <code>1</code> and exit, with nobody left to decrease it to <code>0</code>.</p>
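<p>Putting the counter’s whole life cycle together (add, fork, decrement, wait for zero), a C sketch could look like this. It is an illustration under assumptions rather than a port: the workers do no actual work, the counter sits in an anonymous shared mapping, <code>run_workers</code> is a made-up name, and the parent reaps its children with <code>wait()</code>, which the assembly does not do.</p>

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: keep MAP_ANON visible under strict -std=c11 */
#endif
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork `nprocs` workers that each atomically decrement a shared counter,
 * then spin in the parent until the counter reads zero.
 * Returns the final counter value (0 when all went well), or UINT64_MAX
 * if the shared mapping could not be created. (Hypothetical helper.) */
static uint64_t run_workers(unsigned nprocs) {
    _Atomic uint64_t *count = mmap(NULL, sizeof *count,
                                   PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANON, -1, 0);
    if (count == MAP_FAILED)
        return UINT64_MAX;
    atomic_fetch_add(count, nprocs);      /* like `lock add [rbp], r15` */

    for (unsigned i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid == 0) {                   /* child: pretend the work is done */
            atomic_fetch_sub(count, 1);   /* like `lock dec qword [rbp]` */
            _exit(0);
        }
        if (pid < 0)                      /* fork failed: give back its share */
            atomic_fetch_sub(count, 1);
    }

    while (atomic_load(count) != 0)       /* parent: spin until zero */
        ;                                 /* (the assembly adds `pause` here) */

    while (wait(NULL) > 0)                /* reap children; not in the assembly */
        ;
    uint64_t final = atomic_load(count);
    munmap(count, sizeof *count);
    return final;
}
```

<p>Because every decrement is atomic, the parent’s loop is guaranteed to see zero eventually, no matter how the children’s exits interleave.</p>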
<p><a name="badness"></a></p>
<h2 id="what-should-have-been-done-differentlyif-this-werent-just-an-example">What should have been done differently<br/>(if this weren’t just an example) <a href="#badness">#</a></h2>
<p>It’s worth pointing out that for this particular problem, I did a lot of things here that don’t actually make much sense.</p>
<ul>
<li>There’s no particular reason to allow multiple simultaneous invocations of the whole program on the same file. If that requirement is relaxed, then it makes a lot more sense to divide up the work by starting each worker at a different offset and having them all skip <span class="math">\(n\)</span> tasks ahead when they finish (e.g. with <span class="math">\(n=7\)</span> workers, the seventh worker would take the <span class="math">\((7k+6)\)</span>th task for every integer <span class="math">\(k\)</span>).</li>
<li>Even with the requirement in question, there would be more efficient ways to divide up tasks—for instance, instead of trying to claim every task in order, workers could maintain a second bookkeeping word which would track the address of the current next-unclaimed-task.</li>
<li>Tasks should have been rather larger than single 8-byte machine words; the coordination overhead for tasks at this fine granularity is unlikely to pay off.</li>
<li>The modular exponentiation could have been implemented more efficiently.</li>
<li>In fact, since the result is just a single 235-byte pattern that repeats over and over, I could have just computed it once and repeatedly written it into the file. (Since this would be a primarily storage-bound operation, there wouldn’t even be much sense in parallelizing it.)</li>
</ul>
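<p>The strided division of work from the first bullet can be sketched in a few lines of Python. This is only an illustration of the indexing (workers and tasks are zero-indexed, and <code>tasks_for_worker</code> is a hypothetical helper, not something from the assembly program); the task count of 30 is arbitrary.</p>

```python
def tasks_for_worker(worker_index, n_workers, n_tasks):
    # Worker i takes tasks i, i + n, i + 2n, ... (zero-indexed),
    # so no coordination between workers is needed at all.
    return list(range(worker_index, n_tasks, n_workers))

n_workers = 7
n_tasks = 30
assignment = [tasks_for_worker(i, n_workers, n_tasks) for i in range(n_workers)]

# Every task is claimed exactly once across all workers.
claimed = sorted(t for tasks in assignment for t in tasks)

print(assignment[6])
```

<p>With seven workers, the last one gets tasks 6, 13, 20, and 27, i.e. the tasks numbered <span class="math">\(7k+6\)</span>.</p>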
<p>But hey, now we know how to write concurrent x64 programs using memory-mapped files.</p>
<p><a name="conclusion"></a></p>
<h2 id="conclusion">Conclusion <a href="#conclusion">#</a></h2>
<p>In whichever abstraction we’re working, if we’re doing concurrent processing on an Intel platform, it may be worth considering how the abstraction resolves down to concepts like these. See if your platform exposes a thing like <code>mmap()</code>, for instance, and consider how your “concurrency primitives” might translate into individual <code>lock</code>ed operations. This will assist in reasoning about performance issues, as well as providing a deeper understanding of your concurrency primitives’ guarantees.</p>
<p>And, of course, make sure that you’ve given assembly a big check-mark under “Has Concurrency Primitives?” on your personal programming-environment scorecard.</p>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>This may be counterintuitive if you think of assembly as a conspicuously old-school way to program. I won’t deny that it is, but <code>nasm</code>, DynASM, <code>r2</code>, and the other tools I use for assembly hacking are relentlessly kept in sync with Intel’s assembly-language specification, which is updated in advance of every new CPU release. Other tools take much longer to adapt because, well, Intel doesn’t <em>specify</em> exactly how they should make use of new features. So, in fact, the latest hardware is supported in assembly before it’s supported anywhere else.<a href="#fnref1">↩</a></p></li>
<li id="fn2"><p>If I had been doing serious work, instead of using a flat binary file, I would be using <a href="http://kentonv.github.io/capnproto/">Cap’n Proto</a>, so the bookkeeping field(s) would be well-delineated. Perhaps in a future article, I’ll show how to do that from assembly. Then, instead of <code>hexdump</code>, I’d be using <code>capnp</code> to explore the data. But <code>hexdump</code> is a quite versatile tool and nice to know anyway.<a href="#fnref2">↩</a></p></li>
<li id="fn3"><p>Pun not intended.<a href="#fnref3">↩</a></p></li>
<li id="fn4"><p>The code displayed here violates this rule pretty badly, which is probably why the speedup from running in parallel is noticeably worse than ideal, but I think to do better would overcomplicate the presentation.<a href="#fnref4">↩</a></p></li>
<li id="fn5"><p>I could (should?) have used command-line arguments for these values, but let’s face it, parsing command-line arguments is annoying in <em>any</em> language, let alone assembly.<a href="#fnref5">↩</a></p></li>
<li id="fn6"><p>There are some applications for which you might wish to compare-and-swap <em>two</em> machine words (often, two pointers) in a single atomic operation. This can be done using the <code>lock cmpxchg16b</code> instruction (note: the 16 bytes still have to be contiguous in memory, and in fact must be 16-byte-aligned).<a href="#fnref6">↩</a></p></li>
</ol>
</section>]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The Security/Product Design Correspondence]]></title>
    <link href="http://davidad.github.io/blog/2014/03/16/infosec-the-product-design-correspondence/"/>
    <updated>2014-03-16T09:55:30-04:00</updated>
    <id>http://davidad.github.io/blog/2014/03/16/infosec-the-product-design-correspondence</id>
    <content type="html"><![CDATA[<p><strong>General disclaimer for InfoSec articles:</strong> <em>Reading this article does not qualify you to design secure systems. Writing this article does not qualify </em>me <em>to design secure systems. In fact, </em>nobody is qualified to design secure systems<em>. A system should not be considered secure unless it has been reviewed by multiple security experts </em>and <em>resisted multiple serious attempts to violate its security claims in practice. The information contained in this article is offered “as is” and without warranties of any kind (express, implied, and statutory), all of which the author expressly disclaims to the fullest extent permitted by law.</em></p>
<hr />
<blockquote>
<p>If programming is the art of adding functionality to computers, security is the art of removing it.</p>
</blockquote>
<p>This maxim is a bit unfair to the deep and wonderful world of information security (InfoSec), but it has a point. A lot of essential concepts in InfoSec have natural opposites in software product design.</p>
<p>Let’s start at the top. Every professional software project begins with specifications. In product design, the specifications are called <strong>use cases</strong>: stories about an external agent who wants to perform some function, and how they would go about performing the function using your software. In InfoSec, the specifications are called <strong>threats</strong>. These are also stories about an external agent who wants to perform some function, and how they would go about performing the function using your software. The difference is, in product design, you want to make the agent’s job <em>as easy as possible</em>, while in InfoSec, you want to make it as <em>hard</em> as possible. We also have these related correspondences: <!-- more --></p>
<ul>
<li><strong>Use case model</strong> ⇔ <strong>Threat model</strong></li>
<li><strong>User</strong> ⇔ <strong>Attacker</strong></li>
<li><strong>User interface</strong> ⇔ <strong>Attack surface</strong></li>
<li><strong>Interaction</strong> ⇔ <strong>Protocol</strong></li>
<li><strong>Affordance</strong> ⇔ <strong>Vulnerability</strong></li>
</ul>
<p>In product design, the goal is to address all use cases with a set of <strong>features</strong>. The correspondence between a use case model and a feature set is nontrivial, and translating use cases into features is arguably the core of the product designer’s job. Meanwhile in InfoSec, the next step is to address all threats with a set of <strong>claims</strong>; the correspondence between a threat model and a set of security claims is nontrivial in the same sense. Both involve many assumptions about what the user/attacker is willing and able to do, and guesses about the best way to enable/prevent them from achieving their objectives, drawing on a lot of experience and patterns observed in the field with both well-designed and badly-designed products/security systems.</p>
<p>The most common features and most common security claims are also related:</p>
<ul>
<li>A <strong>view/display/read</strong> feature, enabling a user to access a record of information, is the opposite of a <strong>confidentiality</strong> claim, guaranteeing that an attacker cannot access information.</li>
<li>A <strong>modify/update</strong> feature, enabling a user to edit a record of information, is the opposite of an <strong>integrity</strong> claim, guaranteeing that an attacker cannot modify information without detection.</li>
<li>A <strong>create</strong> feature, enabling a user to add a new record, is the opposite of an <strong>authenticity</strong> claim, guaranteeing that an attacker cannot create a new record.</li>
<li>A <strong>delete/remove</strong> feature, enabling a user to destroy a record, is the opposite of a <strong>non-repudiation</strong> claim, guaranteeing that an attacker cannot credibly deny the existence of information once it is entered into the system.</li>
</ul>
<p>This correspondence is essentially perfect for confidentiality and integrity; authenticity and non-repudiation are a little more subtle. Just as any system which supports both creation and deletion technically supports modification (since a user can delete a record and then add back a modified version), any system which provides authenticity and non-repudiation also provides integrity.</p>
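<p>As a small illustration of what an integrity claim looks like in code, here is a sketch using Python’s standard <code>hmac</code> module. It assumes the key is shared only among legitimate parties and never reaches the attacker, which is exactly the kind of assumption a threat model has to spell out; the key and record values are made up. Note that a shared-key tag like this says nothing about non-repudiation, since anyone holding the key could have produced it.</p>

```python
import hashlib
import hmac

def seal(key, record):
    # Tag the record so that any later modification can be detected
    # by someone who holds the same key.
    return hmac.new(key, record, hashlib.sha256).digest()

def is_intact(key, record, tag):
    # compare_digest compares in constant time, so the check does not
    # leak how much of the tag matched.
    return hmac.compare_digest(seal(key, record), tag)

key = b"hypothetical shared secret"
record = b"balance=100"
tag = seal(key, record)

print(is_intact(key, record, tag))
print(is_intact(key, b"balance=999", tag))
```

<p>The untouched record verifies and the modified one does not. As the disclaimer at the top says, this does not make a system secure by itself; it only shows the shape of the claim.</p>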
<p>One place where InfoSec and product design overlap is <strong>availability</strong>. The product design version of availability is that a user wishes to access our system through some communications channel, and it must be able to respond. The InfoSec version is that an attacker wishes to cause our system to stop responding to legitimate users (usually, though not always, via <a href="http://en.wikipedia.org/wiki/Denial-of-service_attack">denial of service</a> techniques), and the attacker must be unable to do this.</p>
<p>Availability is commonly listed beside confidentiality and integrity as one of the “three core goals” of information security, but it is really a different kind of thing. It’s sometimes possible to get all four of the other security claims listed above simply by careful application of off-the-shelf cryptographic primitives, but there are no such cryptographic solutions for availability. The closest thing to a magic availability solution is massive scale, with redundant nodes all over the planet ready to take up the slack if other nodes stop responding. (BitTorrent and Bitcoin both fall into this category.) However, truly high availability requires a dedicated 24x7 staff equipped to respond to emerging threats. It is probably best to let <a href="http://www.cloudflare.com/">someone else</a> handle that.</p>
<p>You may also come across the words <strong>authorization</strong> and <strong>authentication</strong> connected with some of the above. These are issues without clear product design correspondences (except insofar as products are designed to provide them in their InfoSec senses). Like <strong>trust</strong> and <strong>risk</strong>, they also tend to be intricately tied up in human affairs. These terms, along with the basic categories of cryptographic primitives, will be treated in future InfoSec articles.</p>]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Systems Past: the only 8 software innovations we actually use]]></title>
    <link href="http://davidad.github.io/blog/2014/03/12/the-operating-system-is-out-of-date/"/>
    <updated>2014-03-12T19:21:49-04:00</updated>
    <id>http://davidad.github.io/blog/2014/03/12/the-operating-system-is-out-of-date</id>
    <content type="html"><![CDATA[<p><em>Note: This is a position piece, not a technical article. Hat tip to <a href="https://twitter.com/_JacobJacob">Jake Skelcy</a> for requesting such a piece.</em></p>
<p>Computers didn’t always have operating systems. The earliest machines, like the <a href="http://en.wikipedia.org/wiki/Harvard_Mark_I">Harvard Mark I</a> and the <a href="http://en.wikipedia.org/wiki/EDVAC">EDVAC</a>, performed one “computation” at a time. Whenever a computation finished, with its output printed by a teletypewriter or recorded on a magnetic tape, the machine would shut down. A person would then have to notice the machine stopped, unload the output, set up a new computation by manually loading the input and program instructions, and finally, press the <strong>start button</strong> to get the machine cranking again. On the Harvard Mark I, for instance, restarting would involve separately turning on multiple electric motors and then pressing a button marked MAIN SEQUENCE.</p>
<p><a href="http://commons.wikimedia.org/wiki/File:Harvard_Mark_I_Computer_-_Input-Output_Details.jpg"><img src="http://upload.wikimedia.org/wikipedia/commons/0/07/Harvard_Mark_I_Computer_-_Input-Output_Details.jpg" alt="The control panel of the Harvard Mark I." /></a></p>
<p><strong>This is the context in which the programming language (PL) and the operating system (OS) were invented. The year was 1955. Almost everything since then has been window dressing</strong> (so to speak). In this essay, I’m going to tell you my perspective on the PL and the OS, and the six other things since then which I consider significant improvements, which have made it into software practice, and which are neither algorithms nor data structures (but rather system concepts). Despite those and other incremental changes, to this day<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a>, we work exclusively<a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a> within software environments which can definitely be considered programming languages and operating systems, in exactly the same sense as those phrases were used almost 60 years ago. My position is:</p>
<ul>
<li>Frankly, this is backward, and we ought to admit it.</li>
<li>Most of this stuff was invented by people who had a lot less knowledge and experience with computing than we have accumulated today. <strong>All</strong> of it was <strong>invented by people</strong>: mortal, fallible humans like you and me who were just trying to make something work. With a solid historical perspective we can dare to do better. <!-- more --></li>
</ul>
<p><a name="The-Programming-Language"></a></p>
<h2 id="the-programming-language">1. The Programming Language <a href="#The-Programming-Language">#</a></h2>
<p><strong>Year</strong>: 1955</p>
<h3 id="archetype">Archetype</h3>
<p>Every programming language used today is descended from <a href="http://en.wikipedia.org/wiki/Fortran#History">FORTRAN</a><a href="#fn3" class="footnoteRef" id="fnref3"><sup>3</sup></a>. FORTRAN is an abbreviation of FORmula TRANslator, and its mission was to translate typewritten algebraic formulae into executable code.</p>
<h3 id="motivation">Motivation</h3>
<p>Most uses of computers involved numerical calculations, which would be translated from equation form into machine code by hand (naturally, a time-consuming process). Multiple people (including <a href="http://en.wikipedia.org/wiki/Grace_Hopper">Grace Hopper</a>, <a href="http://en.wikipedia.org/wiki/John_Backus">John Backus</a>, and <a href="http://en.wikipedia.org/wiki/Alick_Glennie">Alick Glennie</a>) realized that the computer could be used to automate such translations, and the result was the programming language.</p>
<h3 id="concept">Concept</h3>
<p><strong>A programming language is a piece of software that automatically translates a specially formatted block of linear text into executable code.</strong></p>
<p>It is bizarre that we’re still expressing programs entirely with text 59 years later, when the first interactive graphical display appeared just <em>4</em> years after the programming language did<a href="#fn4" class="footnoteRef" id="fnref4"><sup>4</sup></a>.</p>
<h3 id="benefits">Benefits</h3>
<p>The existence of programming languages enabled the use of concise notation for complex ideas, also known as <strong>abstraction</strong>. This not only saves time, but also makes programs easier to understand and maintain.</p>
<h3 id="exemplars">Exemplars</h3>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Lisp_(programming_language)">Lisp</a></li>
<li><a href="http://en.wikipedia.org/wiki/Forth_(programming_language)">Forth</a></li>
<li><a href="http://en.wikipedia.org/wiki/C_(programming_language)">C</a></li>
</ul>
<h3 id="drawbacks">Drawbacks</h3>
<ul>
<li>FORTRAN’s conflation of functions (an algebraic concept) and subroutines (a programming construct) persists to this day in nearly every piece of software, and causes no end of problems. <a href="http://en.wikipedia.org/wiki/Tracing_just-in-time_compilation">Tracing compilers</a> scratch the surface of reversing this mistake, but so far I know of no programming languages that are specifically designed around such a mechanism.</li>
<li>The fact that inputs had to be loaded into computers as stacks of punched cards limited the possible means of expressing computations to lines of text.</li>
</ul>
<p><a name="The-Operating-System"></a></p>
<h2 id="the-operating-system">2. The Operating System <a href="#The-Operating-System">#</a></h2>
<p><strong>Year</strong>: 1955</p>
<h3 id="archetype-1">Archetype</h3>
<p>The <a href="http://www.rand.org/content/dam/rand/pubs/papers/2008/P7316.pdf">General Motors/North American Aviation Monitor</a> was arguably the “original” OS.</p>
<h3 id="motivation-1">Motivation</h3>
<blockquote>
<p>The typical mode of operation was programmer present and at the operating console. When a programmer got ready for a test, he or she signed up on a first-in, first-out list, much like the list at a crowded restaurant. The programmer then checked progress frequently to estimate when he would reach the top. When his time got close, he stood by with card deck in hand. When the previous person finished or ran out of allotted time or abruptly crashed, the next programmer rushed in, checked the proper board was installed in the card reader, checked that the proper board was installed in the printer, checked that the proper board was installed on the punch, hung a magnetic tape, punched in on a mechanical time clock, addressed the console, set the appropriate switches, loaded his punched card deck in the card reader, prayed the first card would not jam, and pressed the LOAD button to invoke the bootstrap sequence.</p>
<p>If all went well, you could load a typical deck of about 300 cards and begin the execution of your first instruction about 5 minutes after entering the room. If only one person did all this set up and got going in 5 minutes, he bustled around the machine like a whirling dervish [sic]. Not always did things go so smoothly. If a programmer was fumble-fingered, cards jammed, magnetic tapes would not read due to defective splices, printer boards or switches were incorrectly set up, and it took 10 minutes to get going; or worse – you lost your opportunity and the next person took the machine when your time ran out. Usually the machine spent more time idle than computing. We programmers weren’t paid very much and although the machine was fairly costly, its capacity was even a more precious commodity since there were only 17 in the whole world.</p>
</blockquote>
<p>(<a href="http://www.rand.org/content/dam/rand/pubs/papers/2008/P7316.pdf">source</a>)</p>
<h3 id="concept-1">Concept</h3>
<p><strong>An operating system is a piece of software that facilitates the execution of multiple independent programs on one computer, using standard input and output routines.</strong></p>
<p>There’s a deep connection between the OS concept and the PL concept: the OS facilitates the execution of independent programs, while the PL facilitates the execution of independent modules or subroutines. In fact, GM/NAA OS was <a href="http://millosh.wordpress.com/2007/09/07/the-worlds-first-computer-operating-system-implemented-at-general-motors-research-labs-in-warren-michigan-in-1955/">literally</a> a modification of the octal code of the FORTRAN compiler tape.</p>
<p>The bizarre thing about operating systems is that we still accept unquestioningly that it’s a good idea to run multiple programs on a single computer with the conceit that they’re totally independent. Well-specified interfaces are great <em>semantically</em> for maintainability. But when it comes to what the machine is <em>actually doing</em>, why not just run one ordinary program and teach it new functions over time? Why persist for more than 50 years in the fiction that every distinct function performed by a computer executes independently in its own little barren environment?</p>
<h3 id="benefits-1">Benefits</h3>
<ul>
<li>Multiple programs could be run in a “batch,” thus keeping the machine from ever being idle (except in case of hardware failure or an empty job queue).</li>
<li>Programmers could now use standard input and output routines. (Depending on the formatting requirements and particular peripherals in use, properly handling input and output could previously have consumed most of the programming effort for simple jobs.)</li>
<li>Bare-hands reconfiguration of hardware (e.g. plugboards) finally disappeared from the work of programming.</li>
</ul>
<h3 id="exemplars8">Exemplars<a href="#fn5" class="footnoteRef" id="fnref5"><sup>5</sup></a></h3>
<ul>
<li><a href="http://en.wikipedia.org/wiki/CP/M">CP/M</a></li>
<li><a href="http://en.wikipedia.org/wiki/ProDOS">ProDOS</a></li>
</ul>
<h3 id="drawbacks-1">Drawbacks</h3>
<ul>
<li>Programs expect to use the entire machine, because that’s how programs were run previously and that’s what the programmers were used to. The operating system must therefore isolate programs from each other (in the simplest/earliest cases, by running each job to completion or termination before loading the next).</li>
</ul>
<p><a name="Interactivity"></a></p>
<h2 id="interactivity">3. Interactivity <a href="#Interactivity">#</a></h2>
<p><strong>Year</strong>: 1958</p>
<h3 id="archetype-2">Archetype</h3>
<p>The <a href="http://en.wikipedia.org/wiki/TX-0">TX-0</a> machine, one of the first transistorized computers, was installed at MIT in summer of 1958. The TX-0 had a monitor (a 512x512 CRT display), a keyboard, and a pointing device (a <a href="http://en.wikipedia.org/wiki/Light_pen">light pen</a>), making it probably the first computer with <a href="http://youtu.be/ieuV0A01--c?t=2m41s">a physical interface that we might recognize today</a>. It also happens to be the machine which spawned <a href="http://en.wikipedia.org/wiki/Hackers:_Heroes_of_the_Computer_Revolution#Part_One:_True_Hackers">hacker culture</a>.</p>
<h3 id="motivation-2">Motivation</h3>
<p>The TX-0 was a scaled-down (transistorized) offshoot of an Air Force project called <a href="http://en.wikipedia.org/wiki/Semi-Automatic_Ground_Environment">SAGE</a>, with the ambitious goal of an electronic, automated, networked missile defense and early warning radar system. The development of interactive display computing had three main causes in this context:</p>
<ul>
<li>it was a natural successor to the analog <a href="http://en.wikipedia.org/wiki/Radar_display">radar display</a></li>
<li>the on-line nature of the task demanded real-time human interaction</li>
<li>the importance of the task meant that funding was no object, so an entire computer (in fact, the largest and most expensive computer system ever made) could be “wasted” on providing such interactivity</li>
</ul>
<p>Because of its transistorized circuitry, the TX-0 needed very little maintenance or oversight, and for years was left unattended at MIT for pretty much anybody to use at any time, resulting in a great flourishing of interactive programs (many of whose names began with the word “Expensive,” in an acknowledgment of the absurdity of a $3M machine being available for such experimentation).</p>
<h3 id="concept-2">Concept</h3>
<p><strong>An interactive program is one which consumes input after producing output.</strong> Prior to SAGE, once a program produced its output, it was done, and the machine would halt or move on to the next job. What distinguishes an interactive system is that it will produce some output and then <em>wait</em> until more input is available.</p>
<h3 id="benefits-2">Benefits</h3>
<ul>
<li>It became possible to do creative work at a computer.</li>
</ul>
<h3 id="exemplars-1">Exemplars</h3>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Sketchpad">Sketchpad</a></li>
<li><a href="http://en.wikipedia.org/wiki/VisiCalc">VisiCalc</a></li>
<li><a href="http://en.wikipedia.org/wiki/GNU_Emacs">Emacs</a></li>
</ul>
<h3 id="drawbacks-2">Drawbacks</h3>
<ul>
<li>“Waiting” is poorly specified. If a program is waiting for one kind of input, what if a different kind of input arrives instead? It will fail to respond until the kind of input it was expecting appears. This problem continues to crop up in graphics programming, network programming, and other areas.</li>
</ul>
<p><a name="Transactions"></a></p>
<h2 id="transactions">4. Transactions <a href="#Transactions">#</a></h2>
<p><strong>Year</strong>: 1959</p>
<h3 id="archetype-3">Archetype</h3>
<p>Before computerization, American Airlines’ booking process was labor-intensive and slow. IBM realized that the basic idea behind SAGE could be applied to solve the airline reservation problem, resulting in <a href="http://en.wikipedia.org/wiki/Sabre_(computer_system)#History">SABRE</a>. The core of the SABRE operating system later became known as TPF (Transaction Processing Facility).</p>
<h3 id="motivation-3">Motivation</h3>
<p>American wanted a system with 1,500 booking terminals across the US and Canada all linked by modem to a central reservations computer. But what if two terminals try to book the last seat on a flight at the same moment? A system like this needs strong guarantees on consistency.</p>
<h3 id="concept-3">Concept</h3>
<p><strong>Transactions are operations each guaranteed either to fail without any effect, or to run in a definite, strict order.</strong> Lots of terminals may attempt to input transactions, but every terminal must observe the same consistent state of the system, including a global <a href="http://en.wikipedia.org/wiki/Transaction_log">transaction log</a> listing each transaction in the precise order in which it was applied.</p>
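<p>Here is a toy Python sketch of that idea, applied to the last-seat problem. It is single-threaded and has nothing to do with how SABRE was actually built; the class name, terminal names, and flight number are all invented for illustration. The point is only that each booking either changes nothing or lands at a definite position in one shared log.</p>

```python
class Reservations:
    def __init__(self, seats):
        self.seats = seats      # seats remaining per flight
        self.log = []           # global, strictly ordered transaction log

    def book(self, terminal, flight):
        # A transaction either fails with no effect at all...
        if self.seats.get(flight, 0) == 0:
            return False
        # ...or is applied and recorded at a definite position in the log.
        self.seats[flight] -= 1
        self.log.append((len(self.log), terminal, flight))
        return True

system = Reservations({"AA100": 1})
first = system.book("terminal-A", "AA100")
second = system.book("terminal-B", "AA100")

print(first, second, system.log)
```

<p>Only one terminal gets the seat, the other attempt leaves no trace, and every observer of the log sees the same order of events.</p>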
<h3 id="benefits-3">Benefits</h3>
<ul>
<li>This one core idea enabled the development of systems called <strong>databases</strong>, which can reliably maintain the state of complex data structures across incessant read and write operations as well as some level of hardware failures.</li>
<li>Modern <strong>filesystems</strong> are “journaled”, which means that they implement transactions.</li>
<li>Transactions are also the key idea behind <strong>version control systems</strong>, which are increasingly adopted in all corners of the software world. In that context, they are called “commits”.</li>
<li>Most recently, the core of crypto-currencies is a crude but clever solution to a distributed transaction processing problem. (In this context, transactions are in fact called transactions.)</li>
</ul>
<h3 id="exemplars-2">Exemplars</h3>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Ingres_(database)">Ingres</a></li>
<li><a href="http://en.wikipedia.org/wiki/ZFS">ZFS</a></li>
<li><a href="http://en.wikipedia.org/wiki/Git_(software)">git</a></li>
<li><a href="https://github.com/ethereum/wiki/wiki/%5BEnglish%5D-White-Paper#wiki-basic-building-blocks">ethereum</a></li>
</ul>
<h3 id="drawbacks-3">Drawbacks</h3>
<ul>
<li>Trades performance for correctness. In some contexts, an occasional incorrect result is not as much of a problem as overall throughput.</li>
</ul>
<p><a name="Garbage-Collection"></a></p>
<h2 id="garbage-collection">5. Garbage Collection <a href="#Garbage-Collection">#</a></h2>
<p><strong>Year</strong>: 1960</p>
<h3 id="archetype-4">Archetype</h3>
<p>All garbage-collected environments owe a debt <a href="http://www-formal.stanford.edu/jmc/recursive/node4.html">to Lisp</a>, the first to provide such a facility.</p>
<h3 id="motivation-4">Motivation</h3>
<p>Previously, programs required the manual management of the memory resource; the programmer had to anticipate when the program would need access to more memory, and ensure that the program re-used memory locations holding no-longer-needed data, so that it wouldn’t consume all the memory on the machine.</p>
<h3 id="concept-4">Concept</h3>
<p><strong>A garbage collector (GC) is a piece of software which maintains a data structure representing available memory, and marks a given memory location as available whenever it is no longer being referred to.</strong></p>
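<p>One classic way to do this is mark-and-sweep, sketched below in Python over a toy heap. The heap here is just a dictionary mapping each cell’s name to the cells it refers to, and <code>collect</code> is a hypothetical function written for this example; real collectors work on raw memory and are far more involved.</p>

```python
def collect(heap, roots):
    # Mark: find every cell reachable from the roots.
    reachable = set()
    pending = list(roots)
    while pending:
        cell = pending.pop()
        if cell in reachable:
            continue
        reachable.add(cell)
        pending.extend(heap[cell])
    # Sweep: every unmarked cell is no longer referred to by anything
    # live, so it goes back to the pool of available memory.
    free_list = sorted(cell for cell in heap if cell not in reachable)
    for cell in free_list:
        del heap[cell]
    return free_list

heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
freed = collect(heap, roots=["a"])

print(freed)
```

<p>Cells <code>c</code> and <code>d</code> refer only to each other, so neither is reachable from the root and both are reclaimed.</p>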
<h3 id="benefits-4">Benefits</h3>
<ul>
<li>The programmer doesn’t have to think about allocating and deallocating memory in order to make a working program.</li>
</ul>
<h3 id="exemplars-3">Exemplars</h3>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Genera_(operating_system)">Genera</a></li>
<li><a href="http://wiki.luajit.org/New-Garbage-Collector">LuaJIT</a></li>
</ul>
<h3 id="drawbacks-4">Drawbacks</h3>
<ul>
<li>Performance becomes unpredictable due to variable GC pause times<a href="#fn6" class="footnoteRef" id="fnref6"><sup>6</sup></a>.</li>
<li>Memory usage becomes unpredictable due to variable GC effectiveness and potential reference leaks.</li>
</ul>
<p><a name="Virtualization"></a></p>
<h2 id="virtualization">6. Virtualization <a href="#Virtualization">#</a></h2>
<p><strong>Year</strong>: 1961</p>
<h3 id="archetype-5">Archetype</h3>
<p>The <a href="http://www.computer.org/csdl/proceedings/afips/1961/5059/00/50590279.pdf">Atlas Supervisor</a>, developed at the University of Manchester in 1961, has been called “the first recognizable modern operating system” and “the most significant breakthrough in the history of operating systems”<a href="#fn7" class="footnoteRef" id="fnref7"><sup>7</sup></a>.</p>
<h3 id="motivation-5">Motivation</h3>
<p>System builders wanted the capability to run multiple programs at once, mostly for the following reason:</p>
<blockquote>
<p>Whilst one program is halted, awaiting completion of a magnetic tape transfer for instance, the coordinator routine switches control to the next program in the object program list which is free to proceed.</p>
</blockquote>
<p>However, as mentioned earlier, programs were (and still are!) written in such a way as to assume they have a machine all to themselves. Thus, to bridge the gap, we need to provide such programs with a “virtual” environment which they <em>do</em> have all to themselves.</p>
<h3 id="concept-5">Concept</h3>
<p><strong>Virtualization is a general term for software facilities (possibly supported by hardware acceleration) to run programs as if they each have a computer all to themselves.</strong> Common forms include:</p>
<ul>
<li><strong>Virtual memory</strong> is a mechanism to translate “virtual” addresses into fetch commands against physical data stores, in such a way that each program has a whole “virtual” computer to itself, despite sharing physical memory.</li>
<li>A <strong>virtual machine (VM)</strong> is a relatively fast bytecode interpreter which does not enable programs to directly execute instructions on the physical machine.</li>
<li>In <strong>full virtualization</strong>, a virtual machine exposes the entire host machine instruction set, thus enabling native programs to run within a VM.</li>
</ul>
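<p>The address-translation step of virtual memory can be sketched as a lookup in a per-program page table. This Python fragment is a simplified model under stated assumptions: a 4096-byte page size, page tables represented as plain dictionaries, and a <code>KeyError</code> standing in for a page fault. None of these details come from a particular system.</p>

```python
PAGE_SIZE = 4096

def translate(page_table, virtual_address):
    # Split the address into a page number and an offset within the page.
    page, offset = divmod(virtual_address, PAGE_SIZE)
    # Look up which physical frame backs that page; a missing entry
    # (KeyError) stands in for a page fault.
    frame = page_table[page]
    return frame * PAGE_SIZE + offset

# Each program has its own table, so each sees its own address space.
program_a = {0: 5, 1: 9}
program_b = {0: 2}

print(translate(program_a, 4), translate(program_b, 4))
```

<p>Both programs use virtual address 4, yet they land in different physical frames, which is how each can behave as if it had the machine to itself.</p>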
<h3 id="benefits-5">Benefits</h3>
<ul>
<li>Virtual memory makes it possible to only copy data from slow tiers of storage into fast tiers of storage if and when that “page” of data is needed.</li>
<li>Virtual memory makes it possible to persist data directly from volatile storage into nonvolatile storage “in the background,” without special handling.</li>
<li>Virtual memory makes it possible for processes to “share” memory without out-of-band communication.</li>
<li>VMs have relatively strong security guarantees; because all programs become paths through an interpreter, one need only show that the interpreter is safe to confirm that running arbitrary code within the VM is safe.</li>
</ul>
<h3 id="exemplars-4">Exemplars</h3>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Multics">Multics</a> (<a href="http://users.soe.ucsc.edu/~sbrandt/221/Papers/History/daley-cacm68.pdf">virtual memory</a>)</li>
<li><a href="http://www.slideshare.net/jserv/plan-9-not-only-a-better-unix">Plan 9</a> (unparalleled uniformity between volatile, nonvolatile, and network storage)</li>
<li><a href="http://en.wikipedia.org/wiki/Xen">Xen</a> (full virtualization)</li>
<li><a href="http://luajit.org/luajit.html">LuaJIT</a> (<a href="http://nominolo.blogspot.co.uk/2012/07/implementing-fast-interpreters.html">VM</a>)</li>
</ul>
<h3 id="drawbacks-5">Drawbacks</h3>
<ul>
<li>Virtual memory tries so hard to stay out of the programmer’s way that most programmers don’t even have a clear idea of what it is. As a result, its capabilities tend to be underused.</li>
<li>Virtual memory should have been extended to network resources, but this has not really happened.</li>
<li>As usually implemented, virtual memory subtly encourages the development of programs that do not talk to each other, because they are all pretending to exist in an isolated virtual memory space.</li>
</ul>
<p><a name="Hypermedia"></a></p>
<h2 id="hypermedia">7. Hypermedia <a href="#Hypermedia">#</a></h2>
<p><strong>Year</strong>: 1968</p>
<h3 id="archetype-6">Archetype</h3>
<p>Doug Engelbart’s <a href="http://en.wikipedia.org/wiki/NLS_(computer_system)">NLS</a> introduced implementations of:</p>
<ul>
<li>hypertext links</li>
<li>markup language</li>
<li>document version control</li>
<li>videoconferencing</li>
<li>email with hypermedia</li>
<li>hypermedia publishing</li>
<li>flexible windowing modes</li>
</ul>
<h3 id="motivation-6">Motivation</h3>
<blockquote>
<p><strong><a href="http://www.dougengelbart.org/pubs/augment-3906.html">Augmenting Human Intellect</a></strong></p>
<p>By “augmenting human intellect” we mean increasing the capability of a man to approach a complex problem situation, to gain comprehension to suit his particular needs, and to derive solutions to problems. Increased capability in this respect is taken to mean a mixture of the following: more-rapid comprehension, better comprehension, the possibility of gaining a useful degree of comprehension in a situation that previously was too complex, speedier solutions, better solutions, and the possibility of finding solutions to problems that before seemed insoluble. And by “complex situations” we include the professional problems of diplomats, executives, social scientists, life scientists, physical scientists, attorneys, designers–whether the problem situation exists for twenty minutes or twenty years. We do not speak of isolated clever tricks that help in particular situations. We refer to a way of life in an integrated domain where hunches, cut-and-try, intangibles, and the human “feel for a situation” usefully co-exist with powerful concepts, streamlined terminology and notation, sophisticated methods, and high-powered electronic aids.</p>
<p>Existing, or near-future, technology could certainly provide our professional problem-solvers with the artifacts they need to have for duplicating and rearranging text before their eyes, quickly and with a minimum of human effort. Even so apparently minor an advance could yield total changes in an individual’s repertoire hierarchy that would represent a great increase in over-all effectiveness. Normally the necessary equipment would enter the market slowly; changes from the expected would be small, people would change their ways of doing things a little at a time, and only gradually would their accumulated changes create markets for more radical versions of the equipment. Such an evolutionary process has been typical of the way our repertoire hierarchies have grown and formed.</p>
<p>But an active research effort, aimed at exploring and evaluating possible integrated changes throughout the repertoire hierarchy, could greatly accelerate this evolutionary process.</p>
</blockquote>
<h3 id="concept-6">Concept</h3>
<p><strong>Hypermedia refers to any communications medium which comprises interactive systems.</strong> The most popular forms of hypermedia are those employing <strong>hyperlinks</strong>: certain elements of a viewed object which can be activated through interaction and whose activation triggers the display of a different object, which is determined by the hyperlink and possibly also by the interaction. For example, the World Wide Web is a form of hypermedia (hypertext), though even HTML5 is not nearly as capable as hypermedia pioneers like Ted Nelson and Doug Engelbart had probably hoped.</p>
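<p>As a rough sketch of that definition (the page names, their contents, and the functions <code>activate</code> and <code>search_link</code> are hypothetical, made up for this example), a hyperlink can be modeled in Python as either a fixed target or a function of the interaction:</p>
<pre><code># Hypothetical miniature hypertext, for illustration only.
def search_link(typed):
    # Here the target depends on the interaction: the text the reader typed.
    return typed if typed in PAGES else "home"

PAGES = {
    "home": {"text": "Welcome",
             "links": {"about": "about", "search": search_link}},
    "about": {"text": "About this site",
              "links": {"back": "home"}},
}

def activate(current, link, interaction=None):
    """Activate a link on the current page; return the new page and its text."""
    target = PAGES[current]["links"][link]
    if callable(target):
        target = target(interaction)
    return target, PAGES[target]["text"]
</code></pre>
<p>Here <code>activate("home", "about")</code> always displays the same object, whereas <code>activate("home", "search", "about")</code> displays whatever the reader asked for, falling back to the home page when the request matches nothing.</p>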
<h3 id="benefits-6">Benefits</h3>
<ul>
<li>Makes nonlinear communication/expression much easier</li>
<li>A continuum between hypermedia authoring and program authoring eases more people into being able to craft programs to solve their own problems, which is good for freedom</li>
<li>Could enable people to organize their own thoughts and lives more elegantly and smoothly</li>
</ul>
<h3 id="exemplars-5">Exemplars</h3>
<ul>
<li><a href="http://en.wikipedia.org/wiki/HyperCard">HyperCard</a></li>
<li><a href="http://twinery.org/">Twine</a></li>
<li><a href="http://en.wikipedia.org/wiki/Wikipedia">Wikipedia</a></li>
</ul>
<h3 id="drawbacks-6">Drawbacks</h3>
<ul>
<li>It’s easy to implement bad hypermedia, like HTML.</li>
<li>If a software company makes good enough hypermedia, like <a href="http://en.wikipedia.org/wiki/HyperCard">HyperCard</a>, it will be quickly discontinued since it will threaten the rest of the company’s product line.</li>
</ul>
<p><a name="Internetworking"></a></p>
<h2 id="internetworking">8. Internetworking <a href="#Internetworking">#</a></h2>
<p><strong>Year</strong>: 1969</p>
<h3 id="archetype-7">Archetype</h3>
<p><a href="http://en.wikipedia.org/wiki/Arpanet">ARPAnet</a> is the quintessential computer network. It was originally called “the Intergalactic Computer Network” and ultimately became known as simply “the Internet”.</p>
<h3 id="motivation-7">Motivation</h3>
<blockquote>
<p>We had in my office three terminals to three different programs that ARPA was supporting. One was to the Systems Development Corporation in Santa Monica. There was another terminal to the Genie Project at U.C. Berkeley. The third terminal was to the C.T.S.S. project that later became the Multics project at M.I.T.</p>
<p>The thing that really struck me about this evolution was how these three systems caused communities to get built. People who didn’t know one another previously would now find themselves using the same system. Because the systems allowed you to share files, you could find that so-and-so was interested in such-and-such and he had some data about it. You could contact him by e-mail and, lo and behold, you would have a whole new relationship.</p>
<p>It wasn’t a static medium. It was a dynamic medium. And that gave it a lot of power.</p>
<p>There was one other trigger that turned me to the ARPAnet. For each of these three terminals, I had three different sets of user commands. So if I was talking online with someone at S.D.C. and I wanted to talk to someone I knew at Berkeley or M.I.T. about this, I had to get up from the S.D.C. terminal, go over and log into the other terminal and get in touch with them.</p>
<p>I said, oh, man, it’s obvious what to do: If you have these three terminals, there ought to be one terminal that goes anywhere you want to go where you have interactive computing. That idea is the ARPAnet.</p>
<p>–<a href="http://en.wikipedia.org/wiki/Robert_Taylor_(computer_scientist)">Bob Taylor</a> (<a href="http://partners.nytimes.com/library/tech/99/12/biztech/articles/122099outlook-bobb.html">source</a>), ARPA IPTO director</p>
</blockquote>
<h3 id="concept-7">Concept</h3>
<p><strong>An internetwork is a set of communications channels between computers, where each computer is running a service that routes incoming messages to some other communications channel, so that each message eventually reaches its addressee.</strong> “Messages,” in this context, are generally termed “packets” (and they generally reach their destination in fewer than a hundred “hops”).</p>
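<p>A minimal sketch of hop-by-hop routing in Python, assuming a made-up four-node network laid out in a line (A, B, C, D). The table <code>ROUTES</code>, the limit <code>MAX_HOPS</code>, and the function <code>deliver</code> are hypothetical. Each node knows only which neighbor to hand a packet to for a given addressee, and nothing about the rest of the path.</p>
<pre><code># Hypothetical network A - B - C - D, for illustration only.
# ROUTES[node][addressee] is the neighbor that node forwards to.
ROUTES = {
    "A": {"B": "B", "C": "B", "D": "B"},
    "B": {"A": "A", "C": "C", "D": "C"},
    "C": {"A": "B", "B": "B", "D": "D"},
    "D": {"A": "C", "B": "C", "C": "C"},
}
MAX_HOPS = 100

def deliver(source, addressee):
    """Forward a packet one hop at a time; return the list of nodes visited."""
    path = [source]
    while path[-1] != addressee:
        if len(path) == MAX_HOPS + 1:
            raise RuntimeError("too many hops")
        here = path[-1]
        path.append(ROUTES[here][addressee])
    return path
</code></pre>
<p>With these tables, <code>deliver("A", "D")</code> passes through B and C on its way to D. The hop limit plays the role of a time-to-live: a misconfigured table that sends packets in circles produces an error instead of an endless loop.</p>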
<h3 id="benefits-7">Benefits</h3>
<ul>
<li>Global instant email</li>
<li>Global instant hypertext</li>
<li>Global database-backed applications</li>
<li>Global file sharing</li>
</ul>
<h3 id="exemplars-6">Exemplars</h3>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Internet_protocol">Internet Protocol</a></li>
</ul>
<h3 id="drawbacks-7">Drawbacks</h3>
<ul>
<li>Classical internetworking has no built-in economic component; arrangements between large networks must be negotiated “out of band” and encoded in a rather nasty form called <a href="http://en.wikipedia.org/wiki/Border_Gateway_Protocol">BGP</a>. As a result of this, individual people or even moderately large corporations usually cannot internetwork, but must instead purchase access to the Internet. As a result of <em>this</em>, most communications systems around the world are controlled by unjust oligopolies, with high barriers to competition and low barriers to various abuses of power.</li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>I find that all the significant concepts in software systems were invented/discovered in the 15 years between 1955 and 1970. What have we been doing since then? Mostly making things faster, cheaper, more memory-consuming, smaller, dramatically less efficient, more secure<a href="#fn8" class="footnoteRef" id="fnref8"><sup>8</sup></a>, and worryingly glitchy. And we’ve been rehashing the same ideas over and over again. Interactivity is now “event-driven programming”. Transactions are now “concurrency primitives”. Internetworking is now “mesh networking”. Also, we have tabbed browsing now, because overlapping windows were a bad skeuomorphism from the start, and desktop notifications, because whatever is all the way in the corner of your screen is probably not very important. “Flexible view control” is relegated to the few and the proud who run something like <code>xmonad</code> or <code>herbstluftwm</code> on their custom-compiled GNU/Linux.</p>
<p>Many good programs have been written. Lots of really important algorithms and data structures have been invented (though usually not implemented in practice). Hardware has made <em>so</em> much progress. In the 1960s, a lot of good ideas were tossed out because they ran too slow, but here in 2014 everything is written in Python anyway, so let’s bring back the good old days, but now with Retina screens and multi-core gigahertz processors and tens of gigabytes of core memory. Let’s take that 20% performance hit over hand-coded assembler that was unacceptable in the 1960s, because it’s a 10x improvement over what we’re doing now.</p>
<p>Most of all, let’s rethink the received wisdom that you should teach your computer to do things in a programming language and run the resulting program on an operating system. A righteous operating system should be a programming language. And for goodness’ sake, let’s not use the entire network stack just to talk to another process on the same machine which is responsible for managing a database using the filesystem stack. At least let’s use shared memory (with transactional semantics, naturally – which Intel’s latest CPUs support in hardware). But if we believe in the future – if we believe in ourselves – let’s dare to ask why, anyway, does the operating system give you this “filesystem” thing that’s no good as a database and expect you to just accept that “stuff on computers goes in folders, lah”? Any decent software environment ought to have a fully featured database, built in, and no need for a “filesystem”.</p>
<p>Reject the notion that one program talking to another should have to invoke some “input/output” API. You’re the human, and you <em>own</em> this machine. You get to say who talks to what when, why, and how if you please. All this software stuff we’re expected to deal with – files, sockets, function calls – was just invented by other mortal people, like you and me, without using any tools we don’t have the equivalent of fifty thousand of. Let’s do some old-school hacking on our new-school hardware – like the original TX-0 hackers, in assembly, from the ground up – and work towards a harmonious world where there is something new in software systems for the first time since 1969.</p>
<hr />
<p><em>To be continued…</em></p>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>Since then, Smalltalk (<a href="http://squeaknos.blogspot.com/">SqueakNOS</a>), Forth (<a href="http://www.colorforth.com/cf.htm">colorForth</a>), and Lisp (<a href="http://en.wikipedia.org/wiki/Genera_(operating_system)">Genera</a>) have all flirted with becoming operating systems, and <a href="http://en.wikipedia.org/wiki/Oberon_(programming_language)">Oberon</a> was <a href="http://www.inf.ethz.ch/personal/wirth/ProjectOberon/PO.System.pdf">designed</a> to <a href="http://en.wikipedia.org/wiki/Oberon_(operating_system)">be one</a> from the start. But none achieved economic success, for the simple reason that none of the projects involved attempted to provide value to people. They solved technical problems to validate that their concepts can work in the real world, but did not pursue the delivery of better solutions to real-world problems than would otherwise be possible.<a href="#fnref1">↩</a></p></li>
<li id="fn2"><p>Serious embedded systems people who write machine code from scratch, this is your time to gloat. You truly deserve the title of engineer. In fact, chances are good that you hold the title “electrical engineer”. Chances are also good that whatever you engineer isn’t computers, so hear me out. On the off-chance that you are an embedded systems person who writes machine code from scratch and you <strong>do</strong> make computers or computer parts, chances are good that you are (a) the bane of some free software driver author’s existence, and/or (b) providing an incredibly hard-to-detect hideout for really clever malware. <strong>Please compel your employers to publish technical documentation freely and to use ROMs in place of FLASH so that malware can’t take over your lovingly crafted code.</strong> Now, back to our regularly scheduled tirade.<a href="#fnref2">↩</a></p></li>
<li id="fn3"><p>Yes, there are exceptions, but they’re not the ones you think. The exceptions are those derived from the work of <a href="http://en.wikipedia.org/wiki/Cliff_Shaw">Cliff Shaw</a> (e.g. <a href="http://en.wikipedia.org/wiki/PLANNER">PLANNER</a>, <a href="http://en.wikipedia.org/wiki/Prolog">Prolog</a>, <a href="http://en.wikipedia.org/wiki/MUMPS">M</a>), those derived from <a href="http://c2.com/cgi/wiki?AplLanguage">APL</a> (e.g. <a href="http://c2.com/cgi/wiki?JayLanguage">J</a>, <a href="http://c2.com/cgi/wiki?KayLanguage">K</a>, and <a href="https://www.princeton.edu/~hos/mike/transcripts/mcilroy.htm">arguably</a> the UNIX shell/pipeline environment), the <a href="http://www.mt-archive.info/MT-1958-Yngve.pdf">COMIT</a> family (e.g. <a href="http://c2.com/cgi/wiki?SnobolLanguage">SNOBOL</a>), and the curious corner case of <a href="https://gist.github.com/koo5/4129213">Inform 7</a>. Lisp was inspired by FORTRAN (<a href="http://www-formal.stanford.edu/jmc/history/lisp/node2.html">source</a>). ISWIM (which some programming language histories identify as the “root” of the ML family) is based on ALGOL 60 (<a href="http://www.cs.cmu.edu/~crary/819-f09/Landin66.pdf">source</a>), which of course is based on FORTRAN. The <a href="http://www.forth.com/resources/evolution/evolve_0.html">Forth</a> family (e.g. <a href="http://en.wikipedia.org/wiki/PostScript">PostScript</a>, <a href="http://c2.com/cgi/wiki?FactorLanguage">Factor</a>, <a href="http://www.stanford.edu/~ouster/cgi-bin/papers/tcl-usenix.pdf">Tcl</a> via <a href="http://c2.com/cgi/wiki?NetworkExtensibleWindowSystem">NeWS</a>) was rooted in Lisp (<a href="http://www.colorforth.com/HOPL.html">source</a>). Even COMIT was loosely inspired by FORTRAN (<a href="http://books.google.com/books?id=-GW8lOYl3AAC&amp;lpg=PA53&amp;ots=OzMYwEw0kz&amp;dq=COMIT+FORTRAN&amp;pg=PA53">source</a>). Some <strong>esolangs</strong> (“esoteric languages”, viz. 
languages not intended for serious use) like <a href="http://en.wikipedia.org/wiki/Befunge">Befunge</a> and <a href="http://c2.com/cgi/wiki?TheWierdLanguage">Wierd</a> are very much non-FORTRAN, but they are not seriously used by anyone. Machine code could simply be disqualified on the basis that it is not software (the subject of this article), but even all current machine languages feature stack operators, which derive from ALGOL via <a href="http://research.microsoft.com/en-us/um/people/gbell/computer_structures_principles_and_examples/csp0260.htm">Burroughs Large Systems</a>.<a href="#fnref3">↩</a></p></li>
<li id="fn4"><p>Yes, I’m aware of <a href="http://en.wikipedia.org/wiki/Graphical_programming">all this $#!*</a>. If you want to point out that graphical programming languages exist, and they aren’t based on FORTRAN, well, they fall outside my definition of “programming language”, so there. Riddle me this: why does nobody who knows how to program in text ever want to use them? Why do they break down for anything that isn’t basically a signal processing task? Why don’t they have lambdas, zooming, or style? You know, style. Like Edward Tufte has. Style. Nobody wants to use an ugly visual programming language.<a href="#fnref4">↩</a></p></li>
<li id="fn5"><p>These are examples of <strong>non-multitasking</strong> OSes. Multitasking (as practiced today) requires a separate idea, which I cover in the section marked <a href="#Virtualization">Virtualization</a>.<a href="#fnref5">↩</a></p></li>
<li id="fn6"><p>This disadvantage can be mitigated significantly (or, with great effort, completely eliminated) by the careful use of <a href="http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Stop-the-world_vs._incremental_vs._concurrent">incremental or concurrent</a> garbage collectors.<a href="#fnref6">↩</a></p></li>
<li id="fn7"><p>(<a href="http://books.google.com/books?id=-PDPBvIPYBkC&amp;lpg=PA1&amp;ots=LgbTZ3Z1IO&amp;dq=Classic%20Operating%20Systems%3A%20From%20Batch%20Processing%20to%20Distributed%20Systems&amp;pg=PA7#v=onepage&amp;q=atlas&amp;f=false">source</a>)<a href="#fnref7">↩</a></p></li>
<li id="fn8"><p>I considered including cryptography as another major bullet point, but if I selected a particular algorithm (e.g. <a href="http://en.wikipedia.org/wiki/Diffie-Hellman_key_exchange">Diffie-Hellman-Merkle key exchange</a> or the <a href="http://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction">Merkle–Damgård hash function construction</a>), then I’d have to include other important algorithms (I’ve left algorithms out of my list here; they’re not “software innovations” in the sense I mean), and if I selected “encrypted communications”, well, that surely predates computers. The fact that people started writing programs which encrypt communications is great, but it doesn’t change the software environment on the level I’m talking about. That said, I do consider the invention of the practical cryptographic hash a contender for most important innovation in <em>computer science</em> in the last 25 years; asymmetric cryptography is about equally important. Ralph Merkle really deserves more credit for having essentially conceived of both pillars of modern cryptography.<a href="#fnref8">↩</a></p></li>
</ol>
</section>]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[How I Think About Math, <br/>Lecture 1: Relations]]></title>
    <link href="http://davidad.github.io/blog/2014/03/10/how-i-think-about-math-relations/"/>
    <updated>2014-03-10T11:05:40-04:00</updated>
    <id>http://davidad.github.io/blog/2014/03/10/how-i-think-about-math-relations</id>
    <content type="html"><![CDATA[<p><a href="http://davidad.github.io/assets/20140306.pdf">See the slides (PDF)</a>. (You may want to use your PDF viewer’s presentation mode; there are a lot of pseudo-animations that could get annoying to scroll through.)</p>
<p><strong>Update</strong>: Today, I drew up the <a href="http://davidad.github.io/assets/20140310.pdf">field axioms</a> in this notation. I’m almost to the point where I can define linearity!</p>
<hr />
<p>Last week at <a href="http://www.hackerschool.com">Hacker School</a>, I floated the idea of giving a presentation about linear algebra. Over a decade after taking it in college, I finally feel like I understand linear algebra well enough to express clearly, to an audience of programmers, most of the concepts from linear algebra that they might find useful.</p>
<p>I figured the very first thing to present would be the concept of <em>linearity</em> itself. After all, a <strong>linear operator</strong> is just any operator that commutes with addition and scalar multiplication. But wait– what is “commuting”? <!-- more
--> Well, no problem, “<em>A</em> and <em>B</em> commute” just means that composing <em>A</em> with <em>B</em> yields the same operator as composing <em>B</em> with <em>A</em>. But wait– what is “composing”? I <em>could</em> start my presentation by defining a category, but that would be unnecessarily scary given category theory’s fearsome reputation. Besides, John Baez <a href="http://johncarlosbaez.wordpress.com/2014/03/02/network-theory-i/">showed me last week</a> that categorical diagram notation has its boxes and arrows counterintuitively swapped. But wait– I could just use Baez’s new notation, instead! Then my entire discussion of linear algebra will be based on concrete, non-fearsome <strong>relations</strong>, instead of “morphisms.”</p>
<p>So…I got about as far as defining “commuting.” (Linear algebra will have to wait.)</p>
<hr />
<p>Note: I’m skirting the edge of what Baez’s formalism actually allows; in his work so far, diagrams always depict morphisms, rather than logical assertions. I’m still working on the semantics of quantifiers in this notation, so it’s conceivable some of the examples in these slides will change as I learn more.</p>]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Python to Scheme to Assembly, <br>Part 1: Recursion and Named Let]]></title>
    <link href="http://davidad.github.io/blog/2014/02/28/python-to-scheme-to-assembly-1/"/>
    <updated>2014-02-28T14:43:58-05:00</updated>
    <id>http://davidad.github.io/blog/2014/02/28/python-to-scheme-to-assembly-1</id>
    <content type="html"><![CDATA[<p><em>In 2001, my favorite programming language was Python. In 2008, my favorite programming language was Scheme. In 2014, my favorite programming language is x64 assembly. For some reason, that progression tends to surprise people. Come on a journey with me.</em></p>
<h2 id="python">Python</h2>
<p>In this article, we’re going to consider a very simple toy problem: recursively summing up a list of numbers<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a>.</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div></pre></td><td class='main  python'><pre><div class='line'><code><span class="k">def</span> <span class="nf">sum_list</span><span class="p">(</span><span class="nb">list</span><span class="p">):</span>
</code></div><div class='line'><code>  <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="nb">list</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</code></div><div class='line'><code>    <span class="k">return</span> <span class="mi">0</span>
</code></div><div class='line'><code>  <span class="k">else</span><span class="p">:</span>
</code></div><div class='line'><code>    <span class="k">return</span> <span class="nb">list</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">+</span><span class="n">sum_list</span><span class="p">(</span><span class="nb">list</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
</code></div></pre></td></tr></table></div></figure>


<blockquote>
<pre><code> &gt;&gt;&gt; sum_list(range(101))
 5050</code></pre>
</blockquote>
<p><a href="http://en.wikipedia.org/wiki/Carl_Friedrich_Gauss#Anecdotes">Young Carl Gauss</a> would be proud.</p>
<blockquote>
<pre><code> &gt;&gt;&gt; sum_list(range(1001))
 RuntimeError: maximum recursion depth exceeded</code></pre>
<p>Oops.</p>
</blockquote>
<p>Young programmers often learn from this type of experience that recursion <em>sucks</em>. (Or, as a modern young programmer might say, it <em>doesn’t scale</em>.) If they Google around a bit, they might find the following “solution”: <!-- more --></p>
<blockquote>
<pre><code> &gt;&gt;&gt; import sys
 &gt;&gt;&gt; sys.setrecursionlimit(1500)
 &gt;&gt;&gt; sum_list(range(1001))
 500500</code></pre>
</blockquote>
<p>If they have a good computer science teacher, though, they’ll learn that the real solution is to use something called <strong>tail recursion</strong>. This is a somewhat mysterious, seemingly arbitrary concept. If the result of your recursive call gets returned <em>immediately</em>, without any intervening expressions, then somehow it “doesn’t count” toward the equally arbitrary recursion depth limit. Our example above isn’t tail-recursive because we add <code>list[0]</code> to <code>sum_list(list[1:])</code> before returning the result. In order to make <code>sum_list</code> tail-recursive, we have to add an <strong>accumulator</strong> variable, which represents the sum of those numbers we’ve looked at already. We’ll call this version <code>sum_sublist</code>, and wrap it in a new <code>sum_list</code> function which calls <code>sum_sublist</code> with the initial accumulator 0 (initially, we haven’t looked at any numbers yet, so the sum of them is 0).</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div></pre></td><td class='main  python'><pre><div class='line'><code><span class="k">def</span> <span class="nf">sum_list</span><span class="p">(</span><span class="nb">list</span><span class="p">):</span>
</code></div><div class='line'><code>  <span class="k">def</span> <span class="nf">sum_sublist</span><span class="p">(</span><span class="n">accum</span><span class="p">,</span><span class="n">sublist</span><span class="p">):</span>
</code></div><div class='line'><code>    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">sublist</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</code></div><div class='line'><code>      <span class="k">return</span> <span class="n">accum</span>
</code></div><div class='line'><code>    <span class="k">else</span><span class="p">:</span>
</code></div><div class='line'><code>      <span class="k">return</span> <span class="n">sum_sublist</span><span class="p">(</span><span class="n">accum</span><span class="o">+</span><span class="n">sublist</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="n">sublist</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
</code></div><div class='line'><code>  <span class="k">return</span> <span class="n">sum_sublist</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="nb">list</span><span class="p">)</span>
</code></div></pre></td></tr></table></div></figure>


<blockquote>
<pre><code> &gt;&gt;&gt; sum_list(range(101))
 5050</code></pre>
</blockquote>
<p>So far, so good.</p>
<blockquote>
<pre><code> &gt;&gt;&gt; sum_list(range(1001))
 RuntimeError: maximum recursion depth exceeded</code></pre>
</blockquote>
<p>Wait, what?</p>
<blockquote>
<p>On Wednesday, April 22, 2009, Guido van Rossum <a href="http://neopythonic.blogspot.co.uk/2009/04/tail-recursion-elimination.html">wrote</a>:</p>
<blockquote>
<p>A side remark about not supporting tail recursion elimination (TRE) immediately sparked several comments about what a pity it is that Python doesn’t do this, including links to recent blog entries by others trying to “prove” that TRE can be added to Python easily. So let me defend my position (which is that I don’t want TRE in the language). If you want a short answer, it’s simply unpythonic. Here’s the long answer:</p>
</blockquote>
<p><em>[snipped]</em></p>
<blockquote>
<p>Third, I don’t believe in recursion as the basis of all programming. This is a fundamental belief of certain computer scientists, especially those who love Scheme…</p>
</blockquote>
<p><em>[snipped]</em></p>
<blockquote>
<p>Still, if someone was determined to add TRE to CPython, they could modify the compiler roughly as follows…</p>
</blockquote>
</blockquote>
<p>In other words, the <em>only</em> reason this doesn’t work is that Guido van Rossum<a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a> <em>prefers it that way</em>. Guido, I respect your right to your opinion, but the reader and I are switching to Scheme.</p>
<h2 id="scheme">Scheme</h2>
<p>Here’s a line-by-line translation:</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div></pre></td><td class='main  scheme'><pre><div class='line'><code><span class="p">(</span><span class="k">define </span><span class="p">(</span><span class="nf">sum_list</span> <span class="nv">list</span><span class="p">)</span>
</code></div><div class='line'><code>  <span class="p">(</span><span class="k">define </span><span class="p">(</span><span class="nf">sum_sublist</span> <span class="nv">accum</span> <span class="nv">sublist</span><span class="p">)</span>
</code></div><div class='line'><code>    <span class="p">(</span><span class="k">cond </span><span class="p">((</span><span class="nb">null? </span><span class="nv">sublist</span><span class="p">)</span>                 <span class="c1">; tests if sublist has length 0</span>
</code></div><div class='line'><code>           <span class="nv">accum</span> <span class="p">)</span>                         <span class="c1">; don&#39;t need return statement in Scheme</span>
</code></div><div class='line'><code>          <span class="p">(</span><span class="nf">else</span>
</code></div><div class='line'><code>           <span class="p">(</span><span class="nf">sum_sublist</span> <span class="p">(</span><span class="nb">+ </span><span class="nv">accum</span> <span class="p">(</span><span class="nb">car </span><span class="nv">sublist</span><span class="p">))</span> <span class="p">(</span><span class="nb">cdr </span><span class="nv">sublist</span><span class="p">))</span> <span class="p">)))</span>
</code></div><div class='line'><code>  <span class="p">(</span><span class="nf">sum_sublist</span> <span class="mi">0</span> <span class="nv">list</span><span class="p">)</span> <span class="p">)</span>
</code></div></pre></td></tr></table></div></figure>


<blockquote>
<pre><code> guile&gt; (sum_list (iota 1001))
 500500</code></pre>
</blockquote>
<p>Phew! Let’s make sure that we aren’t just getting lucky with a bigger recursion limit:</p>
<blockquote>
<pre><code> guile&gt; (sum_list (iota 10000001))
 50000005000000</code></pre>
</blockquote>
<p>Well, isn’t that neat? If we go much bigger, it’ll take a long time, but as long as the output fits into memory, we’ll get the right answer<a href="#fn3" class="footnoteRef" id="fnref3"><sup>3</sup></a>.</p>
<h3 id="named-let">Named Let</h3>
<p>In our last two versions of <code>sum_list</code>, we defined a helper function (<code>sum_sublist</code>), and the rest of the body of <code>sum_list</code> was just a single invocation of that helper function. This is an inelegant pattern<a href="#fn4" class="footnoteRef" id="fnref4"><sup>4</sup></a>, which Scheme has a construct to address.</p>
<a name="named-let-1"></a>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div></pre></td><td class='main  scheme'><pre><div class='line'><code><span class="p">(</span><span class="k">define </span><span class="p">(</span><span class="nf">sum_list</span> <span class="nv">list</span><span class="p">)</span>
</code></div><div class='line'><code>  <span class="p">(</span><span class="k">let </span><span class="nv">sum_sublist</span> <span class="p">((</span><span class="nf">accum</span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="nf">sublist</span> <span class="nv">list</span><span class="p">))</span>  <span class="c1">; the named let!</span>
</code></div><div class='line'><code>    <span class="p">(</span><span class="k">cond </span><span class="p">((</span><span class="nb">null? </span><span class="nv">sublist</span><span class="p">)</span>
</code></div><div class='line'><code>           <span class="nv">accum</span> <span class="p">)</span>
</code></div><div class='line'><code>          <span class="p">(</span><span class="nf">else</span>
</code></div><div class='line'><code>           <span class="p">(</span><span class="nf">sum_sublist</span> <span class="p">(</span><span class="nb">+ </span><span class="nv">accum</span> <span class="p">(</span><span class="nb">car </span><span class="nv">sublist</span><span class="p">))</span> <span class="p">(</span><span class="nb">cdr </span><span class="nv">sublist</span><span class="p">))</span> <span class="p">))))</span>
</code></div></pre></td></tr></table></div></figure>


<p><a href="http://people.csail.mit.edu/jaffer/r5rs_6.html#IDX130"><strong>Named let</strong></a> creates a function and invokes it (with the provided initial values) in one step. It is decidedly my favorite control structure of all time. You can have your <code>while</code> loops and your <code>for</code> loops, and your <code>do</code>…<code>until</code> loops too<a href="#fn5" class="footnoteRef" id="fnref5"><sup>5</sup></a>. I’ll take named let any day, because it provides the abstraction barrier of recursion without compromising the conciseness and efficiency of iteration. In case you’re not sufficiently impressed, I discuss the delightful properties of using recursion instead of non-recursive loops <a href="#recursion">below</a>.</p>
<h2 id="assembly">Assembly</h2>
<p><a name="neatly-into-assembly"></a> Named let style translates amazingly naturally into assembly.</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div><div data-line='14' class='line-number'></div><div data-line='15' class='line-number'></div><div data-line='16' class='line-number'></div><div data-line='17' class='line-number'></div><div data-line='18' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="k">bits</span> <span class="mi">64</span>
</code></div><div class='line'><code><span class="c1">; macros for readability</span>
</code></div><div class='line'><code><span class="cp">%define list rdi             </span><span class="c1">; by calling convention, argument shows up here</span>
</code></div><div class='line'><code><span class="cp">%define accum rax            </span><span class="c1">; accumulator (literally!)</span>
</code></div><div class='line'><code><span class="cp">%define sublist rdx</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="k">global</span> <span class="nv">sum_list</span>
</code></div><div class='line'><code><span class="nl">sum_list:</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">accum</span><span class="p">,</span> <span class="mi">0</span>               <span class="c1">; these are the let-bindings!</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">sublist</span><span class="p">,</span> <span class="nv">list</span>
</code></div><div class='line'><code><span class="nl">.sum_sublist:</span>
</code></div><div class='line'><code>  <span class="nf">test</span> <span class="nv">sublist</span><span class="p">,</span> <span class="nv">sublist</span>      <span class="c1">; is it NULL?</span>
</code></div><div class='line'><code>  <span class="nf">jnz</span> <span class="nv">.else</span>                  <span class="c1">; if not, goto else</span>
</code></div><div class='line'><code>  <span class="nf">ret</span><span class="c1">; accum                (because return value is rax by calling convention)</span>
</code></div><div class='line'><code><span class="nl">.else:</span>
</code></div><div class='line'><code>  <span class="nf">add</span> <span class="nv">accum</span><span class="p">,</span> <span class="p">[</span><span class="nv">sublist</span><span class="p">]</span>       <span class="c1">; ~ accum=accum+car(sublist);</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">sublist</span><span class="p">,</span> <span class="p">[</span><span class="nv">sublist</span><span class="o">+</span><span class="mi">8</span><span class="p">]</span>   <span class="c1">; ~ sublist=cdr(sublist);</span>
</code></div><div class='line'><code>  <span class="nf">jmp</span> <span class="nv">.sum_sublist</span>           <span class="c1">; tail-recurse</span>
</code></div></pre></td></tr></table></div></figure>


<blockquote>
<pre><code>&gt; sum_list(from(1,100))
5050
&gt; sum_list(from(1,10000000))
50000005000000</code></pre>
<p>(Sadly, my assembler doesn’t come with its own REPL; we’re borrowing the <a href="http://luajit.org">LuaJIT</a> REPL instead<a href="#fn6" class="footnoteRef" id="fnref6"><sup>6</sup></a>.)</p>
</blockquote>
<p>In fact, if I weren’t so comfortable with named let, I doubt I’d be an effective assembly coder, because assembly doesn’t really have any other iteration constructs<a href="#fn7" class="footnoteRef" id="fnref7"><sup>7</sup></a>. But I don’t miss them. <a href="#iteration">What would they look like, anyway?</a></p>
<hr />
<p>In the next installment of <strong>Python to Scheme to Assembly</strong>, we will look at <code>call-with-current-continuation</code>.</p>
<h2 id="addendum-c">Addendum: C</h2>
<p>In this addendum, we’re going to look at the assembly for iteration, non-tail recursion, and tail recursion, as emitted by <code>gcc</code>, and get to the bottom of what the difference is anyway.</p>
<p>At the top of each C file here, we have the following:</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div></pre></td><td class='main  c'><pre><div class='line'><code><span class="cp">#include &lt;stdint.h&gt;</span>
</code></div><div class='line'><code><span class="k">struct</span> <span class="n">number_list</span> <span class="p">{</span>
</code></div><div class='line'><code>  <span class="kt">uint64_t</span> <span class="n">number</span><span class="p">;</span>
</code></div><div class='line'><code>  <span class="k">struct</span> <span class="n">number_list</span> <span class="o">&ast;</span><span class="n">next</span><span class="p">;</span>
</code></div><div class='line'><code><span class="p">};</span>
</code></div></pre></td></tr></table></div></figure>
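<p>As a sanity check on how this struct relates to the hand-written assembly, which reads the number from <code>[sublist]</code> and the next pointer from <code>[sublist+8]</code>, here is a minimal sketch. It assumes a 64-bit target with 8-byte pointers and a C11 compiler. The helpers <code>cons</code> and <code>from</code> are hypothetical names of my own choosing (the latter imitating the REPL helper used earlier), not part of the files discussed here.</p>
<pre><code>#include &lt;assert.h&gt;   /* static_assert (C11) */
#include &lt;stddef.h&gt;   /* offsetof, NULL */
#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;   /* malloc, abort */

struct number_list {
  uint64_t number;
  struct number_list *next;
};

/* Assumption: 8-byte pointers. Compilation fails if the layout differs. */
static_assert(offsetof(struct number_list, number) == 0, "car lives at [sublist]");
static_assert(offsetof(struct number_list, next) == 8, "cdr lives at [sublist+8]");

/* Hypothetical helper: prepend a number to a list, like Scheme's cons.
   Aborts if memory runs out, to keep the sketch short. */
struct number_list *cons(uint64_t number, struct number_list *next) {
  struct number_list *cell = malloc(sizeof *cell);
  if (!cell) abort();
  cell-&gt;number = number;
  cell-&gt;next = next;
  return cell;
}

/* Hypothetical helper: build the list lo, lo+1, ..., hi by consing from
   the back. Assumes hi is less than UINT64_MAX; returns NULL if lo exceeds hi. */
struct number_list *from(uint64_t lo, uint64_t hi) {
  struct number_list *list = NULL;
  for (uint64_t n = hi + 1; n &gt; lo; n--) {
    list = cons(n - 1, list);
  }
  return list;
}</code></pre>
<p>With these definitions, <code>from(1, 100)</code> builds a hundred-cell list of the kind passed to <code>sum_list</code> in the REPL session above.</p>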


<p><a name="iteration"></a></p>
<h3 id="iteration">Iteration</h3>
<p>If I were solving this problem in the context of a C program, this is how I would do it.</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div></pre></td><td class='main  c'><pre><div class='line'><code><span class="kt">uint64_t</span> <span class="nf">sum_list</span><span class="p">(</span><span class="k">struct</span> <span class="n">number_list</span><span class="o">&ast;</span> <span class="n">list</span><span class="p">)</span> <span class="p">{</span>
</code></div><div class='line'><code>  <span class="kt">uint64_t</span> <span class="n">accum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</code></div><div class='line'><code>  <span class="k">while</span><span class="p">(</span><span class="n">list</span><span class="p">)</span> <span class="p">{</span>
</code></div><div class='line'><code>    <span class="n">accum</span><span class="o">+=</span><span class="n">list</span><span class="o">-&gt;</span><span class="n">number</span><span class="p">;</span>
</code></div><div class='line'><code>    <span class="n">list</span><span class="o">=</span><span class="n">list</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">;</span>
</code></div><div class='line'><code>  <span class="p">}</span>
</code></div><div class='line'><code>  <span class="k">return</span> <span class="n">accum</span><span class="p">;</span>
</code></div><div class='line'><code><span class="p">}</span>
</code></div></pre></td></tr></table></div></figure>


<p>Here’s the generated assembly, translated to <code>nasm</code> syntax and commented.</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="k">global</span> <span class="nv">sum_list</span>
</code></div><div class='line'><code><span class="nl">sum_list:</span>
</code></div><div class='line'><code>  <span class="nf">xor</span> <span class="nb">eax</span><span class="p">,</span> <span class="nb">eax</span>     <span class="c1">; equivalent to &quot;mov rax, 0&quot; but faster</span>
</code></div><div class='line'><code>                   <span class="c1">; in C it&#39;s fine to clobber rdi instead of copying it first</span>
</code></div><div class='line'><code>  <span class="nf">test</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nb">rdi</span>        <span class="c1">; &lt;- same as ours</span>
</code></div><div class='line'><code>  <span class="nf">jz</span> <span class="nv">.done</span>         <span class="c1">; here the &quot;if NULL&quot; case is at the bottom</span>
</code></div><div class='line'><code><span class="nl">.else:</span>
</code></div><div class='line'><code>  <span class="nf">add</span> <span class="nb">rax</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdi</span><span class="p">]</span>       <span class="c1">; &lt;- same as ours</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdi</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="mi">8</span><span class="p">]</span>     <span class="c1">; &lt;- same as ours</span>
</code></div><div class='line'><code>  <span class="nf">test</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nb">rdi</span>        <span class="c1">; &lt;- same as ours, but duplicated</span>
</code></div><div class='line'><code>  <span class="nf">jnz</span> <span class="nv">.else</span>            <span class="c1">; &lt;- same as ours</span>
</code></div><div class='line'><code><span class="nl">.done:</span>
</code></div><div class='line'><code>  <span class="nf">rep</span> <span class="nv">ret</span>          <span class="c1">; equivalent to &quot;ret&quot;, but faster on old AMD chips for no good reason</span>
</code></div></pre></td></tr></table></div></figure>


<p>This is <em>almost</em> identical to the assembly that I wrote, except that it clobbers one of its inputs (which is perfectly allowed by the C calling convention<a href="#fn8" class="footnoteRef" id="fnref8"><sup>8</sup></a>), it uses <code>xor</code> instead of <code>mov</code> to load <code>0</code> (a solid optimization<a href="#fn9" class="footnoteRef" id="fnref9"><sup>9</sup></a>), it uses <code>rep ret</code> (less compact and no benefit on Intel chips), and it shuffles the instructions around such that two <code>test</code>s are needed (almost certainly not helpful with modern branch prediction and loop detection). I haven’t run benchmarks on this, but my guess is that it would come out about even. (Both versions are eight instructions long.) I also think the shuffling makes this “iterative” version more opaque and difficult to reason about (not least because of the duplicated <code>test</code>) than my “named let”-style code.</p>
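<p>The shuffling resembles a standard compiler transformation usually called loop inversion (or loop rotation), in which a <code>while</code> loop becomes an <code>if</code> guarding a <code>do</code>…<code>while</code>. I am inferring that from the shape of the output rather than from <code>gcc</code>’s internals, so treat the following as a rough C-level sketch of the two shapes. The function names are made up for illustration.</p>
<pre><code>#include &lt;stdint.h&gt;

struct number_list {
  uint64_t number;
  struct number_list *next;
};

/* Shape of the hand-written assembly: one test at the top of the loop
   and one unconditional jump back at the bottom. */
uint64_t sum_list_while(const struct number_list *list) {
  uint64_t accum = 0;
  while (list) {
    accum += list-&gt;number;
    list = list-&gt;next;
  }
  return accum;
}

/* Shape of the gcc output: test once before entering the loop, then test
   again at the bottom, so each iteration ends in a single conditional jump. */
uint64_t sum_list_inverted(const struct number_list *list) {
  uint64_t accum = 0;
  if (list) {
    do {
      accum += list-&gt;number;
      list = list-&gt;next;
    } while (list);
  }
  return accum;
}</code></pre>
<p>Both functions compute the same sum for every list, including the empty one; the only difference is where the NULL check sits.</p>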
<h3 id="non-tail-recursion">Non-Tail Recursion</h3>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div></pre></td><td class='main  c'><pre><div class='line'><code><span class="kt">uint64_t</span> <span class="nf">sum_list</span><span class="p">(</span><span class="k">struct</span> <span class="n">number_list</span><span class="o">&ast;</span> <span class="n">list</span><span class="p">)</span> <span class="p">{</span>
</code></div><div class='line'><code>  <span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="n">list</span><span class="p">)</span> <span class="p">{</span>
</code></div><div class='line'><code>    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</code></div><div class='line'><code>  <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</code></div><div class='line'><code>    <span class="k">return</span> <span class="n">list</span><span class="o">-&gt;</span><span class="n">number</span><span class="o">+</span><span class="n">sum_list</span><span class="p">(</span><span class="n">list</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">);</span>
</code></div><div class='line'><code>  <span class="p">}</span>
</code></div><div class='line'><code><span class="p">}</span>
</code></div></pre></td></tr></table></div></figure>


<p><code>gcc -O3</code> can <em>almost</em> completely convert this version to iteration, so let’s look at the generated assembly from <code>gcc -O1</code> to get a better sense of what it might look like in a language implementation for which the necessary optimizations are too complex to be made automatically.</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="k">global</span> <span class="nv">sum_list</span>
</code></div><div class='line'><code><span class="nl">sum_list:</span>
</code></div><div class='line'><code>  <span class="nf">push</span> <span class="nb">rbx</span>          <span class="c1">; preserve the current value of rbx on the stack</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rbx</span><span class="p">,</span> <span class="nb">rdi</span>      <span class="c1">; replace rbx by the argument to the function, list</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="mi">0</span>        <span class="c1">; set up 0 in the result register</span>
</code></div><div class='line'><code>  <span class="nf">test</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nb">rdi</span>     <span class="c1">; check if rdi is NULL</span>
</code></div><div class='line'><code>  <span class="nf">jz</span> <span class="nv">.else</span>          <span class="c1">; if so go to else</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdi</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="mi">8</span><span class="p">]</span>  <span class="c1">; ~ list=list-&gt;next;</span>
</code></div><div class='line'><code>  <span class="nf">call</span> <span class="nv">sum_list</span>     <span class="c1">; sum_list(list) -&gt; result register (rax)</span>
</code></div><div class='line'><code>  <span class="nf">add</span> <span class="nb">rax</span><span class="p">,</span> <span class="p">[</span><span class="nb">rbx</span><span class="p">]</span>    <span class="c1">; add list-&gt;number (preserved across function call) to rax</span>
</code></div><div class='line'><code><span class="nl">.else:</span>
</code></div><div class='line'><code>  <span class="nf">pop</span> <span class="nb">rbx</span>           <span class="c1">; restore the state of rbx</span>
</code></div><div class='line'><code>  <span class="nf">ret</span>               <span class="c1">; return rax</span>
</code></div></pre></td></tr></table></div></figure>


<p>We can see immediately that some new instructions (<code>push</code>, <code>pop</code>, and <code>call</code>) have been introduced. These are all <strong>stack manipulation instructions</strong><a href="#fn10" class="footnoteRef" id="fnref10"><sup>10</sup></a>. If we carefully pretend to be the CPU running this program, we can see that it pushes the address of every number in the linked list (plus a return address for each <code>call</code>), and then dereferences and adds them up as it pops them from the stack. This is not good; if we wanted our entire data structure to be replicated on the stack, we would have passed it by value<a href="#fn11" class="footnoteRef" id="fnref11"><sup>11</sup></a>! When we hit a <code>recursion depth exceeded</code> error, what we have actually run out of is generally the memory set aside for the stack.</p>
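<p>To make that stack traffic concrete, here is a sketch in C that does by hand what the recursive version does implicitly: walk down the list pushing the address of each cell, then pop the addresses back off and add. The name <code>sum_list_explicit_stack</code> and the <code>peak_depth</code> output parameter are hypothetical, and the heap-allocated array merely stands in for the machine stack.</p>
<pre><code>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;

struct number_list {
  uint64_t number;
  struct number_list *next;
};

/* Sums the list the way the non-tail-recursive code does. If peak_depth
   is non-NULL, it receives the deepest stack depth that was reached. */
uint64_t sum_list_explicit_stack(const struct number_list *list,
                                 size_t *peak_depth) {
  /* Measure the list so the stand-in stack can be allocated in one go. */
  size_t length = 0;
  for (const struct number_list *p = list; p; p = p-&gt;next) length++;

  const struct number_list **stack = malloc(length * sizeof *stack);
  if (length &gt; 0 &amp;&amp; !stack) abort();

  /* Descent: one push per cell, as each recursive call saves its pointer. */
  size_t depth = 0;
  for (const struct number_list *p = list; p; p = p-&gt;next) stack[depth++] = p;
  if (peak_depth) *peak_depth = depth;

  /* Ascent: pop each saved address, dereference it, and add. */
  uint64_t accum = 0;  /* what the base case returns */
  while (depth &gt; 0) accum += stack[--depth]-&gt;number;

  free(stack);
  return accum;
}</code></pre>
<p>The peak depth always equals the length of the list, which is exactly the memory cost that the tail-recursive version avoids.</p>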
<h3 id="tail-recursion">Tail Recursion</h3>
<p>What about translating the tail-recursive version into C? Like Scheme and Python, <code>gcc</code> supports nested function definitions (as a GNU extension to C), so this is no problem:</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div><div data-line='14' class='line-number'></div><div data-line='15' class='line-number'></div></pre></td><td class='main  c'><pre><div class='line'><code><span class="kt">uint64_t</span> <span class="nf">sum_list</span><span class="p">(</span><span class="k">struct</span> <span class="n">number_list</span><span class="o">&ast;</span> <span class="n">list</span><span class="p">)</span> <span class="p">{</span>
</code></div><div class='line'><code>  <span class="kt">uint64_t</span> <span class="n">sum_sublist</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">accum</span><span class="p">,</span> <span class="k">struct</span> <span class="n">number_list</span><span class="o">&ast;</span> <span class="n">sublist</span><span class="p">)</span> <span class="p">{</span>
</code></div><div class='line'><code>    <span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="n">sublist</span><span class="p">)</span> <span class="p">{</span>
</code></div><div class='line'><code>      <span class="k">return</span> <span class="n">accum</span><span class="p">;</span>
</code></div><div class='line'><code>    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</code></div><div class='line'><code>      <span class="k">return</span> <span class="n">sum_sublist</span><span class="p">(</span><span class="n">accum</span><span class="o">+</span><span class="n">sublist</span><span class="o">-&gt;</span><span class="n">number</span><span class="p">,</span><span class="n">sublist</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">);</span>
</code></div><div class='line'><code>    <span class="p">}</span>
</code></div><div class='line'><code>  <span class="p">}</span>
</code></div><div class='line'><code>  <span class="k">return</span> <span class="n">sum_sublist</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="n">list</span><span class="p">);</span>
</code></div><div class='line'><code><span class="p">}</span>
</code></div></pre></td></tr></table></div></figure>


<p>Here’s what <code>gcc -O1</code> gives us (translated and commented as before):</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div><div data-line='14' class='line-number'></div><div data-line='15' class='line-number'></div><div data-line='16' class='line-number'></div><div data-line='17' class='line-number'></div><div data-line='18' class='line-number'></div><div data-line='19' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="nl">sum_sublist.1867:</span>       <span class="c1">; A random constant has been added to avoid polluting the namespace. Not the best solution, but okay.</span>
</code></div><div class='line'><code>  <span class="nf">sub</span> <span class="nb">rsp</span><span class="p">,</span> <span class="mi">8</span>            <span class="c1">; Decrement the stack by one 8-byte machine word. Seems unnecessary...</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rdi</span>          <span class="c1">; Copy first argument (rdi/&quot;accum&quot;) into result register (rax).</span>
</code></div><div class='line'><code>  <span class="nf">test</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nb">rsi</span>         <span class="c1">; Test second argument (rsi/&quot;sublist&quot;) for nullity.</span>
</code></div><div class='line'><code>  <span class="nf">jz</span> <span class="nv">.else</span>              <span class="c1">; If null, goto else.</span>
</code></div><div class='line'><code>  <span class="nf">add</span> <span class="nb">rdi</span><span class="p">,</span> <span class="p">[</span><span class="nb">rsi</span><span class="p">]</span>        <span class="c1">; ~ accum = accum + sublist-&gt;number;</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="p">[</span><span class="nb">rsi</span><span class="o">+</span><span class="mi">8</span><span class="p">]</span>      <span class="c1">; sublist = sublist-&gt;next;</span>
</code></div><div class='line'><code>  <span class="nf">call</span> <span class="nv">sum_sublist.1867</span> <span class="c1">; recurse. result appears in rax, ready to pass along (as the return value) to the next caller in the stack.</span>
</code></div><div class='line'><code><span class="nl">.else:</span>
</code></div><div class='line'><code>  <span class="nf">add</span> <span class="nb">rsp</span><span class="p">,</span> <span class="mi">8</span>            <span class="c1">; seems unnecessary</span>
</code></div><div class='line'><code>  <span class="nf">ret</span>                   <span class="c1">; return rax</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="nl">sum_list:</span>
</code></div><div class='line'><code>  <span class="nf">sub</span> <span class="nb">rsp</span><span class="p">,</span> <span class="mi">8</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nb">rdi</span>          <span class="c1">; first argument (rdi/&quot;list&quot;) of sum_list becomes 2nd argument (rsi/&quot;sublist&quot;) of sum_sublist</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdi</span><span class="p">,</span> <span class="mi">0</span>            <span class="c1">; first argument (rdi/&quot;accum&quot;) of sum_sublist is 0</span>
</code></div><div class='line'><code>  <span class="nf">call</span> <span class="nv">sum_sublist.1867</span> <span class="c1">; call sum_sublist!</span>
</code></div><div class='line'><code>  <span class="nf">add</span> <span class="nb">rsp</span><span class="p">,</span> <span class="mi">8</span>
</code></div><div class='line'><code>  <span class="nf">ret</span>
</code></div></pre></td></tr></table></div></figure>


<p>In this mode, the tail <code>call</code> is not being eliminated – although we’re no longer <code>push</code>ing <code>rbx</code>, we’re still pushing <code>rip</code> onto the stack with every <code>call</code>, and eventually we’ll run out of stack that way. The only way to get around this is to replace each tail <code>call</code> with a <code>jmp</code>: since we’re just going to take the return value of the next recursive invocation and then immediately <code>ret</code> back to the previous caller on the stack, there’s no point in even inserting our own address on the stack (as <code>call</code> does); we can just set up the next guy to pass the return value straight back to the previous guy, and quietly disappear.</p>
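<p>In C terms, swapping the <code>call</code> for a <code>jmp</code> amounts to overwriting the parameters with the arguments of the tail call and jumping back to the top of the function. Here is a hand-written sketch of that transformation; it is not compiler output, and the name <code>sum_sublist_jmp</code> is my own.</p>
<pre><code>#include &lt;stdint.h&gt;

struct number_list {
  uint64_t number;
  struct number_list *next;
};

uint64_t sum_sublist_jmp(uint64_t accum, const struct number_list *sublist) {
top:
  if (!sublist) {
    return accum;
  }
  /* Was: return sum_sublist(accum + sublist-&gt;number, sublist-&gt;next);
     Rebind the parameters instead. The order matters: accum must be
     updated while sublist still points at the current cell. */
  accum = accum + sublist-&gt;number;
  sublist = sublist-&gt;next;
  goto top;  /* the jmp: no return address is pushed */
}

uint64_t sum_list(const struct number_list *list) {
  return sum_sublist_jmp(0, list);
}</code></pre>
<p>Because nothing is pushed per element, this version uses the same amount of stack for a list of ten million cells as for a list of ten.</p>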
<p><code>gcc -O3</code> does this. In fact, somewhat surprisingly, it generates <em>exactly</em> the same assembly, line for line, for this version as for the purely iterative version above. That’s “<strong>tail call optimization</strong>” (TCO) or “<strong>tail recursion elimination</strong>” (TRE) in its most aggressive form: it literally just gets rid of all calls and recursions and replaces them with an equivalent iteration (complete with duplicate <code>test</code>).</p>
<p>The upshot of all this is that <em>not only does Scheme’s “named let” recursion form translate <a href="#neatly-into-assembly">neatly into assembly</a>, it provides – penalty-free – a better abstraction than either iteration</em> (while-loop imitation) <em>or stack-driven recursion</em>, the two options <code>gcc</code> appears to pick from when dealing with various ways to code a list traversal.</p>
<p>Actually, the real point I’m trying to make here is that, <strong>unlike in C, I can naturally do named let directly in assembly, and that’s one of the many reasons working in assembly makes me happy</strong>.</p>
<p><a name="recursion"></a></p>
<h2 id="appendix-whats-so-great-about-recursion-anyway">Appendix: What’s so great about recursion, anyway?</h2>
<p>For me, the most important point in favor of a recursive representation of loops is that I find it easier to reason about <strong>correctness</strong> that way.</p>
<p>Any function we define ought to implement some ideal mathematical function that maps inputs to outputs<a href="#fn12" class="footnoteRef" id="fnref12"><sup>12</sup></a>. If our code truly does implement that ideal function, we say that the code is <strong>correct</strong>. Generally, we can break down the body of a function as a <a href="http://en.wikipedia.org/wiki/Function_composition">composition</a> of smaller functions; even in imperative languages, we can think of every statement as pulling in a state of the world, making well-defined changes, and passing the new state of the world into the next statement<a href="#fn13" class="footnoteRef" id="fnref13"><sup>13</sup></a>. At each step, we ask ourselves, “are the outputs of this function going to be what I want them to be?” For loops, though, <a href="http://en.wikipedia.org/wiki/Loop_invariant#Informal_example">this gets tricky</a>.</p>
<p>What recursion does for us as aspiring writers of correct functions is automatic translation of the loop verification problem into the much nicer problem of function verification. Intuitively, you can simply assume that all invocations of a recursive function within its own body are going to Do The Right Thing, ensure that the function as a whole Does The Right Thing under that assumption, and then conclude that the function Does The Right Thing in general. If this sounds like circular reasoning, that is because it looks like circular reasoning<a href="#fn14" class="footnoteRef" id="fnref14"><sup>14</sup></a>; but it turns out to be valid anyway.</p>
<p>There are many ways to justify this procedure formally, all of which are truly mind-bending<a href="#fn15" class="footnoteRef" id="fnref15"><sup>15</sup></a>. But once you’ve justified this procedure once, you never have to do it again (unlike ad-hoc reasoning about loops). I’ve determined that the most elegant way to explain it is by expanding our <a href="#named-let-1">named let example</a> into a non-recursive function, which just happens to accept as a parameter a correct<a href="#fn16" class="footnoteRef" id="fnref16"><sup>16</sup></a> version of itself.</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div></pre></td><td class='main  scheme'><pre><div class='line'><code><span class="p">(</span><span class="k">define </span><span class="p">(</span><span class="nf">sum_list</span> <span class="nv">list</span><span class="p">)</span>
</code></div><div class='line'><code>  <span class="p">(</span><span class="k">define </span><span class="p">(</span><span class="nf">sum_sublist_nonrec</span> <span class="nv">f_correct</span> <span class="nv">accum</span> <span class="nv">sublist</span><span class="p">)</span>
</code></div><div class='line'><code>    <span class="p">(</span><span class="k">cond </span><span class="p">((</span><span class="nb">null? </span><span class="nv">sublist</span><span class="p">)</span>
</code></div><div class='line'><code>           <span class="nv">accum</span> <span class="p">)</span>
</code></div><div class='line'><code>          <span class="p">(</span><span class="nf">else</span>
</code></div><div class='line'><code>           <span class="p">(</span><span class="nf">f_correct</span> <span class="nv">f_correct</span> <span class="p">(</span><span class="nb">+ </span><span class="nv">accum</span> <span class="p">(</span><span class="nb">car </span><span class="nv">sublist</span><span class="p">))</span> <span class="p">(</span><span class="nb">cdr </span><span class="nv">sublist</span><span class="p">))</span> <span class="p">)))</span>
</code></div><div class='line'><code>  <span class="p">(</span><span class="nf">sum_sublist_nonrec</span> <span class="nv">sum_sublist_nonrec</span> <span class="mi">0</span> <span class="nv">list</span><span class="p">)</span> <span class="p">)</span>
</code></div></pre></td></tr></table></div></figure>


<p>Now, <code>sum_sublist_nonrec</code> is an honest-to-goodness non-recursive function, and we can check that it is correct. Given a correct function <code>f_correct</code> (which takes as inputs a correct version of itself, a number, and a list, and correctly returns the sum of all the elements in the list plus the number), a number, and a list, does <code>sum_sublist_nonrec</code> correctly return the sum of all elements in the list plus the number? Why yes, it does. (Constructing a formal proof tree for this claim is left as an exercise for the self-punishing reader.) Note that since <code>f_correct</code> is assumed to already be correct, the correct version of it is still just <code>f_correct</code>, so we can safely pass it to itself without violating our assumptions or introducing new ones. So, <code>sum_sublist_nonrec</code> is correct.</p>
<p>Now let’s consider the correctness of <code>sum_list</code>. It’s supposed to add up all the numbers in <code>list</code>. What it actually does is to apply the (correct) function <code>sum_sublist_nonrec</code>, passing in a correct version of itself (check! it’s already correct), a number to add the sum of the list to (check! adding zero to the sum of the list won’t change it), and the list (check! that’s what we’re supposed to sum up).</p>
<p>We’ve just proved our program correct! The magic of named let is that it generates this clumsy form with a bunch of <code>f_correct</code>s from a compact and elegant form. In so doing, it lets us get away with much less formal reasoning while still having the confidence that it can be converted into something like what we just slogged through. Rest assured that no matter what you do with named let, no matter how complicated the construct you create, this “assume it does the right thing” technique still applies!</p>
<p>With one <em>tiny</em> caveat. We haven’t proved that the program <em>terminates</em>. If this technique proved termination, then we could just write</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div></pre></td><td class='main  scheme'><pre><div class='line'><code><span class="p">(</span><span class="k">define </span><span class="p">(</span><span class="nf">do-the-right-thing</span> <span class="nv">x</span><span class="p">)</span>
</code></div><div class='line'><code>  <span class="p">(</span><span class="k">let </span><span class="nv">does-the-right-thing</span> <span class="p">((</span><span class="nf">x</span> <span class="nv">x</span><span class="p">))</span>
</code></div><div class='line'><code>    <span class="p">(</span><span class="nf">does-the-right-thing</span> <span class="nv">x</span><span class="p">)))</span>
</code></div></pre></td></tr></table></div></figure>


<p>and it would be totally correct, no matter what thing we want it to do.</p>
<p>Technically, everywhere I’ve said “correct”, what I mean is <strong>partially correct</strong>: <em>if</em> it terminates, <em>then</em> the output is correct. (Equivalently, it definitely <em>won’t</em> return something incorrect.) <code>do-the-right-thing</code> is, in fact, partially correct: it never returns at all, so it won’t give you any incorrect outputs!</p>
<p>Termination proofs of recursive functions can usually be handled by <a href="http://en.wikipedia.org/wiki/Structural_induction">structural induction</a> on possible inputs: you establish that it terminates for minimal elements (e.g. the empty list) and that termination for any non-minimal element is dependent only on termination for some set of smaller elements (e.g. the tail of the list). The structure that you need in order to think about termination this way is also much clearer with recursion than with iteration constructs.</p>
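<p>As a rough sketch of the “smaller elements” idea (the name <code>sum-list-checked</code> and the runtime check are made up for illustration, and the input is assumed to be a proper, finite list), here is a list sum that confirms, before every recursive call, that the argument it is about to pass is strictly shorter than the one it received. The length of <code>sublist</code> is the measure that the structural induction leans on.</p>
<figure class='code'><div class='highlight'><pre><code class='scheme'>(define (sum-list-checked lst)
  (let loop ((accum 0) (sublist lst))
    (cond ((null? sublist)
           accum)                       ; minimal element: no recursive call
          (else
           (let ((rest (cdr sublist)))
             ;; termination measure: the tail must be strictly shorter
             (if (not (&lt; (length rest) (length sublist)))
                 (error "measure did not decrease"))
             (loop (+ accum (car sublist)) rest))))))

(display (sum-list-checked '(1 2 3 4)))  ; prints 10
(newline)
</code></pre></div></figure>
<p>The check can never fire for a proper list, which is exactly the point: the empty list terminates immediately, and every other call only depends on a call with a shorter list.</p>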
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>If you doubt my ability to productively use assembly for more complicated toy problems, I direct you to my <a href="http://davidad.github.io/blog/2014/02/25/overkilling-the-8-queens-problem/">previous blog post</a>.<a href="#fnref1">↩</a></p></li>
<li id="fn2"><p><a href="http://en.wikipedia.org/wiki/Guido_van_Rossum">Guido van Rossum</a> is the author of Python, and the “Benevolent Dictator for Life” of its development process.<a href="#fnref2">↩</a></p></li>
<li id="fn3"><p>Unlike most language implementations, <code>guile</code> natively supports <a href="http://www.gnu.org/software/guile/manual/html_node/Integers.html#Integers">arbitrarily large integers</a>.<a href="#fnref3">↩</a></p></li>
<li id="fn4"><p>Although at least it’s not as inelegant as defining the helper function <em>outside</em> the body of the actual function, thereby polluting the global namespace. Take advantage of nested functions!<a href="#fnref4">↩</a></p></li>
<li id="fn5"><p>You can even keep your <code>for-each</code> loops, which are no substitute for <a href="https://www.gnu.org/software/guile/manual/html_node/SRFI_002d1-Fold-and-Map.html#SRFI_002d1-Fold-and-Map"><code>map</code> and <code>filter</code></a>.<a href="#fnref5">↩</a></p></li>
<li id="fn6"><p>If you’re curious how this works, click <a href="https://gist.github.com/davidad/9288924">here</a>. But I haven’t settled on an ASM REPL solution I’m happy with – this is just a one-off hack. A more legitimate ASM REPL may be the subject of a future blog post.<a href="#fnref6">↩</a></p></li>
<li id="fn7"><p>Except for <code>rep</code> prefixes, which can iterate certain single instructions. I think it’s fair to say those don’t really count.<a href="#fnref7">↩</a></p></li>
<li id="fn8"><p>I find calling conventions distasteful in general. The calling convention is like a shadow API (in fact, it’s often referred to as the ABI, for application binary interface) that nobody has any control over (except the people at AMD, Intel, and Microsoft who are in a position to decide on such things) and that applies to every function, every component on every computer everywhere. What if we let people define their ABI as part of their API? Would the world come crashing down? I doubt it. You can already cause quite a bit of trouble by misusing A<em>P</em>Is; really, both API and ABI usage ought to be formally verified, and as such ought to have much more room for flexibility than they do now. &lt;/soapbox&gt;<a href="#fnref8">↩</a></p></li>
<li id="fn9"><p>I would have applied this <code>xor</code> optimization too if I weren’t trying to literally translate Scheme code as an illustration.<a href="#fnref9">↩</a></p></li>
<li id="fn10"><p>“The stack” is not merely a region of memory managed by the OS (like “the heap”, its common counterpart). The stack is a hardware-accelerated mechanism deeply embedded in the CPU. There is a hardware register <code>rsp</code> (a.k.a. the stack pointer). A <code>push</code> instruction decrements <code>rsp</code> (usually by 8 at a time, in 64-bit mode, since pointers are expressed as numbers of 8-bit bytes, and 64/8=8) and then stores a value to <code>[rsp]</code>. A <code>pop</code> instruction retrieves a value from <code>[rsp]</code> and then increments <code>rsp</code>. A <code>call</code> instruction <code>push</code>es the current value of <code>rip</code> (a.k.a. the instruction pointer, or the program counter), and then executes an unconditional jump (<code>jmp</code>). Finally, a <code>ret</code> instruction <code>pop</code>s from the stack into <code>rip</code>, returning to wherever the matching <code>call</code> left off.<a href="#fnref10">↩</a></p></li>
<li id="fn11"><p>You may point out here that C doesn’t actually let you pass entire linked lists by value. Maybe that’s because it’s a <em>bad idea</em>.<a href="#fnref11">↩</a></p></li>
<li id="fn12"><p>If your function cannot be fully specified by an abstract mapping from inputs to outputs, then it is <strong>nondeterministic</strong>, which is a fancy word for “unpredictable”: there must exist some circumstances under which you cannot predict the behavior of the function, even knowing every input. Intuitively, I’m sure you can see how unpredictable software is a nightmare to debug. Controlling nondeterminism is an active field of computer science research, which is not the subject of this article. However, I hope you are at least convinced that nondeterminism is something you should avoid if possible, and that therefore you should try to design every function in your program as a proper mathematical function.</p>
<p>Note that I’m not talking about “purity” here – it’s fine for “outputs” to include side effects as of function exit, and for “inputs” to include states of the external world as of function entry. What’s important is that the state at function exit of anything the function modifies be uniquely determined by the state at function entry of anything that can affect its execution.<a href="#fnref12">↩</a></p></li>
<li id="fn13"><p>Unless we’re dealing with hairy scope issues like hoisting, in which case you should get rid of those first.<a href="#fnref13">↩</a></p></li>
<li id="fn14"><p>Pun intended. The sentence within which this footnote is referenced <em>isn’t</em> circular reasoning; it’s a tautology. Therefore, it’s an example of something that sounds like circular reasoning but is valid anyway. Of course, you shouldn’t take the existence of this cute example as evidence that the circular-sounding reasoning preceding it is not, in fact, circular. (That would be a fallacy of <a href="http://en.wikipedia.org/wiki/Inappropriate_generalization">inappropriate generalization</a>, which neither is nor sounds like circular reasoning.)<a href="#fnref14">↩</a></p></li>
<li id="fn15"><p>Trying to explain it for the purposes of this blog post – while making sure that I’m not missing something – took me over four hours.<a href="#fnref15">↩</a></p></li>
<li id="fn16"><p>Technically, I mean “partially correct”. This will be addressed in due time. Be patient, pedantic reader. This argument is hard enough to understand already.<a href="#fnref16">↩</a></p></li>
</ol>
</section>]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Overkilling the 8-queens problem]]></title>
    <link href="http://davidad.github.io/blog/2014/02/25/overkilling-the-8-queens-problem/"/>
    <updated>2014-02-25T10:52:57-05:00</updated>
    <id>http://davidad.github.io/blog/2014/02/25/overkilling-the-8-queens-problem</id>
    <content type="html"><![CDATA[<p>Last night, a fellow <a href="http://www.hackerschool.com">Hacker School</a>er challenged me to a running-time contest on the classic <a href="http://en.wikipedia.org/wiki/Eight_queens_puzzle">eight queens puzzle</a>. Naturally, I pulled up my trusty <a href="http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf">Intel® 64 manual</a> and got to work. It turned out to be even faster than I expected, churning out pretty-printed output in 15ms, which is totally dominated by the time it takes the terminal to display it (it takes only 2ms if redirected to a file).</p>
<p><strong>Update</strong>: Very slightly more scientific <a href="https://github.com/davidad/8queens/tree/%2Bc_comparison">testing</a>, spurred by curious <a href="https://news.ycombinator.com/item?id=7301913">Hacker News commenters</a>, indicates that, without pretty-printing and other overhead, the solving time is actually closer to 11.2µs – about a factor of 7 speedup over commenter <a href="https://news.ycombinator.com/item?id=7302005">bluecalm</a>’s <a href="https://github.com/davidad/8queens/blob/%2Bc_comparison/8q_C_bluecalm.c">C implementation</a>.</p>
<figure>
<img src="http://i.imgur.com/5dOH49e.png" alt="pretty-printed output" /><figcaption>pretty-printed output</figcaption>
</figure>
<!-- more -->
<p>(<a href="http://i.imgur.com/qjckCeo.png">Click here to see the full output.</a>)</p>
<h2 id="the-approach">The Approach</h2>
<p>My solution method is heavily inspired by <a href="http://www.cl.cam.ac.uk/~mr10/backtrk.pdf">this paper</a> (which, appropriately enough, concerns a beautifully insane programming language called MCPL, combining features from ML, C, and Prolog). This paper contributes two key insights about solving the 8-queens problem:</p>
<ul>
<li><p>Conceptually, we can model the solution space as the leaves of a tree, where each internal node of the tree corresponds to a partial board (with the number of queens equal to the tree depth), and each parent-child link represents adding another queen at the row number corresponding to the depth of the child. Since there can only be one queen per row in a correct solution, the leaves of this tree form a superset of the actual solution set.</p></li>
<li><p>Instead of actually constructing the tree, we can simply keep track of the current traversal state. In particular, this means we keep track of the currently occupied columns, the occupied leftward going diagonals, and the occupied rightward going diagonals, as they intersect the current row. (Each of these three state variables is eight bits of information.) In addition, we can keep track of the past traversal history of each level using <del>a</del> the stack.</p></li>
</ul>
<p>If any of this is unclear, <a href="http://www.cl.cam.ac.uk/~mr10/backtrk.pdf">check out the paper</a>, which has a beautiful diagram that there is no need for me to attempt replicating.</p>
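<p>For readers who would rather see the idea in a high-level language first, here is a minimal sketch of the same traversal in Guile Scheme. It is not a translation of the paper’s MCPL code or of my assembly; the names <code>count-queens</code>, <code>place</code>, and <code>try</code> are invented for this illustration. The three state variables are plain integers used as bit sets, and the call stack plays the role of the traversal history.</p>
<figure class='code'><div class='highlight'><pre><code class='scheme'>(define (count-queens n)
  (define full (- (ash 1 n) 1))              ; n low bits set: all columns
  (let place ((cols 0) (ldiag 0) (rdiag 0))  ; state as seen from the current row
    (if (= cols full)
        1                                    ; a queen in every column: one solution
        (let try ((avail (logand full (lognot (logior cols ldiag rdiag))))
                  (count 0))
          (if (zero? avail)
              count                          ; nothing left on this row: backtrack
              (let ((bit (logand avail (- avail))))   ; lowest open square
                (try (logxor avail bit)               ; mark it as visited
                     (+ count
                        (place (logior cols bit)
                               (logand full (ash (logior ldiag bit) 1))
                               (ash (logior rdiag bit) -1))))))))))

(display (count-queens 8))  ; prints 92
(newline)
</code></pre></div></figure>
<p>Each call to <code>place</code> corresponds to one node of the tree, and each pass through <code>try</code> to one parent-child link.</p>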
<h2 id="the-code">The Code</h2>
<p>I’m going to go through the first<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a> version of the code, which doesn’t produce the pretty boards but has most of the clever tricks. (Ironically, adding “pretty printing” made my code uglier. Maybe it’s just that I was up too late working on it.)</p>
<p>The heart of this algorithm is the sequence that updates the state variables as we move from one layer into the next. This whole program is small enough that it’s still practical to just set aside registers to represent most variables; in particular, <code>rdx</code> represents where it’s okay to place a queen at the current layer (e.g. it starts out as <code>0b11111111</code>), and <code>xmm1</code> (one of those fancy 128-bit registers that supports fancy new operations) stores the “occupied left diagonals”, “occupied right diagonals”, and “occupied columns” states, in that order (with “occupied columns” being the least significant word<a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a>). <code>xmm2</code>, <code>xmm3</code>, and <code>xmm4</code> are just being used as scratch space. Finally, <code>xmm7</code> is a constant <code>0xff</code>.</p>
<h3 id="instruction-dictionary">Instruction Dictionary</h3>
<p>To spare you the effort of searching through the Intel® 64 manual yourself, here are brief descriptions of all the fancy instructions I’m about to use.</p>
<ul>
<li><code>vpsllw</code>: <strong>Vector/Packed Shift Left (Logical) Words</strong>. <em>Separately</em> shifts left every word of the second argument by the number of bits given by the third argument, and stores the result to the first argument.</li>
<li><code>vpsrlw</code>: <strong>Vector/Packed Shift Right (Logical) Words</strong>. <em>Separately</em> shifts right every word of the second argument by the number of bits given by the third argument, and stores the result to the first argument.</li>
<li><code>pblendw</code>: <strong>Packed Blend Words</strong>. Using the third argument as a mask, selectively copies words from the second argument to the first argument.</li>
<li><code>vpsrldq</code>: <strong>Vector/Packed Shift Right (Logical) Double Quadword</strong>. Shifts the entire second argument right by the number of bytes specified in the third argument, and stores the result to the first argument.</li>
<li><code>por</code>: <strong>Parallel OR</strong>. Bitwise ORs the first and second argument and assigns the result to the first argument.</li>
<li><code>vpandn</code>: <strong>Vector/Parallel AND NOT</strong>. Inverts the second argument, ANDs the result with the third argument, and assigns the result of <em>that</em> to the first argument.</li>
<li><code>movq</code>: <strong>Move Quadword</strong>. The standard way to move data between <code>xmm</code> registers and normal registers.</li>
</ul>
<p>Now, let’s take this a few lines at a time.</p>
<figure class='code'><figcaption>
8queens.asm<a href='https://github.com/davidad/8queens/blob/1989666c45baa639f152dfc89c70635f7007d20b/8queens.asm#L25'>github</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='25' class='line-number'></div><div data-line='26' class='line-number'></div><div data-line='27' class='line-number'></div><div data-line='28' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code>  <span class="nf">vpsllw</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="mi">1</span>      <span class="c1">; shift entire state to left, place in xmm2</span>
</code></div><div class='line'><code>  <span class="nf">vpsrlw</span> <span class="nv">xmm3</span><span class="p">,</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="mi">1</span>      <span class="c1">; shift entire state to right, place in xmm3</span>
</code></div><div class='line'><code>  <span class="nf">pblendw</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="m">0b</span><span class="mi">100</span> <span class="c1">; only copy &quot;left-attacking&quot; word back from xmm2</span>
</code></div><div class='line'><code>  <span class="nf">pblendw</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="nv">xmm3</span><span class="p">,</span> <span class="m">0b</span><span class="mi">010</span> <span class="c1">; only copy &quot;right-attacking&quot; word back from xmm3</span>
</code></div></pre></td></tr></table></div></figure>


<p>If you’re accustomed to C, you might think of this as functionally equivalent to something like <code>xmm1[2] &lt;&lt;= 1; xmm1[1] &gt;&gt;=1</code><a href="#fn3" class="footnoteRef" id="fnref3"><sup>3</sup></a>. We want the word in position 1 to shift right and the word in position 2 to shift left, while the word in position 0 (occupied columns) stays put.</p>
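<p>The same step can be modeled in Scheme, as a sketch under some assumptions of mine: the low 48 bits of <code>xmm1</code> are represented as one integer holding three 16-bit lanes, and the helper names <code>lane</code>, <code>pack</code>, and <code>advance-row</code> are hypothetical (they don’t appear in <code>8queens.asm</code>).</p>
<figure class='code'><div class='highlight'><pre><code class='scheme'>;; lane 2 = left diagonals, lane 1 = right diagonals, lane 0 = columns
(define (lane state i)
  (logand #xffff (ash state (* -16 i))))

(define (pack left right cols)
  (logior (ash left 32) (ash right 16) cols))

(define (advance-row state)
  (pack (logand #xffff (ash (lane state 2) 1))  ; lane 2 shifts left
        (ash (lane state 1) -1)                 ; lane 1 shifts right
        (lane state 0)))                        ; lane 0 stays put

;; One queen in column 3, marked in all three lanes:
(display (number-&gt;string (advance-row #x000800080008) 16))  ; prints 1000040008
(newline)
</code></pre></div></figure>
<p>Read as three words, the result is <code>0x0010</code>, <code>0x0004</code>, <code>0x0008</code>: the left diagonal has moved one column up, the right diagonal one column down, and the column itself hasn’t moved.</p>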
<figure class='code'><figcaption>
8queens.asm<a href='https://github.com/davidad/8queens/blob/1989666c45baa639f152dfc89c70635f7007d20b/8queens.asm#L29'>github</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='29' class='line-number'></div><div data-line='30' class='line-number'></div><div data-line='31' class='line-number'></div><div data-line='32' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code>  <span class="nf">vpsrldq</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="mi">4</span>     <span class="c1">; shift state right 4 &ast;bytes&ast;, place in xmm2</span>
</code></div><div class='line'><code>  <span class="nf">vpsrldq</span> <span class="nv">xmm3</span><span class="p">,</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="mi">2</span>     <span class="c1">; shift state right 2 bytes, place in xmm3</span>
</code></div><div class='line'><code>  <span class="nf">por</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="nv">xmm3</span>            <span class="c1">; collect bitwise ors in xmm2</span>
</code></div><div class='line'><code>  <span class="nf">por</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="nv">xmm1</span>
</code></div></pre></td></tr></table></div></figure>


<p>Now, we want to combine the information about which squares in the next layer are under attack. It doesn’t matter from which direction – we want to make sure not to put a queen there. So, we shift right 2 words (= 4 bytes) and right 1 word (= 2 bytes) and OR them all together (accumulating into a scratch register so we don’t clobber our state).</p>
<figure class='code'><figcaption>
8queens.asm<a href='https://github.com/davidad/8queens/blob/1989666c45baa639f152dfc89c70635f7007d20b/8queens.asm#L33'>github</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='33' class='line-number'></div><div data-line='34' class='line-number'></div><div data-line='35' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code>  <span class="nf">vpandn</span> <span class="nv">xmm4</span><span class="p">,</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="nv">xmm7</span>   <span class="c1">; invert and select low byte</span>
</code></div><div class='line'><code>  <span class="nf">movq</span> <span class="nb">rdx</span><span class="p">,</span> <span class="nv">xmm4</span>            <span class="c1">; place in rdx</span>
</code></div><div class='line'><code>  <span class="nf">jmp</span> <span class="nv">next_state</span>           <span class="c1">; now we&#39;re set up to iterate</span>
</code></div></pre></td></tr></table></div></figure>


<p>But that still contains some stuff in the upper bytes. We only want the lower byte. And we also want <code>1</code> bits where queens <em>should</em> be allowed, rather than where they’re under attack. We can solve both problems with one <code>vpandn</code> instruction, which will flip all the bits, but mask out everything except the first byte (since <code>xmm7</code>=<code>0xff</code>).</p>
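<p>In Scheme terms, the OR-then-invert-then-mask sequence might look like the sketch below. It assumes the state is modeled as one integer with three 16-bit lanes; <code>lane</code> and <code>available</code> are names I’m introducing here for illustration only.</p>
<figure class='code'><div class='highlight'><pre><code class='scheme'>(define (lane state i)
  (logand #xffff (ash state (* -16 i))))

;; OR the three lanes together (the two vpsrldq shifts plus two por),
;; then invert and keep only the low byte (the vpandn against 0xff).
(define (available state)
  (logand #xff
          (lognot (logior (lane state 2) (lane state 1) (lane state 0)))))

;; Lanes 0x0010, 0x0004, 0x0008: columns 4, 2 and 3 are under attack.
(display (number-&gt;string (available #x001000040008) 2))  ; prints 11100011
(newline)
</code></pre></div></figure>
<p>The three <code>0</code> bits in the middle are the attacked squares; every remaining <code>1</code> bit is a column where the next queen may go.</p>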
<p>So, now that we’re iterating, what happens <em>next</em>?</p>
<h3 id="instruction-dictionary-1">Instruction Dictionary</h3>
<ul>
<li><code>bsf</code>: <strong>Bit Scan Forward</strong>. Finds the least significant <code>1</code> bit in the second argument and stores the index of that bit into the first argument. If there is no <code>1</code> bit in the second argument, the value of the first argument is undefined, and the zero flag (<code>ZF</code>) is set.</li>
<li><code>btc</code>: <strong>Bit Test and Complement</strong>. Copies the bit in the first argument with index given by the second argument into the carry flag (<code>CF</code>), and then flips that bit. Here it’s only applied to a bit that’s known to be <code>1</code>, so the effect is to clear it.</li>
<li><code>je</code>: <strong>Jump If Equal</strong>. Pretty self-explanatory, when used in conjunction with <code>cmp</code> (<strong>Compare</strong>).</li>
<li><code>jz</code>: <strong>Jump If Zero</strong>. Jumps to the specified address/label if the zero flag (ZF) is set.</li>
<li><code>push</code>: <strong>Push To Stack</strong>. Decrements <code>rsp</code> (usually by eight at a time, i.e., <code>rsp &lt;- rsp-8</code>), and then stores its single argument to the memory location pointed to by <code>rsp</code>.</li>
<li><code>shl</code>: <strong>Logical Shift Left</strong> for non-<code>xmm</code> registers.</li>
</ul>
<figure class='code'><figcaption>
8queens.asm<a href='https://github.com/davidad/8queens/blob/1989666c45baa639f152dfc89c70635f7007d20b/8queens.asm#L12'>github</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div><div data-line='14' class='line-number'></div><div data-line='15' class='line-number'></div><div data-line='16' class='line-number'></div><div data-line='17' class='line-number'></div><div data-line='18' class='line-number'></div><div data-line='19' class='line-number'></div><div data-line='20' class='line-number'></div><div data-line='21' class='line-number'></div><div data-line='22' class='line-number'></div><div data-line='23' class='line-number'></div><div data-line='24' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="nl">next_state:</span>
</code></div><div class='line'><code>  <span class="nf">bsf</span> <span class="nb">rcx</span><span class="p">,</span> <span class="nb">rdx</span>             <span class="c1">; find next available position in current level</span>
</code></div><div class='line'><code>  <span class="nf">jz</span> <span class="nv">backtrack</span>             <span class="c1">; if there is no available position, we must go back</span>
</code></div><div class='line'><code>  <span class="nf">btc</span> <span class="nb">rdx</span><span class="p">,</span> <span class="nb">rcx</span>             <span class="c1">; mark position as unavailable</span>
</code></div><div class='line'><code>  <span class="nf">cmp</span> <span class="nb">rsp</span><span class="p">,</span> <span class="nv">r14</span>             <span class="c1">; check if we&#39;ve done 7 levels already</span>
</code></div><div class='line'><code>  <span class="nf">je</span> <span class="nv">win</span>                   <span class="c1">; if so, we have a win state. otherwise continue</span>
</code></div><div class='line'><code>  <span class="nf">movq</span> <span class="nv">r10</span><span class="p">,</span> <span class="nv">xmm1</span>           <span class="c1">; save current state ...</span>
</code></div><div class='line'><code>  <span class="nf">push</span> <span class="nb">rdx</span>
</code></div><div class='line'><code>  <span class="nf">push</span> <span class="nv">r10</span>                 <span class="c1">;   ... to stack</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nv">r15</span>             <span class="c1">; set up attack mask</span>
</code></div><div class='line'><code>  <span class="nf">shl</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">cl</span>              <span class="c1">; shift into position</span>
</code></div><div class='line'><code>  <span class="nf">movq</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="nb">rax</span>
</code></div><div class='line'><code>  <span class="nf">por</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="nv">xmm2</span>           <span class="c1">; mark as attacking in all directions</span>
</code></div></pre></td></tr></table></div></figure>


<p>First we try scanning for an available position on this row – one that isn’t under attack from already-placed queens, and that also hasn’t already been visited. If there is none, then we have no choice but to <code>backtrack</code> (a little piece of code which is coming up soon). Assuming we find an available position, we first mark it as visited/unavailable. We then check if this is the last level that needs to be taken care of, by looking at the stack pointer. Since the stack gets deeper by 16 bytes with every level, this test<a href="#fn4" class="footnoteRef" id="fnref4"><sup>4</sup></a> is easily set up at program initialization. If the test is true, then we’ve discovered a solution, or “win state” – so we go ahead to the “win” code.</p>
<p>If we’ve neither succeeded nor failed, it means we just have to go another level down in the tree. In order to have an efficient backtracking capability, we store our state variables on the stack, so they can be restored when everything fails deeper down in the tree. Finally, we update our model of which squares are in danger by adding the queen we’re currently placing as a column-occupier and diagonal-occupier (modifying all three state variables at once with the magic of <code>por</code>). Note here that <code>cl</code> is just a name for the least significant byte of the <code>rcx</code> register, which houses the horizontal position of the new queen.</p>
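<p>Here is a small Scheme sketch of just the “pick a square and mark it” part, leaving out the stack. The function names are hypothetical, the state is again modeled as an integer with three 16-bit lanes, and <code>lowest-set-bit-index</code> only imitates <code>bsf</code> for nonzero inputs.</p>
<figure class='code'><div class='highlight'><pre><code class='scheme'>(define attack-mask #x000100010001)        ; same constant the code keeps in r15

(define (lowest-set-bit-index x)           ; stands in for bsf; x must be nonzero
  (let scan ((i 0))
    (if (logbit? i x) i (scan (+ i 1)))))

(define (place-queen avail state)
  (let* ((col (lowest-set-bit-index avail))
         (avail-left (logxor avail (ash 1 col)))           ; btc on a bit known to be 1
         (new-state (logior state (ash attack-mask col)))) ; shl by cl, then por
    (list col avail-left new-state)))

(display (place-queen #b11100011 0))  ; prints (0 226 4295032833)
(newline)
</code></pre></div></figure>
<p>The last number is <code>0x000100010001</code> in decimal: the queen in column 0 is now recorded as a column-occupier and as an occupier of both diagonals.</p>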
<p>What if we have to backtrack?</p>
<figure class='code'><figcaption>
8queens.asm<a href='https://github.com/davidad/8queens/blob/1989666c45baa639f152dfc89c70635f7007d20b/8queens.asm#L37'>github</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='37' class='line-number'></div><div data-line='38' class='line-number'></div><div data-line='39' class='line-number'></div><div data-line='40' class='line-number'></div><div data-line='41' class='line-number'></div><div data-line='42' class='line-number'></div><div data-line='43' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="nl">backtrack:</span>
</code></div><div class='line'><code>  <span class="nf">cmp</span> <span class="nb">rsp</span><span class="p">,</span> <span class="nv">r13</span>             <span class="c1">; are we done?</span>
</code></div><div class='line'><code>  <span class="nf">je</span> <span class="nv">done</span>
</code></div><div class='line'><code>  <span class="nf">pop</span> <span class="nv">r10</span>                  <span class="c1">; restore last state</span>
</code></div><div class='line'><code>  <span class="nf">pop</span> <span class="nb">rdx</span>
</code></div><div class='line'><code>  <span class="nf">movq</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="nv">r10</span>
</code></div><div class='line'><code>  <span class="nf">jmp</span> <span class="nv">next_state</span>           <span class="c1">; try again</span>
</code></div></pre></td></tr></table></div></figure>


<p>First, we have another stack-pointer test: if we’ve tried to backtrack past the start of the program, then we know we’ve exhausted all possibilities and just go to <code>done</code>. Assuming that’s not at issue, we simply restore the <code>rdx</code> and <code>xmm1</code> variables (using <code>r10</code> as scratch storage since one can’t directly pop <code>xmm</code> registers). Then we just jump back into our loop, with the restored state ready to go!</p>
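<p>To see the control flow in one place, here is a sketch in Scheme that mimics the <code>next_state</code>/<code>backtrack</code> structure with an explicit stack (a list of saved pairs) instead of recursion. It is an approximation under my own assumptions: the state is an integer with three 16-bit lanes, all function names are invented, and a “win” simply bumps a counter and keeps scanning the same row, which may not match what the real <code>win</code> code does.</p>
<figure class='code'><div class='highlight'><pre><code class='scheme'>(define (lane state i)
  (logand #xffff (ash state (* -16 i))))

(define (advance-row state)               ; shift the two diagonal lanes
  (logior (ash (logand #xffff (ash (lane state 2) 1)) 32)
          (ash (ash (lane state 1) -1) 16)
          (lane state 0)))

(define (available state)                 ; open squares on the current row
  (logand #xff
          (lognot (logior (lane state 2) (lane state 1) (lane state 0)))))

(define (count-queens-with-stack)
  (let next-state ((avail #xff) (state 0) (stack '()) (wins 0))
    (cond ((zero? avail)                  ; nothing left to try on this row
           (if (null? stack)
               wins                       ; backtracked past the start: done
               (let ((saved (car stack))) ; pop the saved (avail . state) pair
                 (next-state (car saved) (cdr saved) (cdr stack) wins))))
          (else
           (let* ((bit (logand avail (- avail)))   ; lowest open square
                  (avail-left (logxor avail bit))) ; mark it as visited
             (if (= (length stack) 7)              ; seven rows already saved
                 (next-state avail-left state stack (+ wins 1))
                 (let ((next (advance-row
                              (logior state (* bit #x000100010001)))))
                   (next-state (available next)
                               next
                               (cons (cons avail-left state) stack)
                               wins))))))))

(display (count-queens-with-stack))  ; prints 92
(newline)
</code></pre></div></figure>
<p>Every call to <code>next-state</code> is a tail call, so this is a loop in disguise: the only history it keeps is the list in <code>stack</code>, just as the assembly keeps its history on the hardware stack.</p>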
<p>Now we’re ready to look at the whole solution in context:</p>
<figure class='code'><figcaption>
8queens.asm<a href='https://github.com/davidad/8queens/blob/1989666c45baa639f152dfc89c70635f7007d20b/8queens.asm'>github</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div><div data-line='14' class='line-number'></div><div data-line='15' class='line-number'></div><div data-line='16' class='line-number'></div><div data-line='17' class='line-number'></div><div data-line='18' class='line-number'></div><div data-line='19' class='line-number'></div><div data-line='20' class='line-number'></div><div data-line='21' class='line-number'></div><div data-line='22' class='line-number'></div><div data-line='23' class='line-number'></div><div data-line='24' class='line-number'></div><div data-line='25' class='line-number'></div><div data-line='26' class='line-number'></div><div data-line='27' class='line-number'></div><div data-line='28' class='line-number'></div><div data-line='29' class='line-number'></div><div data-line='30' class='line-number'></div><div data-line='31' class='line-number'></div><div data-line='32' class='line-number'></div><div data-line='33' class='line-number'></div><div data-line='34' class='line-number'></div><div data-line='35' class='line-number'></div><div data-line='36' class='line-number'></div><div data-line='37' class='line-number'></div><div data-line='38' class='line-number'></div><div data-line='39' class='line-number'></div><div data-line='40' class='line-number'></div><div data-line='41' class='line-number'></div><div data-line='42' 
class='line-number'></div><div data-line='43' class='line-number'></div><div data-line='44' class='line-number'></div><div data-line='45' class='line-number'></div><div data-line='46' class='line-number'></div><div data-line='47' class='line-number'></div><div data-line='48' class='line-number'></div><div data-line='49' class='line-number'></div><div data-line='50' class='line-number'></div><div data-line='51' class='line-number'></div><div data-line='52' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="cp">%include &quot;os_dependent_stuff.asm&quot;</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdx</span><span class="p">,</span> <span class="m">0b</span><span class="mi">11111111</span>      <span class="c1">; all eight possibilities available</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r8</span><span class="p">,</span> <span class="mh">0x000000000000</span>   <span class="c1">; no squares under attack from anywhere</span>
</code></div><div class='line'><code>  <span class="nf">movq</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="nv">r8</span>            <span class="c1">; maintain this state in xmm1</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r15</span><span class="p">,</span> <span class="mh">0x000100010001</span>  <span class="c1">; attack mask for one queen (left, right, and center)</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r14</span><span class="p">,</span> <span class="mh">0xff</span>            <span class="c1">; mask for low byte</span>
</code></div><div class='line'><code>  <span class="nf">movq</span> <span class="nv">xmm7</span><span class="p">,</span> <span class="nv">r14</span>           <span class="c1">; stored in xmm register</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r13</span><span class="p">,</span> <span class="nb">rsp</span>             <span class="c1">; current stack pointer (if we backtrack here, then</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nv">r14</span><span class="p">,</span> <span class="nb">rsp</span>             <span class="c1">;   the entire solution space has been explored)</span>
</code></div><div class='line'><code>  <span class="nf">sub</span> <span class="nv">r14</span><span class="p">,</span> <span class="mi">2</span><span class="o">&ast;</span><span class="mi">8</span><span class="o">&ast;</span><span class="mi">7</span>           <span class="c1">; this is where the stack pointer would be when we&#39;ve</span>
</code></div><div class='line'><code>                           <span class="c1">;   completed a winning state</span>
</code></div><div class='line'><code><span class="nl">next_state:</span>
</code></div><div class='line'><code>  <span class="nf">bsf</span> <span class="nb">rcx</span><span class="p">,</span> <span class="nb">rdx</span>             <span class="c1">; find next available position in current level</span>
</code></div><div class='line'><code>  <span class="nf">jz</span> <span class="nv">backtrack</span>             <span class="c1">; if there is no available position, we must go back</span>
</code></div><div class='line'><code>  <span class="nf">btc</span> <span class="nb">rdx</span><span class="p">,</span> <span class="nb">rcx</span>             <span class="c1">; mark position as unavailable</span>
</code></div><div class='line'><code>  <span class="nf">cmp</span> <span class="nb">rsp</span><span class="p">,</span> <span class="nv">r14</span>             <span class="c1">; check if we&#39;ve done 7 levels already</span>
</code></div><div class='line'><code>  <span class="nf">je</span> <span class="nv">win</span>                   <span class="c1">; if so, we have a win state. otherwise continue</span>
</code></div><div class='line'><code>  <span class="nf">movq</span> <span class="nv">r10</span><span class="p">,</span> <span class="nv">xmm1</span>           <span class="c1">; save current state ...</span>
</code></div><div class='line'><code>  <span class="nf">push</span> <span class="nb">rdx</span>
</code></div><div class='line'><code>  <span class="nf">push</span> <span class="nv">r10</span>                 <span class="c1">;   ... to stack</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nv">r15</span>             <span class="c1">; set up attack mask</span>
</code></div><div class='line'><code>  <span class="nf">shl</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">cl</span>              <span class="c1">; shift into position</span>
</code></div><div class='line'><code>  <span class="nf">movq</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="nb">rax</span>
</code></div><div class='line'><code>  <span class="nf">por</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="nv">xmm2</span>           <span class="c1">; mark as attacking in all directions</span>
</code></div><div class='line'><code>  <span class="nf">vpsllw</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="mi">1</span>      <span class="c1">; shift entire state to left, place in xmm2</span>
</code></div><div class='line'><code>  <span class="nf">vpsrlw</span> <span class="nv">xmm3</span><span class="p">,</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="mi">1</span>      <span class="c1">; shift entire state to right, place in xmm3</span>
</code></div><div class='line'><code>  <span class="nf">pblendw</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="m">0b</span><span class="mi">100</span> <span class="c1">; only copy &quot;left-attacking&quot; word back from xmm2</span>
</code></div><div class='line'><code>  <span class="nf">pblendw</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="nv">xmm3</span><span class="p">,</span> <span class="m">0b</span><span class="mi">010</span> <span class="c1">; only copy &quot;right-attacking&quot; word back from xmm3</span>
</code></div><div class='line'><code>  <span class="nf">vpsrldq</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="mi">4</span>     <span class="c1">; shift state right 4 &ast;bytes&ast;, place in xmm2</span>
</code></div><div class='line'><code>  <span class="nf">vpsrldq</span> <span class="nv">xmm3</span><span class="p">,</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="mi">2</span>     <span class="c1">; shift state right 2 bytes, place in xmm3</span>
</code></div><div class='line'><code>  <span class="nf">por</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="nv">xmm3</span>            <span class="c1">; collect bitwise ors in xmm2</span>
</code></div><div class='line'><code>  <span class="nf">por</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="nv">xmm1</span>
</code></div><div class='line'><code>  <span class="nf">vpandn</span> <span class="nv">xmm4</span><span class="p">,</span> <span class="nv">xmm2</span><span class="p">,</span> <span class="nv">xmm7</span>   <span class="c1">; invert and select low byte</span>
</code></div><div class='line'><code>  <span class="nf">movq</span> <span class="nb">rdx</span><span class="p">,</span> <span class="nv">xmm4</span>            <span class="c1">; place in rdx</span>
</code></div><div class='line'><code>  <span class="nf">jmp</span> <span class="nv">next_state</span>           <span class="c1">; now we&#39;re set up to iterate</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="nl">backtrack:</span>
</code></div><div class='line'><code>  <span class="nf">cmp</span> <span class="nb">rsp</span><span class="p">,</span> <span class="nv">r13</span>             <span class="c1">; are we done?</span>
</code></div><div class='line'><code>  <span class="nf">je</span> <span class="nv">done</span>
</code></div><div class='line'><code>  <span class="nf">pop</span> <span class="nv">r10</span>                  <span class="c1">; restore last state</span>
</code></div><div class='line'><code>  <span class="nf">pop</span> <span class="nb">rdx</span>
</code></div><div class='line'><code>  <span class="nf">movq</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="nv">r10</span>
</code></div><div class='line'><code>  <span class="nf">jmp</span> <span class="nv">next_state</span>           <span class="c1">; try again</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="nl">win:</span>
</code></div><div class='line'><code>  <span class="nf">inc</span> <span class="nv">r8</span>                   <span class="c1">; increment solution counter</span>
</code></div><div class='line'><code>  <span class="nf">jmp</span> <span class="nv">next_state</span>           <span class="c1">; keep going</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="nl">done:</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nv">r8</span>              <span class="c1">; set system call argument to solution count</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nv">SYSCALL_EXIT</span>    <span class="c1">; set system call to exit</span>
</code></div><div class='line'><code>  <span class="nf">syscall</span>                  <span class="c1">; this will exit with our solution count as status</span>
</code></div></pre></td></tr></table></div></figure>
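<p>If the register juggling is hard to follow, here is a minimal Python sketch of the same level-by-level search. It is not a translation of the listing: it swaps the packed bit masks for ordinary sets, and the names <code>count_solutions</code>, <code>place</code>, <code>down</code>, <code>left</code> and <code>right</code> are made up for illustration. What it keeps is the idea of carrying three “under attack” collections from one level to the next, nudging the two diagonal ones sideways by one column each time.</p>
<pre><code>def count_solutions(n=8):
    """Count n-queens placements by filling the board one level (row) at a time."""

    def place(level, down, left, right):
        # down, left and right hold the columns attacked on this level
        # (straight down, along left-going diagonals, along right-going diagonals).
        if level == n:
            return 1  # every level holds a queen: one complete solution
        total = 0
        for col in range(n):
            if col in down or col in left or col in right:
                continue  # this square is under attack, try the next column
            total += place(
                level + 1,
                down | {col},
                {c - 1 for c in left | {col}},   # diagonals drift one column left
                {c + 1 for c in right | {col}},  # diagonals drift one column right
            )
        return total

    return place(0, frozenset(), frozenset(), frozenset())


print(count_solutions())  # 92
</code></pre>
<p>The count for the 8×8 board is 92, which fits in a single byte; that is why the assembly version can hand it back as the exit status.</p>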


<p>If you’re curious to investigate further, <a href="https://github.com/davidad/8queens">run the code yourself</a><a href="#fn5" class="footnoteRef" id="fnref5"><sup>5</sup></a> and/or check out the <a href="https://github.com/davidad/8queens/blob/master/8queens.asm">more complicated, pretty-printing version</a>.</p>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>Somewhat surprisingly, the first version actually <em>worked</em>.<a href="#fnref1">↩</a></p></li>
<li id="fn2"><p>A word is two bytes. Why did I use words and not just bytes? The answer is that some of the fancy instructions we want to use don’t allow us to work with data elements any smaller than words.<a href="#fnref2">↩</a></p></li>
<li id="fn3"><p>But it’s all taking place in the register file – no memory accesses here!<a href="#fnref3">↩</a></p></li>
<li id="fn4"><p>That is to say, the value of <code>r14</code>.<a href="#fnref4">↩</a></p></li>
<li id="fn5"><p>Requires a CPU with AVX support (Intel Sandy Bridge or later).<a href="#fnref5">↩</a></p></li>
</ol>
</section>]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Relocatable vs. Position-Independent Code (or, Virtual Memory isn't Just For Swap)]]></title>
    <link href="http://davidad.github.io/blog/2014/02/19/relocatable-vs-position-independent-code-or/"/>
    <updated>2014-02-19T17:12:50-05:00</updated>
    <id>http://davidad.github.io/blog/2014/02/19/relocatable-vs-position-independent-code-or</id>
    <content type="html"><![CDATA[<blockquote>
<p><strong>Myth</strong>: “Virtual memory” is the mechanism that a kernel uses to make more memory available than is actually physically installed, by setting aside a disk partition for the overflow and copying pages between memory and disk as needed.</p>
</blockquote>
<p>I acquired this belief very early in my programming career, but it turns out that swapping pages to disk is merely one of the many things that “virtual memory” makes possible.</p>
<blockquote>
<p><strong>Fact</strong>: “Virtual memory” is a <em>hardware</em> (CPU) mechanism, which, every single time memory is accessed, references a kernel-specified data structure called a “page table” to arbitrarily <a href="http://www.catb.org/jargon/html/F/frobnicate.html">frobnicate</a> the high bits of the address, which is called “translating” from a “linear address” to a “physical address”. (The page table gets cached by a <a href="http://en.wikipedia.org/wiki/Translation_lookaside_buffer">translation lookaside buffer</a>, so the lookup is usually quite efficient!)</p>
</blockquote>
<p>This fact became very real to me this week as I made a <a href="http://davidad.github.io/blog/2014/02/18/kernel-from-scratch/">kernel from scratch</a>: I was moderately surprised that I <em>needed</em> to set up a page table, when I had always thought of virtual memory as a somewhat advanced kernel feature. Today, “relocatable” and “PIC” – terms I’d encountered in the past and never really understood – suddenly made sense to me in this context. <!-- more --></p>
<p>Here’s another fact that surprised me: in conventional operating systems, <strong>every process has its own page table</strong>. The pointer <code>0x7fff8000</code> does not necessarily translate to the same physical address in one process as it does in another<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a>.</p>
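<p>To make the per-process translation concrete, here is a toy model in Python. It is a deliberate simplification: real x86-64 hardware walks several levels of tables and caches the results, whereas this sketch uses one flat dictionary per process, and the frame numbers in <code>process_a</code> and <code>process_b</code> are invented. The only point is that the high bits of the address are looked up, while the low bits (the offset within an assumed 4KB page) pass through untouched.</p>
<pre><code>PAGE_SIZE = 4096  # assumed 4KB pages, so the low 12 bits are the offset


def translate(page_table, linear_address):
    """Map a linear address to a physical one using a flat, toy page table."""
    page_number, offset = divmod(linear_address, PAGE_SIZE)
    frame_number = page_table[page_number]  # a missing key plays the role of a page fault
    return frame_number * PAGE_SIZE + offset


# Hypothetical page tables for two processes: same page number, different frames.
process_a = {0x7fff8: 0x00123}
process_b = {0x7fff8: 0x00456}

# An address in the same page as 0x7fff8000, seen from each process.
print(hex(translate(process_a, 0x7fff8abc)))  # 0x123abc
print(hex(translate(process_b, 0x7fff8abc)))  # 0x456abc
</code></pre>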
<p>Now, let’s talk about libraries. Libraries are code, but they don’t run as processes of their own. They’re going to wind up under someone else’s page table. There are two ways that can happen: static linking and dynamic linking<a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a>.</p>
<ul>
<li>If a library is statically linked, the linker finds some place in a code segment of the executable to situate the library. The loader will then place this segment in virtual memory (wherever it’s explicitly specified to go) when the executable is run.</li>
<li>If a library is dynamically linked, then when the loader sets up the executable, it will invoke the dynamic linker to make sure that the required library shows up some place in the process’s virtual memory<a href="#fn3" class="footnoteRef" id="fnref3"><sup>3</sup></a>.</li>
</ul>
<p>Whether static or dynamic, a linked library is going to be situated in virtual memory somewhere that the library can’t predict<a href="#fn4" class="footnoteRef" id="fnref4"><sup>4</sup></a>, which is problematic for accessing its own memory. Fortunately, the linker (whether static or dynamic) can help us out by <strong>relocating</strong> the library’s code, so that it knows where it is. Unfortunately, library writers have to help the linker out by specifying, in the object file, the set of instructions or initialized data that need to be modified to properly relocate it. As long as all that “relocation information” is present, the object file is said to be <strong>relocatable</strong>.</p>
<p>On the other hand, <strong>position-independent code</strong> (<strong>PIC</strong>), as the name suggests, doesn’t even need to be relocated. None of its instructions or initialized data encode any assumptions about the region of virtual memory the program will be loaded into; it figures out where it is (usually by referencing the instruction pointer) and makes all memory accesses based on what it finds out.</p>
<p>So why do all that work when the linker can relocate for us?</p>
<p>Here’s the kicker. The whole motivation for dynamic linking was <strong>shared libraries</strong>. <strong>Shared</strong> doesn’t just mean that multiple programs reference the same library file on disk. It means those processes share that library <strong>in physical memory</strong><a href="#fn5" class="footnoteRef" id="fnref5"><sup>5</sup></a>. Since every process has its own page table, the exact same library code winds up executing as if it were loaded into multiple, inconsistent virtual memory locations. If we relocated it for one process, it wouldn’t necessarily be valid for another. <strong>This is why weird things sometimes happen where the solution is “recompile <code>blah</code> with <code>-fPIC</code>”</strong>.</p>
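<p>Here is a toy illustration of that difference in Python, under heavy assumptions: the “library image” is just a list of numbers, the load addresses 1000 and 5000 are made up, and the names <code>relocate</code> and <code>pic_target</code> are hypothetical. (Real x86-64 code measures relative distances from the next instruction rather than from the operand itself, but the principle is the same.)</p>
<pre><code># A toy "library image": a list of words. Slot 1 is an instruction operand that
# must end up referring to the data word stored in slot 5.
DATA_SLOT = 5

RELOCATABLE_IMAGE = [0, DATA_SLOT, 0, 0, 0, 42]  # operand = address assuming base 0
RELOCATIONS = [1]                                # slots the linker must patch

PIC_IMAGE = [0, DATA_SLOT - 1, 0, 0, 0, 42]      # operand = distance from slot 1


def relocate(image, relocations, base):
    """Return a patched, private copy of the image for one load address."""
    patched = list(image)
    for slot in relocations:
        patched[slot] += base
    return patched


def pic_target(image, slot, base):
    """Compute the target at run time from where the operand itself lives."""
    return (base + slot) + image[slot]


# Two processes see the library at different (made-up) virtual addresses.
copy_a = relocate(RELOCATABLE_IMAGE, RELOCATIONS, 1000)
copy_b = relocate(RELOCATABLE_IMAGE, RELOCATIONS, 5000)
print(copy_a[1], copy_b[1])  # 1005 5005
print(copy_a == copy_b)      # False

# The PIC image is never modified, yet resolves correctly at both addresses.
print(pic_target(PIC_IMAGE, 1, 1000), pic_target(PIC_IMAGE, 1, 5000))  # 1005 5005
</code></pre>
<p>The relocated copies differ from one process to the next, so they can’t be backed by the same physical pages; the position-independent image is byte-for-byte identical everywhere, so it can.</p>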
<hr />
<p>Perhaps the most interesting thing about all this is that in today’s 64-bit age, position-independent code may not even be necessary. The available virtual memory address space with 64 bits is so large that an OS may be able to afford blocking off a region of <em>every</em> process’s virtual memory space to host <em>every</em> shared library on the system, so that their linear locations are guaranteed to be consistent from process to process. That means shared libraries would still have to be relocatable, but they wouldn’t have to be PIC.</p>
<p>On the other hand, x86_64 makes it <a href="http://eli.thegreenplace.net/2011/11/11/position-independent-code-pic-in-shared-libraries-on-x64/">significantly easier</a> to write position-independent code, by letting instructions address memory relative to the current program counter (so no matter what virtual memory offset the code is at, it’s internally consistent). If we adopt a policy that <em>all</em> libraries (static and dynamic) are PIC, then libraries don’t ever have to worry about being relocated and the linker gets a lot simpler.</p>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>This is one of the things that differentiates a “process” from a “thread”: <strong>threads</strong> don’t have their own page tables.<a href="#fnref1">↩</a></p></li>
<li id="fn2"><p>Just as with static typechecking and dynamic typechecking, “static” means that it happens before the program is invoked, and “dynamic” means that it happens after the program is invoked.<a href="#fnref2">↩</a></p></li>
<li id="fn3"><p>The loader also needs to populate a series of “slots” at fixed addresses with instructions that jump into where the library is (since the executable won’t know in advance where the library will show up, unlike with static linking). But that part of dynamic linking is a distraction for the discussion of relocatable vs. PIC.<a href="#fnref3">↩</a></p></li>
<li id="fn4"><p>Unlike a stand-alone executable, which can request (almost) any virtual memory address that it wants (since it has the whole page table to itself).<a href="#fnref4">↩</a></p></li>
<li id="fn5"><p>In fact, in most operating systems, if multiple processes map the same file into their virtual memory, and none of them write to it, those processes’ page tables will translate each of their process-specific addresses for that file to the <strong>same pages</strong> of physical memory.<a href="#fnref5">↩</a></p></li>
</ol>
</section>]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Kernel from Scratch]]></title>
    <link href="http://davidad.github.io/blog/2014/02/18/kernel-from-scratch/"/>
    <updated>2014-02-18T02:58:08-05:00</updated>
    <id>http://davidad.github.io/blog/2014/02/18/kernel-from-scratch</id>
    <content type="html"><![CDATA[<p>One of my 3 major goals for Hacker School was to create a bootable, 64-bit kernel image from scratch, using only <code>nasm</code> and my text editor. Well, folks, one down, two to go.</p>
<p><img src="http://i.imgur.com/vnYFaFZ.png" alt="Hello, kernel!" /> <!-- more --></p>
<p>The NASM/x64 assembly code is listed below, with copious comments for your pleasure. It comprises 136 lines including comments; 75 with comments removed. You may wish to refer to the <a href="http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf">Intel® 64 Software Developers’ Manual (16.5MB PDF)</a>, especially if you’re interested in doing something similar yourself. Building and running is as simple as</p>
<pre><code>$ nasm boot.asm -o bootable.bin
$ qemu-system-x86_64 bootable.bin</code></pre>
<p>That is, assuming that you have recent versions of <a href="http://www.nasm.us/pub/nasm/releasebuilds/2.11/macosx/">nasm</a> and <a href="http://wiki.qemu.org/Download">qemu</a> installed.</p>
<p>Let’s get to the code!</p>
<figure class='code'><figcaption>
boot.asm<a href='https://raw.github.com/davidad/mesh/ffbd5935d7218eddbaa43b991d1eaa4e277ecf86/boot.asm'>raw</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div><div data-line='14' class='line-number'></div><div data-line='15' class='line-number'></div><div data-line='16' class='line-number'></div><div data-line='17' class='line-number'></div><div data-line='18' class='line-number'></div><div data-line='19' class='line-number'></div><div data-line='20' class='line-number'></div><div data-line='21' class='line-number'></div><div data-line='22' class='line-number'></div><div data-line='23' class='line-number'></div><div data-line='24' class='line-number'></div><div data-line='25' class='line-number'></div><div data-line='26' class='line-number'></div><div data-line='27' class='line-number'></div><div data-line='28' class='line-number'></div><div data-line='29' class='line-number'></div><div data-line='30' class='line-number'></div><div data-line='31' class='line-number'></div><div data-line='32' class='line-number'></div><div data-line='33' class='line-number'></div><div data-line='34' class='line-number'></div><div data-line='35' class='line-number'></div><div data-line='36' class='line-number'></div><div data-line='37' class='line-number'></div><div data-line='38' class='line-number'></div><div data-line='39' class='line-number'></div><div data-line='40' class='line-number'></div><div data-line='41' class='line-number'></div><div data-line='42' 
class='line-number'></div><div data-line='43' class='line-number'></div><div data-line='44' class='line-number'></div><div data-line='45' class='line-number'></div><div data-line='46' class='line-number'></div><div data-line='47' class='line-number'></div><div data-line='48' class='line-number'></div><div data-line='49' class='line-number'></div><div data-line='50' class='line-number'></div><div data-line='51' class='line-number'></div><div data-line='52' class='line-number'></div><div data-line='53' class='line-number'></div><div data-line='54' class='line-number'></div><div data-line='55' class='line-number'></div><div data-line='56' class='line-number'></div><div data-line='57' class='line-number'></div><div data-line='58' class='line-number'></div><div data-line='59' class='line-number'></div><div data-line='60' class='line-number'></div><div data-line='61' class='line-number'></div><div data-line='62' class='line-number'></div><div data-line='63' class='line-number'></div><div data-line='64' class='line-number'></div><div data-line='65' class='line-number'></div><div data-line='66' class='line-number'></div><div data-line='67' class='line-number'></div><div data-line='68' class='line-number'></div><div data-line='69' class='line-number'></div><div data-line='70' class='line-number'></div><div data-line='71' class='line-number'></div><div data-line='72' class='line-number'></div><div data-line='73' class='line-number'></div><div data-line='74' class='line-number'></div><div data-line='75' class='line-number'></div><div data-line='76' class='line-number'></div><div data-line='77' class='line-number'></div><div data-line='78' class='line-number'></div><div data-line='79' class='line-number'></div><div data-line='80' class='line-number'></div><div data-line='81' class='line-number'></div><div data-line='82' class='line-number'></div><div data-line='83' class='line-number'></div><div data-line='84' class='line-number'></div><div data-line='85' 
class='line-number'></div><div data-line='86' class='line-number'></div><div data-line='87' class='line-number'></div><div data-line='88' class='line-number'></div><div data-line='89' class='line-number'></div><div data-line='90' class='line-number'></div><div data-line='91' class='line-number'></div><div data-line='92' class='line-number'></div><div data-line='93' class='line-number'></div><div data-line='94' class='line-number'></div><div data-line='95' class='line-number'></div><div data-line='96' class='line-number'></div><div data-line='97' class='line-number'></div><div data-line='98' class='line-number'></div><div data-line='99' class='line-number'></div><div data-line='100' class='line-number'></div><div data-line='101' class='line-number'></div><div data-line='102' class='line-number'></div><div data-line='103' class='line-number'></div><div data-line='104' class='line-number'></div><div data-line='105' class='line-number'></div><div data-line='106' class='line-number'></div><div data-line='107' class='line-number'></div><div data-line='108' class='line-number'></div><div data-line='109' class='line-number'></div><div data-line='110' class='line-number'></div><div data-line='111' class='line-number'></div><div data-line='112' class='line-number'></div><div data-line='113' class='line-number'></div><div data-line='114' class='line-number'></div><div data-line='115' class='line-number'></div><div data-line='116' class='line-number'></div><div data-line='117' class='line-number'></div><div data-line='118' class='line-number'></div><div data-line='119' class='line-number'></div><div data-line='120' class='line-number'></div><div data-line='121' class='line-number'></div><div data-line='122' class='line-number'></div><div data-line='123' class='line-number'></div><div data-line='124' class='line-number'></div><div data-line='125' class='line-number'></div><div data-line='126' class='line-number'></div><div data-line='127' class='line-number'></div><div 
data-line='128' class='line-number'></div><div data-line='129' class='line-number'></div><div data-line='130' class='line-number'></div><div data-line='131' class='line-number'></div><div data-line='132' class='line-number'></div><div data-line='133' class='line-number'></div><div data-line='134' class='line-number'></div><div data-line='135' class='line-number'></div><div data-line='136' class='line-number'></div></pre></td><td class='main  nasm'><pre><div class='line'><code><span class="k">bits</span> <span class="mi">16</span>
</code></div><div class='line'><code><span class="k">org</span> <span class="mh">0x7c00</span>
</code></div><div class='line'><code><span class="nl">k_boot_start:</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>  <span class="c1">; The cli instruction disables maskable external interrupts.</span>
</code></div><div class='line'><code>  <span class="nf">cli</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>  <span class="c1">; Fetch Control Register 0, set bit 0 to 1 (Protection Enable bit)</span>
</code></div><div class='line'><code>  <span class="c1">; This enables protected mode, which is what lets us run 32-bit code</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="nb">cr0</span>
</code></div><div class='line'><code>  <span class="nf">or</span> <span class="nb">al</span><span class="p">,</span> <span class="mi">1</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">cr0</span><span class="p">,</span> <span class="nb">eax</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>  <span class="c1">; Now we have to jump into the 32-bit zone. The 0x08 is a 386-style segment</span>
</code></div><div class='line'><code>  <span class="c1">; selector, which theoretically references the Global Descriptor Table,</span>
</code></div><div class='line'><code>  <span class="c1">; though in this bare-bones bootloader we haven&#39;t even bothered to set that</span>
</code></div><div class='line'><code>  <span class="c1">; up yet and it works anyway.</span>
</code></div><div class='line'><code>  <span class="nf">jmp</span> <span class="mh">0x08</span><span class="p">:</span><span class="nv">k_32_bits</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="k">bits</span> <span class="mi">32</span>
</code></div><div class='line'><code><span class="nl">k_32_bits:</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>  <span class="c1">; Now we&#39;re going to set up the page tables for 64-bit mode.</span>
</code></div><div class='line'><code>  <span class="c1">; Since this is a minimal example, we&#39;re just going to set up a single page.</span>
</code></div><div class='line'><code>  <span class="c1">; The 64-bit page table uses four levels of paging,</span>
</code></div><div class='line'><code>  <span class="c1">;    PML4E table =&gt; PDPTE table =&gt; PDE table =&gt; PTE table =&gt; physical addr</span>
</code></div><div class='line'><code>  <span class="c1">; You don&#39;t have to use all of them, but you have to use at least the first</span>
</code></div><div class='line'><code>  <span class="c1">; three. So we&#39;re going to set up PML4E, PDPTE, and PDE tables here, each</span>
</code></div><div class='line'><code>  <span class="c1">; with a single entry.</span>
</code></div><div class='line'><code><span class="cp">%define PML4E_ADDR 0x8000</span>
</code></div><div class='line'><code><span class="cp">%define PDPTE_ADDR 0x9000</span>
</code></div><div class='line'><code><span class="cp">%define PDE_ADDR 0xa000</span>
</code></div><div class='line'><code>  <span class="c1">; Set up PML4 entry, which will point to PDPT entry.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="kt">dword</span> <span class="nb">eax</span><span class="p">,</span> <span class="nv">PDPTE_ADDR</span>
</code></div><div class='line'><code>  <span class="c1">; The low 12 bits of the PML4E entry are zeroed out when it&#39;s dereferenced,</span>
</code></div><div class='line'><code>  <span class="c1">; and used to encode metadata instead. Here we&#39;re setting the Present and</span>
</code></div><div class='line'><code>  <span class="c1">; Read/Write bits. You might also want to set the User bit, if you want a</span>
</code></div><div class='line'><code>  <span class="c1">; page to remain accessible in user-mode code.</span>
</code></div><div class='line'><code>  <span class="nf">or</span> <span class="kt">dword</span> <span class="nb">eax</span><span class="p">,</span> <span class="m">0b</span><span class="mi">011</span>  <span class="c1">; Would be 0b111 to set User bit also</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="kt">dword</span> <span class="p">[</span><span class="nv">PML4E_ADDR</span><span class="p">],</span> <span class="nb">eax</span>
</code></div><div class='line'><code>  <span class="c1">; Although we&#39;re in 32-bit mode, the table entry is 64 bits. We can just zero</span>
</code></div><div class='line'><code>  <span class="c1">; out the upper bits in this case.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="kt">dword</span> <span class="p">[</span><span class="nv">PML4E_ADDR</span><span class="o">+</span><span class="mi">4</span><span class="p">],</span> <span class="mi">0</span>
</code></div><div class='line'><code>  <span class="c1">; Set up PDPT entry, which will point to PD entry.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="kt">dword</span> <span class="nb">eax</span><span class="p">,</span> <span class="nv">PDE_ADDR</span>
</code></div><div class='line'><code>  <span class="nf">or</span> <span class="kt">dword</span> <span class="nb">eax</span><span class="p">,</span> <span class="m">0b</span><span class="mi">011</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="kt">dword</span> <span class="p">[</span><span class="nv">PDPTE_ADDR</span><span class="p">],</span> <span class="nb">eax</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="kt">dword</span> <span class="p">[</span><span class="nv">PDPTE_ADDR</span><span class="o">+</span><span class="mi">4</span><span class="p">],</span> <span class="mi">0</span>
</code></div><div class='line'><code>  <span class="c1">; Set up PD entry, which will point to the first 2MB page (0).  But we</span>
</code></div><div class='line'><code>  <span class="c1">; need to set three bits this time, Present, Read/Write and Page Size (to</span>
</code></div><div class='line'><code>  <span class="c1">; indicate that this is the last level of paging in use).</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="kt">dword</span> <span class="p">[</span><span class="nv">PDE_ADDR</span><span class="p">],</span> <span class="m">0b</span><span class="mi">10000011</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="kt">dword</span> <span class="p">[</span><span class="nv">PDE_ADDR</span><span class="o">+</span><span class="mi">4</span><span class="p">],</span> <span class="mi">0</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>  <span class="c1">; Enable PGE and PAE bits of CR4 to get 64-bit paging available.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="m">0b</span><span class="mi">10100000</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">cr4</span><span class="p">,</span> <span class="nb">eax</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>  <span class="c1">; Set master (PML4) page table in CR3.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="nv">PML4E_ADDR</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">cr3</span><span class="p">,</span> <span class="nb">eax</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>  <span class="c1">; Set IA-32e Mode Enable (read: 64-bit mode enable) in the &quot;model-specific</span>
</code></div><div class='line'><code>  <span class="c1">; register&quot; (MSR) called the Extended Feature Enable Register (EFER).</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">ecx</span><span class="p">,</span> <span class="mh">0xc0000080</span>
</code></div><div class='line'><code>  <span class="nf">rdmsr</span> <span class="c1">; takes ecx as argument, deposits contents of MSR into edx:eax</span>
</code></div><div class='line'><code>  <span class="nf">or</span> <span class="nb">eax</span><span class="p">,</span> <span class="m">0b</span><span class="mi">100000000</span>
</code></div><div class='line'><code>  <span class="nf">wrmsr</span> <span class="c1">; exactly the reverse of rdmsr</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>  <span class="c1">; Enable PG flag of CR0 to actually turn on paging.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="nb">cr0</span>
</code></div><div class='line'><code>  <span class="nf">or</span> <span class="nb">eax</span><span class="p">,</span> <span class="mh">0x80000000</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">cr0</span><span class="p">,</span> <span class="nb">eax</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>  <span class="c1">; Load Global Descriptor Table (outdated access control, but needs to be set)</span>
</code></div><div class='line'><code>  <span class="nf">lgdt</span> <span class="p">[</span><span class="nv">gdt_hdr</span><span class="p">]</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>  <span class="c1">; Jump into 64-bit zone.</span>
</code></div><div class='line'><code>  <span class="nf">jmp</span> <span class="mh">0x08</span><span class="p">:</span><span class="nv">k_64_bits</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="k">bits</span> <span class="mi">64</span>
</code></div><div class='line'><code><span class="nl">k_64_bits:</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdi</span><span class="p">,</span> <span class="mh">0xb8000</span> <span class="c1">; This is the beginning of &quot;video memory.&quot;</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdx</span><span class="p">,</span> <span class="nb">rdi</span>     <span class="c1">; We&#39;ll save that value for later, too.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rcx</span><span class="p">,</span> <span class="mi">80</span><span class="o">&ast;</span><span class="mi">25</span>   <span class="c1">; This is how many characters are on the screen.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">ax</span><span class="p">,</span> <span class="mh">0x7400</span>   <span class="c1">; Video memory uses 2 bytes per character. The high byte</span>
</code></div><div class='line'><code>                   <span class="c1">; determines foreground and background colors. See also</span>
</code></div><div class='line'><code><span class="c1">; http://en.wikipedia.org/wiki/List_of_8-bit_computer_hardware_palettes#CGA</span>
</code></div><div class='line'><code>                   <span class="c1">; In this case, we&#39;re setting red-on-gray (MIT colors!)</span>
</code></div><div class='line'><code>  <span class="nf">rep</span> <span class="nv">stosw</span>        <span class="c1">; Copies whatever is in ax to [rdi], rcx times.</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nb">rdx</span>       <span class="c1">; Restore rdi to the beginning of video memory.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nv">hello</span>     <span class="c1">; Point rsi (&quot;source&quot; of string instructions) at string.</span>
</code></div><div class='line'><code>  <span class="nf">mov</span> <span class="nb">rbx</span><span class="p">,</span> <span class="nv">hello_end</span> <span class="c1">; Put end of string in rbx for comparison purposes.</span>
</code></div><div class='line'><code><span class="nl">hello_loop:</span>
</code></div><div class='line'><code>  <span class="nf">movsb</span>              <span class="c1">; Moves a byte from [rsi] to [rdi], increments rsi and rdi.</span>
</code></div><div class='line'><code>  <span class="nf">inc</span> <span class="nb">rdi</span>            <span class="c1">; Increment rdi again to skip over the color-control byte.</span>
</code></div><div class='line'><code>  <span class="nf">cmp</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nb">rbx</span>       <span class="c1">; Check if we&#39;ve reached the end of the string.</span>
</code></div><div class='line'><code>  <span class="nf">jne</span> <span class="nv">hello_loop</span>     <span class="c1">; If not, loop.</span>
</code></div><div class='line'><code>  <span class="nf">hlt</span>                <span class="c1">; If so, halt.</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="nl">hello:</span>
</code></div><div class='line'><code>  <span class="kd">db</span> <span class="s">&quot;Hello, kernel!&quot;</span>
</code></div><div class='line'><code><span class="nl">hello_end:</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="c1">; Global descriptor table entry format</span>
</code></div><div class='line'><code><span class="c1">; See Intel 64 Software Developers&#39; Manual, Vol. 3A, Figure 3-8</span>
</code></div><div class='line'><code><span class="c1">; or http://en.wikipedia.org/wiki/Global_Descriptor_Table</span>
</code></div><div class='line'><code><span class="cp">%macro GDT_ENTRY 4</span>
</code></div><div class='line'><code>  <span class="c1">; %1 is base address, %2 is segment limit, %3 is flags, %4 is type.</span>
</code></div><div class='line'><code>  <span class="kd">dw</span> <span class="o">%</span><span class="mi">2</span> <span class="o">&amp;</span> <span class="mh">0xffff</span>
</code></div><div class='line'><code>  <span class="kd">dw</span> <span class="o">%</span><span class="mi">1</span> <span class="o">&amp;</span> <span class="mh">0xffff</span>
</code></div><div class='line'><code>  <span class="kd">db</span> <span class="p">(</span><span class="o">%</span><span class="mi">1</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xff</span>
</code></div><div class='line'><code>  <span class="kd">db</span> <span class="o">%</span><span class="mi">4</span> <span class="o">|</span> <span class="p">((</span><span class="o">%</span><span class="mi">3</span> <span class="o">&lt;&lt;</span> <span class="mi">4</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xf0</span><span class="p">)</span>
</code></div><div class='line'><code>  <span class="kd">db</span> <span class="p">(</span><span class="o">%</span><span class="mi">3</span> <span class="o">&amp;</span> <span class="mh">0xf0</span><span class="p">)</span> <span class="o">|</span> <span class="p">((</span><span class="o">%</span><span class="mi">2</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x0f</span><span class="p">)</span>
</code></div><div class='line'><code>  <span class="kd">db</span> <span class="o">%</span><span class="mi">1</span> <span class="o">&gt;&gt;</span> <span class="mi">24</span>
</code></div><div class='line'><code><span class="cp">%endmacro</span>
</code></div><div class='line'><code><span class="cp">%define EXECUTE_READ 0b1010</span>
</code></div><div class='line'><code><span class="cp">%define READ_WRITE 0b0010</span>
</code></div><div class='line'><code><span class="cp">%define RING0 0b10101001 </span><span class="c1">; Flags set: Granularity, 64-bit, Present, S; Ring=00</span>
</code></div><div class='line'><code>                   <span class="c1">; Note: Ring is determined by bits 1 and 2 (the only &quot;00&quot;)</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="c1">; Global descriptor table (loaded by lgdt instruction)</span>
</code></div><div class='line'><code><span class="nl">gdt_hdr:</span>
</code></div><div class='line'><code>  <span class="kd">dw</span> <span class="nv">gdt_end</span> <span class="o">-</span> <span class="nv">gdt</span> <span class="o">-</span> <span class="mi">1</span>
</code></div><div class='line'><code>  <span class="kd">dd</span> <span class="nv">gdt</span>
</code></div><div class='line'><code><span class="nl">gdt:</span>
</code></div><div class='line'><code>  <span class="nf">GDT_ENTRY</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span>
</code></div><div class='line'><code>  <span class="nf">GDT_ENTRY</span> <span class="mi">0</span><span class="p">,</span> <span class="mh">0xffffff</span><span class="p">,</span> <span class="nv">RING0</span><span class="p">,</span> <span class="nv">EXECUTE_READ</span>
</code></div><div class='line'><code>  <span class="nf">GDT_ENTRY</span> <span class="mi">0</span><span class="p">,</span> <span class="mh">0xffffff</span><span class="p">,</span> <span class="nv">RING0</span><span class="p">,</span> <span class="nv">READ_WRITE</span>
</code></div><div class='line'><code>  <span class="c1">; You&#39;d want to have entries for other rings here, if you were using them.</span>
</code></div><div class='line'><code><span class="nl">gdt_end:</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="c1">; Very important - mark the sector as bootable. </span>
</code></div><div class='line'><code><span class="kd">times</span> <span class="mi">512</span> <span class="o">-</span> <span class="mi">2</span> <span class="o">-</span> <span class="p">(</span><span class="kc">$</span> <span class="o">-</span> <span class="kc">$$</span><span class="p">)</span> <span class="nv">db</span> <span class="mi">0</span> <span class="c1">; zero-pad the 512-byte sector to the last 2 bytes</span>
</code></div><div class='line'><code><span class="kd">dw</span> <span class="mh">0xaa55</span> <span class="c1">; Magic &quot;boot signature&quot;</span>
</code></div></pre></td></tr></table></div></figure>]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Octopress workflow]]></title>
    <link href="http://davidad.github.io/blog/2014/02/11/octopress-workflow/"/>
    <updated>2014-02-11T18:10:58-05:00</updated>
    <id>http://davidad.github.io/blog/2014/02/11/octopress-workflow</id>
    <content type="html"><![CDATA[<p>Today is my second day at <a href="http://hackerschool.com">Hacker School</a>, and I decided to set up a little bit of tooling for blogging about what I do here. The first tool I set up (following the recommendations of many Hacker Schoolers and alums) was <a href="http://octopress.org">Octopress</a>, a static site generator designed for <a href="http://pages.github.com">GitHub Pages</a> and implemented atop <a href="http://jekyllrb.com">Jekyll</a>. (The page you’re reading right now is Octopress-generated.) I followed the admirably thorough Octopress documentation for <a href="http://octopress.org/docs/setup/">installation</a>, <a href="http://octopress.org/docs/configuring/">initial configuration</a>, <a href="http://octopress.org/docs/deploying/github/">deployment with GitHub Pages</a>, and <a href="http://octopress.org/docs/theme/">theme customization</a><a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a>. But I wanted even more convenience. So, I’m here to introduce you to the <code>blog</code> command (the same one I used to write this very post).</p>
<pre><code>davidad@zayin ~/octopress&gt; blog
Enter a title for your post:</code></pre>
<p><code>blog</code> is a <code>bash</code> script, pretty specific to my own setup (vim, Chrome, OS X), but it could be adapted to other environments. <!-- more --> <code>blog</code> can create a post using Octopress’ <code>new_post[]</code> Rake target (and you can specify a title on the command line if you want), then it opens <code>vim</code> in sort of <code>git commit</code>-ish fashion, with your cursor on the last line ready to press <code>o</code> and start typing your post, and with magical deployment when you <code>:wq</code><a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a>. It also implements <code>blog deploy</code> (runs both generate and deploy), <code>blog delete</code>, and editing existing posts. Most importantly, whenever you’re editing a post, the script sets up a keybinding for <code>C-g</code> that saves your draft post and refreshes the local preview in a Chrome window. It does this even if you don’t have a tab open to refresh, but it also won’t open a new one if you do. And it keeps your <code>vim</code> window in the foreground. How does this work? You might expect that Chrome has a nice command-line remote interface for exactly this sort of thing. Sadly, that is not the case. However, Apple has had the foresight to allow command-driven automation of actions which can typically only be carried out graphically. Sadly again, that mechanism is <a href="http://en.wikipedia.org/wiki/AppleScript"><strong>AppleScript</strong></a>, a historical relic of a programming language.</p>
<figure class='code'><figcaption>
Reloading a website in Chrome from AppleScript
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div><div data-line='14' class='line-number'></div><div data-line='15' class='line-number'></div><div data-line='16' class='line-number'></div><div data-line='17' class='line-number'></div><div data-line='18' class='line-number'></div><div data-line='19' class='line-number'></div><div data-line='20' class='line-number'></div><div data-line='21' class='line-number'></div><div data-line='22' class='line-number'></div><div data-line='23' class='line-number'></div><div data-line='24' class='line-number'></div><div data-line='25' class='line-number'></div><div data-line='26' class='line-number'></div><div data-line='27' class='line-number'></div><div data-line='28' class='line-number'></div><div data-line='29' class='line-number'></div></pre></td><td class='main  applescript'><pre><div class='line'><code><span class="k">tell</span> <span class="nb">application</span> <span class="s2">&quot;Google Chrome&quot;</span>
</code></div><div class='line'><code>    <span class="k">if</span> <span class="p">(</span><span class="nb">count</span> <span class="nb">every</span> <span class="na">window</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">then</span>
</code></div><div class='line'><code>        <span class="nb">make</span> <span class="nb">new</span> <span class="na">window</span>
</code></div><div class='line'><code>    <span class="k">end</span> <span class="k">if</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>    <span class="k">set</span> <span class="nv">found</span> <span class="k">to</span> <span class="no">false</span>
</code></div><div class='line'><code>    <span class="k">set</span> <span class="nv">theTabIndex</span> <span class="k">to</span> <span class="o">-</span><span class="mi">1</span>
</code></div><div class='line'><code>    <span class="k">repeat</span> <span class="nv">with</span> <span class="nv">theWindow</span> <span class="k">in</span> <span class="nb">every</span> <span class="na">window</span>
</code></div><div class='line'><code>        <span class="k">set</span> <span class="nv">theTabIndex</span> <span class="k">to</span> <span class="mi">0</span>
</code></div><div class='line'><code>        <span class="k">repeat</span> <span class="nv">with</span> <span class="nv">theTab</span> <span class="k">in</span> <span class="nb">every</span> <span class="no">tab</span> <span class="k">of</span> <span class="nv">theWindow</span>
</code></div><div class='line'><code>            <span class="k">set</span> <span class="nv">theTabIndex</span> <span class="k">to</span> <span class="nv">theTabIndex</span> <span class="o">+</span> <span class="mi">1</span>
</code></div><div class='line'><code>            <span class="k">if</span> <span class="nv">theTab</span>&#39;s <span class="nv">URL</span> <span class="ow">contains</span> <span class="s2">&quot;$1&quot;</span> <span class="k">then</span>
</code></div><div class='line'><code>                <span class="k">set</span> <span class="nv">found</span> <span class="k">to</span> <span class="no">true</span>
</code></div><div class='line'><code>                <span class="k">exit</span> <span class="k">repeat</span>
</code></div><div class='line'><code>            <span class="k">end</span> <span class="k">if</span>
</code></div><div class='line'><code>        <span class="k">end</span> <span class="k">repeat</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>        <span class="k">if</span> <span class="nv">found</span> <span class="k">then</span>
</code></div><div class='line'><code>            <span class="k">exit</span> <span class="k">repeat</span>
</code></div><div class='line'><code>        <span class="k">end</span> <span class="k">if</span>
</code></div><div class='line'><code>    <span class="k">end</span> <span class="k">repeat</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>    <span class="k">if</span> <span class="nv">found</span> <span class="k">then</span>
</code></div><div class='line'><code>        <span class="k">tell</span> <span class="nv">theTab</span> <span class="k">to</span> <span class="nv">reload</span>
</code></div><div class='line'><code>        <span class="err">$</span><span class="nv">L1</span>
</code></div><div class='line'><code>    <span class="k">else</span>
</code></div><div class='line'><code>        <span class="err">$</span><span class="nv">L2</span>
</code></div><div class='line'><code>    <span class="k">end</span> <span class="k">if</span>
</code></div><div class='line'><code><span class="k">end</span> <span class="k">tell</span>
</code></div></pre></td></tr></table></div></figure>


<p>In this snippet, <code>$1</code> is going to get replaced with the site’s top-level URL (like <code>http://localhost:4000/</code> for the local preview server, or <code>http://davidad.github.io/</code> for the deployment). <code>$L1</code> and <code>$L2</code> are placeholders for two actions that we might not always <nobr>want<a href="#fn3" class="footnoteRef" id="fnref3"><sup>3</sup></a>:</nobr> changing the current tab to the tab we just refreshed, and opening up a new tab if there wasn’t already one for this site. It’s also worth noting that this script will reload the first tab that <em>contains</em> the URL – so if you have an open tab pointed at a particular page on the site, you won’t lose your place<a href="#fn4" class="footnoteRef" id="fnref4"><sup>4</sup></a>.</p>
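<p>To make the “first tab that <em>contains</em> the URL” rule concrete, here is a minimal <code>bash</code> sketch of the same first-match search over a plain list of URLs. The function name <code>first_matching_tab</code> and the sample URLs are hypothetical, made up for illustration; the real search happens inside the AppleScript above.</p>
<pre><code>#!/usr/bin/env bash
# Hypothetical helper (not part of blog): print the 1-based index of the
# first URL that contains the site prefix, mirroring theTabIndex above.
first_matching_tab() {
    local site="$1"; shift
    local index=0 url
    for url in "$@"; do
        index=$((index + 1))
        if [[ "$url" == *"$site"* ]]; then
            echo "$index"
            return 0
        fi
    done
    return 1   # nothing matched, like found staying false
}

# Made-up tab list: the second tab is a deep link into the site.
first_matching_tab 'http://localhost:4000/' \
    'http://example.com/' \
    'http://localhost:4000/blog/archives/' \
    'http://localhost:4000/'   # prints 2</code></pre>
<p>The deep link in the second position matches first, so that is the tab that gets reloaded, and it stays on the page it was already showing.</p>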
<p>The interface to AppleScript is the <code>osascript</code> command, which accepts an AppleScript file as its argument<a href="#fn5" class="footnoteRef" id="fnref5"><sup>5</sup></a>. So, the first big chunk of the <code>blog</code> script is dedicated to producing script files. It’s implemented as a function which fills in the “holes” in the script described above.</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div><div data-line='14' class='line-number'></div><div data-line='15' class='line-number'></div><div data-line='16' class='line-number'></div><div data-line='17' class='line-number'></div><div data-line='18' class='line-number'></div><div data-line='19' class='line-number'></div><div data-line='20' class='line-number'></div><div data-line='21' class='line-number'></div><div data-line='22' class='line-number'></div><div data-line='23' class='line-number'></div><div data-line='24' class='line-number'></div><div data-line='25' class='line-number'></div><div data-line='26' class='line-number'></div><div data-line='27' class='line-number'></div><div data-line='28' class='line-number'></div><div data-line='29' class='line-number'></div><div data-line='30' class='line-number'></div><div data-line='31' class='line-number'></div><div data-line='32' class='line-number'></div><div data-line='33' class='line-number'></div><div data-line='34' class='line-number'></div><div data-line='35' class='line-number'></div><div data-line='36' class='line-number'></div><div data-line='37' class='line-number'></div><div data-line='38' class='line-number'></div><div data-line='39' class='line-number'></div><div data-line='40' class='line-number'></div><div data-line='41' class='line-number'></div><div data-line='42' class='line-number'></div><div data-line='43' class='line-number'></div><div data-line='44' class='line-number'></div><div data-line='45' class='line-number'></div><div data-line='46' class='line-number'></div><div data-line='47' class='line-number'></div><div data-line='48' class='line-number'></div><div 
data-line='49' class='line-number'></div><div data-line='50' class='line-number'></div></pre></td><td class='main  bash'><pre><div class='line'><code><span class="k">function </span>wrs<span class="o">()</span> <span class="o">{</span>
</code></div><div class='line'><code>    <span class="k">if</span> <span class="o">[[</span> <span class="nv">$2</span> <span class="o">=</span> <span class="s2">&quot;y&quot;</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
</code></div><div class='line'><code><span class="k">        </span><span class="nv">L1</span><span class="o">=</span><span class="s2">&quot;set theWindow&#39;s active tab index to theTabIndex&quot;</span>
</code></div><div class='line'><code>        <span class="nv">L2</span><span class="o">=</span><span class="s2">&quot;tell window 1 to make new tab with properties {URL:&#92;&quot;$1&#92;&quot;}&quot;</span>
</code></div><div class='line'><code>    <span class="k">else</span>
</code></div><div class='line'><code><span class="k">        </span><span class="nv">L1</span><span class="o">=</span><span class="s2">&quot;&quot;</span>
</code></div><div class='line'><code>        <span class="nv">L2</span><span class="o">=</span><span class="s2">&quot;&quot;</span>
</code></div><div class='line'><code>    <span class="k">fi</span>
</code></div><div class='line'><code><span class="k">    </span>cat &gt;.reload.scpt <span class="s">&lt;&lt;EOF</span>
</code></div><div class='line'><code><span class="s">delay 1.5</span>
</code></div><div class='line'><code><span class="s">tell application &quot;Google Chrome&quot;</span>
</code></div><div class='line'><code><span class="s">    </span>
</code></div><div class='line'><code><span class="s">    if (count every window) = 0 then</span>
</code></div><div class='line'><code><span class="s">        make new window</span>
</code></div><div class='line'><code><span class="s">    end if</span>
</code></div><div class='line'><code><span class="s">    </span>
</code></div><div class='line'><code><span class="s">    set found to false</span>
</code></div><div class='line'><code><span class="s">    set theTabIndex to -1</span>
</code></div><div class='line'><code><span class="s">    repeat with theWindow in every window</span>
</code></div><div class='line'><code><span class="s">        set theTabIndex to 0</span>
</code></div><div class='line'><code><span class="s">        repeat with theTab in every tab of theWindow</span>
</code></div><div class='line'><code><span class="s">            set theTabIndex to theTabIndex + 1</span>
</code></div><div class='line'><code><span class="s">            if theTab&#39;s URL contains &quot;$1&quot; then</span>
</code></div><div class='line'><code><span class="s">                set found to true</span>
</code></div><div class='line'><code><span class="s">                exit repeat</span>
</code></div><div class='line'><code><span class="s">            end if</span>
</code></div><div class='line'><code><span class="s">        end repeat</span>
</code></div><div class='line'><code><span class="s">        </span>
</code></div><div class='line'><code><span class="s">        if found then</span>
</code></div><div class='line'><code><span class="s">            exit repeat</span>
</code></div><div class='line'><code><span class="s">        end if</span>
</code></div><div class='line'><code><span class="s">    end repeat</span>
</code></div><div class='line'><code><span class="s">    </span>
</code></div><div class='line'><code><span class="s">    if found then</span>
</code></div><div class='line'><code><span class="s">        tell theTab to reload</span>
</code></div><div class='line'><code><span class="s">        $L1</span>
</code></div><div class='line'><code><span class="s">    else</span>
</code></div><div class='line'><code><span class="s">        $L2</span>
</code></div><div class='line'><code><span class="s">    end if</span>
</code></div><div class='line'><code><span class="s">end tell</span>
</code></div><div class='line'><code><span class="s">EOF</span>
</code></div><div class='line'><code><span class="o">}</span>
</code></div><div class='line'><code>wrs <span class="s1">&#39;http://localhost:4000/&#39;</span> y
</code></div></pre></td></tr></table></div></figure>


<p>The <code>delay 1.5</code> line exists to give Octopress enough time to do its thing before trying to reload Chrome. Octopress is pretty slow.</p>
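<p>The hole-filling in <code>wrs</code> relies on the here-document delimiter (<code>EOF</code>) being unquoted, which lets <code>bash</code> expand <code>$1</code>, <code>$L1</code> and <code>$L2</code> while the file is being written. The sketch below isolates that behavior. It is a cut-down illustration, not the real function: <code>fill_template</code> and <code>show_raw_template</code> are hypothetical names, the output goes to stdout instead of <code>.reload.scpt</code>, and the AppleScript is reduced to a fragment.</p>
<pre><code>#!/usr/bin/env bash
# Hypothetical, shortened stand-in for wrs: prints to stdout and keeps
# only the lines that have holes in them.
fill_template() {
    local L1=""
    if [[ $2 = "y" ]]; then
        L1="set theWindow's active tab index to theTabIndex"
    fi
    # Unquoted delimiter: $1 and $L1 are expanded inside the here-document.
    cat &lt;&lt;EOF
if theTab's URL contains "$1" then
    tell theTab to reload
    $L1
end if
EOF
}

# Quoted delimiter ('EOF'): expansion is turned off, so "$1" stays literal.
show_raw_template() {
    cat &lt;&lt;'EOF'
if theTab's URL contains "$1" then
EOF
}

fill_template 'http://localhost:4000/' y
show_raw_template</code></pre>
<p>The first call prints the fragment with the URL and the tab-switching line filled in; the second prints the line with <code>"$1"</code> untouched. If the generated AppleScript ever needed a literal dollar sign, it would have to be escaped inside the unquoted here-document.</p>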
<p>In the next chunk, we handle the <code>delete</code> and <code>deploy</code> actions:</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div></pre></td><td class='main  bash'><pre><div class='line'><code><span class="nv">ORIGDIR</span><span class="o">=</span><span class="sb">&#x60;</span><span class="nb">pwd</span> <span class="p">|</span> sed <span class="s1">&#39;s/&#92; /&#92;&#92; /g&#39;</span><span class="sb">&#x60;</span>
</code></div><div class='line'><code><span class="nb">cd</span> ~/octopress
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="nv">URL</span><span class="o">=</span><span class="s2">&quot;http://davidad.github.io/&quot;</span>
</code></div></pre></td></tr></table></div></figure>



<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='53' class='line-number'></div><div data-line='54' class='line-number'></div><div data-line='55' class='line-number'></div><div data-line='56' class='line-number'></div><div data-line='57' class='line-number'></div><div data-line='58' class='line-number'></div><div data-line='59' class='line-number'></div><div data-line='60' class='line-number'></div><div data-line='61' class='line-number'></div><div data-line='62' class='line-number'></div><div data-line='63' class='line-number'></div><div data-line='64' class='line-number'></div></pre></td><td class='main  bash'><pre><div class='line'><code><span class="k">if</span> <span class="o">[[</span> <span class="nv">$1</span> <span class="o">=</span> delete <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
</code></div><div class='line'><code>    <span class="o">[[</span> -f <span class="s2">&quot;$2&quot;</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> rm -i <span class="s2">&quot;$2&quot;</span> <span class="o">&amp;&amp;</span> bundle <span class="nb">exec </span>rake generate <span class="o">&amp;&amp;</span> <span class="nb">exec</span> <span class="s2">&quot;$0&quot;</span> deploy
</code></div><div class='line'><code>    <span class="nb">exit </span>0
</code></div><div class='line'><code><span class="k">elif</span> <span class="o">[[</span> <span class="nv">$1</span> <span class="o">=</span> deploy <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
</code></div><div class='line'><code><span class="k">    </span>bundle <span class="nb">exec </span>rake deploy <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> wrs <span class="nv">$URL</span> y <span class="o">&amp;&amp;</span> sleep 5 <span class="o">&amp;&amp;</span> osascript ./.reload.scpt <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> rm -f ./.reload.scpt .timeref rake_preview.log <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> git add . <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> git commit -m <span class="s2">&quot;Site updated at &#x60;date -u +&quot;</span>%Y-%m-%d %H:%M:%S UTC<span class="s2">&quot;&#x60;&quot;</span> <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> git push
</code></div><div class='line'><code>    <span class="nb">exit </span>0
</code></div><div class='line'><code><span class="k">fi</span>
</code></div></pre></td></tr></table></div></figure>


<p>In the case of <code>delete</code>, we use <code>rm -i</code> to ask the user to confirm the deletion, and if they do, we regenerate the site and then call the script itself (<code>$0</code>) with the deploy action (so as not to duplicate code). The deploy action deploys the generated site (to GitHub Pages), writes out a reload script for the deployed site, waits an extra few seconds for GitHub Pages to do its thing, and then runs the reload script. Finally, <code>blog</code> commits and pushes the <code>source</code> branch of the repository, after cleaning up its temporary files – the reload script, the log from Octopress’ local preview server, and the time reference (which we’ll come to shortly).</p>
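<p>As an aside, here is a minimal sketch of the same delete-then-deploy idea written with shell functions rather than by re-running <code>$0</code>. Everything named here (<code>deploy</code>, <code>delete_post</code>) is hypothetical and not part of the actual <code>blog</code> script; <code>deploy</code> is just a stand-in that prints a message. The sketch also checks that the file is really gone instead of trusting the exit status of <code>rm -i</code>, since I wouldn’t assume every <code>rm</code> reports a declined prompt as a failure.</p>

```shell
# Hypothetical stand-in for the real deploy steps (rake deploy, git push, ...).
deploy() {
    echo "deploying"
}

# Delete a post after confirmation, then reuse the deploy path.
# Both function names are illustrative assumptions.
delete_post() {
    local post="$1"
    [[ -f "$post" ]] || return 1      # nothing to delete
    rm -i -- "$post"                  # prompts on stderr, reads the answer from stdin
    [[ ! -e "$post" ]] || return 1    # still there: the user declined
    deploy
}
```

<p>With this shape, <code>delete_post path/to/post.markdown</code> only reaches <code>deploy</code> when the file has actually been removed.</p>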
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='66' class='line-number'></div><div data-line='67' class='line-number'></div></pre></td><td class='main  bash'><pre><div class='line'><code><span class="o">[[</span> -f <span class="nv">$1</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> rm -f new_post.md <span class="o">&amp;&amp;</span> ln -s <span class="nv">$1</span> new_post.md
</code></div><div class='line'><code><span class="o">[[</span> -f <span class="nv">$1</span> <span class="o">]]</span> <span class="o">||</span> bundle <span class="nb">exec </span>rake <span class="s2">&quot;new_post[$1]&quot;</span>
</code></div></pre></td></tr></table></div></figure>


<p>We’re managing a symbolic link called <code>new_post.md</code> here, which is what we’re going to call <code>vim</code> on. If a filename is specified, we point the link directly at that file. Otherwise, we’re going to call <code>rake</code> to set up the file. By default, <code>rake</code> won’t give any indication to our script of what file it made, so we’re going to make a tweak to the <code>Rakefile</code>:</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div><div data-line='14' class='line-number'></div><div data-line='15' class='line-number'></div><div data-line='16' class='line-number'></div><div data-line='17' class='line-number'></div><div data-line='18' class='line-number'></div><div data-line='19' class='line-number'></div><div data-line='20' class='line-number'></div><div data-line='21' class='line-number'></div><div data-line='22' class='line-number'></div></pre></td><td class='main  diff'><pre><div class='line'><code><span class="gu">@@ -104,9 +89,7 @@ task :new_post, :title do |t, args|</span>
</code></div><div class='line'><code>   raise &quot;### You haven&#39;t set anything up yet. First run &#x60;rake install&#x60; to set up an Octopress theme.&quot; unless File.directory?(source_dir)
</code></div><div class='line'><code>   mkdir_p &quot;#{source_dir}/#{posts_dir}&quot;
</code></div><div class='line'><code>   filename = &quot;#{source_dir}/#{posts_dir}/#{Time.now.strftime(&#39;%Y-%m-%d&#39;)}-#{title.to_url}.#{new_post_ext}&quot;
</code></div><div class='line'><code><span class="gd">-  if File.exist?(filename)</span>
</code></div><div class='line'><code><span class="gd">-    abort(&quot;rake aborted!&quot;) if ask(&quot;#{filename} already exists. Do you want to overwrite?&quot;, [&#39;y&#39;, &#39;n&#39;]) == &#39;n&#39;</span>
</code></div><div class='line'><code><span class="gd">-  end</span>
</code></div><div class='line'><code><span class="gi">+  if not (File.exist?(filename) and ask(&quot;#{filename} already exists. Do you want to overwrite?&quot;, [&#39;y&#39;, &#39;n&#39;]) == &#39;n&#39;)</span>
</code></div><div class='line'><code>     puts &quot;Creating new post: #{filename}&quot;
</code></div><div class='line'><code>     open(filename, &#39;w&#39;) do |post|
</code></div><div class='line'><code>       post.puts &quot;---&quot;
</code></div><div class='line'><code>       post.puts &quot;layout: post&quot;
</code></div><div class='line'><code>       post.puts &quot;title: &#92;&quot;#{title.gsub(/&amp;/,&#39;&amp;amp;&#39;)}&#92;&quot;&quot;
</code></div><div class='line'><code>       post.puts &quot;date: #{Time.now.strftime(&#39;%Y-%m-%d %H:%M:%S %z&#39;)}&quot;
</code></div><div class='line'><code>       post.puts &quot;comments: true&quot;
</code></div><div class='line'><code>       post.puts &quot;categories: &quot;
</code></div><div class='line'><code>       post.puts &quot;---&quot;
</code></div><div class='line'><code>     end
</code></div><div class='line'><code><span class="gi">+  end</span>
</code></div><div class='line'><code><span class="gi">+  system &quot;rm -f new_post.md&quot;</span>
</code></div><div class='line'><code><span class="gi">+  system &quot;ln -s #{filename} new_post.md&quot;</span>
</code></div><div class='line'><code> end
</code></div></pre></td></tr></table></div></figure>


<p>The first changeset handles the case where I don’t want to overwrite the existing post, but I <em>do</em> want to proceed to edit it (and deploy the edits). The last two lines simply point <code>new_post.md</code> at the right spot so our script can call <code>vim</code> on it. Before we call vim, though, we have to set up the deploy-on-save feature and the live(ish)-preview feature…</p>
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='69' class='line-number'></div></pre></td><td class='main  bash'><pre><div class='line'><code>touch -m .timeref
</code></div></pre></td></tr></table></div></figure>


<p><code>.timeref</code> is an empty file which keeps track of the time slightly before vim was launched. In a “successful” session, the modification time of the post file should be newer than <code>.timeref</code>, whereas if you <code>:q!</code> immediately, it won’t be. Now, it’s worth pointing out that the live-preview requires saving along the way, so <strong>if you want to abort after previewing, use <code>:cq</code></strong>, <code>vim</code>’s command for exiting with a nonzero status code (so the shell script knows what’s up). The script supports both mechanisms, so that if you are aborting immediately but forget to <code>:cq</code>, The Right Thing should happen.</p>
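<p>To make the two abort signals concrete, here is a rough sketch of how they might be combined. The function name <code>next_action</code> and its arguments are made up for illustration; the real script does this inline, as we’ll see further down.</p>

```shell
# Decide what to do after the editor exits.
#   $1: the editor's exit status (nonzero after :cq)
#   $2: the post file
#   $3: the time reference file touched just before the editor started
# Prints "deploy" or "abort". The name and interface are assumptions.
next_action() {
    local status="$1" post="$2" ref="$3"
    # A clean exit only counts if the post was modified after the reference.
    [[ "$post" -nt "$ref" ]] || status=1
    if [[ "$status" -eq 0 ]]; then
        echo deploy
    else
        echo abort
    fi
}
```

<p>One thing to keep in mind: <code>-nt</code> is also true when the first file exists and the reference file doesn’t, so a missing <code>.timeref</code> would make every exit look like a successful save.</p>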
<figure class='code'><figcaption>
manage preview processes
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='70' class='line-number'></div><div data-line='71' class='line-number'></div><div data-line='72' class='line-number'></div></pre></td><td class='main  bash'><pre><div class='line'><code>ps x <span class="p">|</span> egrep <span class="s1">&#39;rake|rackup|jekyll|sass|compass&#39;</span> <span class="p">|</span> grep -v grep <span class="p">|</span> awk <span class="s1">&#39;{ print $1 }&#39;</span> <span class="p">|</span> xargs <span class="nb">kill</span>
</code></div><div class='line'><code>ps x <span class="p">|</span> egrep <span class="s1">&#39;rackup&#39;</span> <span class="p">|</span> grep -v grep <span class="p">|</span> awk <span class="s1">&#39;{ print $1 }&#39;</span> <span class="p">|</span> xargs <span class="nb">kill</span> -9
</code></div><div class='line'><code>bundle <span class="nb">exec </span>rake preview &gt; rake_preview.log 2&gt;<span class="p">&amp;</span>1 <span class="p">&amp;</span>
</code></div></pre></td></tr></table></div></figure>


<p>Now we’re going to kill off any existing preview processes (they really start to pile up otherwise!), following up with <code>kill -9</code> for any <code>rackup</code> that survives, and launch a new one. We also log its <code>stdout</code> and <code>stderr</code> so you can see what the preview process is up to if you want (<code>tail -f rake_preview.log</code>).</p>
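<p>The <code>egrep | grep -v grep | awk</code> chain can be folded into a single <code>awk</code> call. This is only a sketch of that idea, and <code>pids_matching</code> is a name I’m inventing here; it reads <code>ps</code>-style lines on standard input and prints the first column of each line that matches the pattern.</p>

```shell
# Print the PID column of every ps-style line matching the extended
# regex in $1. Lines mentioning awk are skipped so that the filter does
# not report itself. (Hypothetical helper, not part of the blog script.)
pids_matching() {
    awk -v pat="$1" '$0 ~ pat && $0 !~ /awk/ { print $1 }'
}
```

<p>It would be used as <code>ps x | pids_matching 'rake|rackup|jekyll|sass|compass' | xargs kill</code>. On systems that ship <code>pkill</code>, <code>pkill -f</code> with the same pattern is probably shorter still, though I haven’t made that change in the script.</p>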
<figure class='code'><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='73' class='line-number'></div><div data-line='74' class='line-number'></div></pre></td><td class='main  bash'><pre><div class='line'><code>sleep 0.3
</code></div><div class='line'><code>osascript ./.reload.scpt
</code></div></pre></td></tr></table></div></figure>


<p>We give the preview process a little time to get started and then display the preview in the browser so the user knows what they’re working from.</p>
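<p>A fixed <code>sleep</code> is a guess at how long startup takes. If that guess ever turns out to be wrong, one alternative is to poll until the server answers. The helper below is a sketch under that assumption; <code>wait_for</code> is a hypothetical name, and it simply retries whatever command it is given.</p>

```shell
# Retry a command until it succeeds or the attempts run out.
#   $1: maximum number of attempts; remaining arguments: the command.
# Returns 0 as soon as the command succeeds, 1 otherwise.
wait_for() {
    local tries="$1"; shift
    while (( tries-- > 0 )); do
        "$@" && return 0
        sleep 0.1    # short pause between attempts
    done
    return 1
}
```

<p>Assuming <code>curl</code> is installed, something like <code>wait_for 50 curl -s -o /dev/null http://localhost:4000/</code> would wait up to roughly five seconds for the preview server before reloading the browser.</p>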
<figure class='code'><figcaption>
Run vim
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='76' class='line-number'></div><div data-line='77' class='line-number'></div><div data-line='78' class='line-number'></div><div data-line='79' class='line-number'></div><div data-line='80' class='line-number'></div><div data-line='81' class='line-number'></div><div data-line='82' class='line-number'></div><div data-line='83' class='line-number'></div><div data-line='84' class='line-number'></div></pre></td><td class='main  bash'><pre><div class='line'><code>vim -c <span class="s1">&#39;set tw=80&#39;</span> -c <span class="s1">&#39;map &lt;C-G&gt; :w&lt;CR&gt;:!osascript ./.reload.scpt&lt;CR&gt;&lt;CR&gt;&#39;</span> <span class="se">&#92;</span>
</code></div><div class='line'><code>    -c <span class="s2">&quot;cd $ORIGDIR&quot;</span> + new_post.md
</code></div><div class='line'><code><span class="nv">VIM_STATUS</span><span class="o">=</span><span class="nv">$?</span>
</code></div><div class='line'><code><span class="o">[[</span> <span class="sb">&#x60;</span>readlink new_post.md<span class="sb">&#x60;</span> -nt .timeref <span class="o">]]</span> <span class="o">||</span> <span class="nv">VIM_STATUS</span><span class="o">=</span>1
</code></div><div class='line'><code><span class="o">[</span> <span class="nv">$VIM_STATUS</span> -eq 0 <span class="o">]</span> <span class="o">&amp;&amp;</span> osascript ./.reload.scpt <span class="o">&amp;&amp;</span> <span class="nb">exec</span> <span class="nv">$0</span> deploy <span class="o">&amp;&amp;</span> <span class="nb">exit </span>0
</code></div><div class='line'><code><span class="o">[</span> <span class="nv">$VIM_STATUS</span> -ne 0 <span class="o">]</span> <span class="o">&amp;&amp;</span> wrs <span class="s1">&#39;http://localhost:4000/&#39;</span> n <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> <span class="o">[</span> -f new_post.md <span class="o">]</span> <span class="o">&amp;&amp;</span> rm -i <span class="sb">&#x60;</span>readlink new_post.md<span class="sb">&#x60;</span> <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> git rm --ignore-unmatch new_post.md <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> sleep 0.4 <span class="o">&amp;&amp;</span> osascript ./.reload.scpt
</code></div></pre></td></tr></table></div></figure>


<p>This is the last piece of the script, where we actually run <code>vim</code> and then take the appropriate action after it exits. We’re giving <code>vim</code> a number of commands on the command line, including setting auto-wrapping at 80 columns (<code>tw=80</code>), scrolling to the bottom of the file (<code>+</code>), and changing to the directory the script was run from (saved all the way back on line 3). Most importantly, we’re mapping <code>C-g</code> to save the file and run the reload script! (Plain <code>map</code> applies in normal mode, and in visual and operator-pending modes too.)</p>
<p>Once <code>vim</code> exits, we capture its return code with <code>$?</code>. Then we check whether the file has actually been saved since <code>.timeref</code> was touched: either it has, <em>or</em> (<code>||</code>) we force the status to <code>1</code>. If the status is still <code>0</code>, then we do one final preview and shift into deploy mode. Otherwise, we remove the file that <code>new_post.md</code> points to, remove <code>new_post.md</code> itself, and reload<a href="#fn6" class="footnoteRef" id="fnref6"><sup>6</sup></a>.</p>
<h3 id="putting-it-all-together">Putting it all together</h3>
<figure class='code'><figcaption>
/usr/bin/blog <a href='https://gist.github.com/davidad/8981964'>gist</a>
</figcaption><div class='highlight'><table><tr><td class='line-numbers' aria-hidden='true'><pre><div data-line='1' class='line-number'></div><div data-line='2' class='line-number'></div><div data-line='3' class='line-number'></div><div data-line='4' class='line-number'></div><div data-line='5' class='line-number'></div><div data-line='6' class='line-number'></div><div data-line='7' class='line-number'></div><div data-line='8' class='line-number'></div><div data-line='9' class='line-number'></div><div data-line='10' class='line-number'></div><div data-line='11' class='line-number'></div><div data-line='12' class='line-number'></div><div data-line='13' class='line-number'></div><div data-line='14' class='line-number'></div><div data-line='15' class='line-number'></div><div data-line='16' class='line-number'></div><div data-line='17' class='line-number'></div><div data-line='18' class='line-number'></div><div data-line='19' class='line-number'></div><div data-line='20' class='line-number'></div><div data-line='21' class='line-number'></div><div data-line='22' class='line-number'></div><div data-line='23' class='line-number'></div><div data-line='24' class='line-number'></div><div data-line='25' class='line-number'></div><div data-line='26' class='line-number'></div><div data-line='27' class='line-number'></div><div data-line='28' class='line-number'></div><div data-line='29' class='line-number'></div><div data-line='30' class='line-number'></div><div data-line='31' class='line-number'></div><div data-line='32' class='line-number'></div><div data-line='33' class='line-number'></div><div data-line='34' class='line-number'></div><div data-line='35' class='line-number'></div><div data-line='36' class='line-number'></div><div data-line='37' class='line-number'></div><div data-line='38' class='line-number'></div><div data-line='39' class='line-number'></div><div data-line='40' class='line-number'></div><div data-line='41' class='line-number'></div><div data-line='42' 
class='line-number'></div><div data-line='43' class='line-number'></div><div data-line='44' class='line-number'></div><div data-line='45' class='line-number'></div><div data-line='46' class='line-number'></div><div data-line='47' class='line-number'></div><div data-line='48' class='line-number'></div><div data-line='49' class='line-number'></div><div data-line='50' class='line-number'></div><div data-line='51' class='line-number'></div><div data-line='52' class='line-number'></div><div data-line='53' class='line-number'></div><div data-line='54' class='line-number'></div><div data-line='55' class='line-number'></div><div data-line='56' class='line-number'></div><div data-line='57' class='line-number'></div><div data-line='58' class='line-number'></div><div data-line='59' class='line-number'></div><div data-line='60' class='line-number'></div><div data-line='61' class='line-number'></div><div data-line='62' class='line-number'></div><div data-line='63' class='line-number'></div><div data-line='64' class='line-number'></div><div data-line='65' class='line-number'></div><div data-line='66' class='line-number'></div><div data-line='67' class='line-number'></div><div data-line='68' class='line-number'></div><div data-line='69' class='line-number'></div><div data-line='70' class='line-number'></div><div data-line='71' class='line-number'></div><div data-line='72' class='line-number'></div><div data-line='73' class='line-number'></div><div data-line='74' class='line-number'></div><div data-line='75' class='line-number'></div><div data-line='76' class='line-number'></div><div data-line='77' class='line-number'></div><div data-line='78' class='line-number'></div><div data-line='79' class='line-number'></div><div data-line='80' class='line-number'></div><div data-line='81' class='line-number'></div><div data-line='82' class='line-number'></div><div data-line='83' class='line-number'></div><div data-line='84' class='line-number'></div><div data-line='85' 
class='line-number'></div></pre></td><td class='main  bash'><pre><div class='line'><code><span class="c">#!/bin/bash</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="nv">ORIGDIR</span><span class="o">=</span><span class="sb">&#x60;</span><span class="nb">pwd</span> <span class="p">|</span> sed <span class="s1">&#39;s/&#92; /&#92;&#92; /g&#39;</span><span class="sb">&#x60;</span>
</code></div><div class='line'><code><span class="nb">cd</span> ~/octopress
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="nv">URL</span><span class="o">=</span><span class="s2">&quot;http://davidad.github.io/&quot;</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="k">function </span>wrs<span class="o">()</span> <span class="o">{</span>
</code></div><div class='line'><code>    <span class="k">if</span> <span class="o">[[</span> <span class="nv">$2</span> <span class="o">=</span> <span class="s2">&quot;y&quot;</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
</code></div><div class='line'><code><span class="k">        </span><span class="nv">L1</span><span class="o">=</span><span class="s2">&quot;set theWindow&#39;s active tab index to theTabIndex&quot;</span>
</code></div><div class='line'><code>        <span class="nv">L2</span><span class="o">=</span><span class="s2">&quot;tell window 1 to make new tab with properties {URL:&#92;&quot;$1&#92;&quot;}&quot;</span>
</code></div><div class='line'><code>    <span class="k">else</span>
</code></div><div class='line'><code><span class="k">        </span><span class="nv">L1</span><span class="o">=</span><span class="s2">&quot;&quot;</span>
</code></div><div class='line'><code>        <span class="nv">L2</span><span class="o">=</span><span class="s2">&quot;&quot;</span>
</code></div><div class='line'><code>    <span class="k">fi</span>
</code></div><div class='line'><code><span class="k">    </span>cat &gt;.reload.scpt <span class="s">&lt;&lt;EOF</span>
</code></div><div class='line'><code><span class="s">delay 1.5</span>
</code></div><div class='line'><code><span class="s">tell application &quot;Google Chrome&quot;</span>
</code></div><div class='line'><code><span class="s">    </span>
</code></div><div class='line'><code><span class="s">    if (count every window) = 0 then</span>
</code></div><div class='line'><code><span class="s">        make new window</span>
</code></div><div class='line'><code><span class="s">    end if</span>
</code></div><div class='line'><code><span class="s">    </span>
</code></div><div class='line'><code><span class="s">    set found to false</span>
</code></div><div class='line'><code><span class="s">    set theTabIndex to -1</span>
</code></div><div class='line'><code><span class="s">    repeat with theWindow in every window</span>
</code></div><div class='line'><code><span class="s">        set theTabIndex to 0</span>
</code></div><div class='line'><code><span class="s">        repeat with theTab in every tab of theWindow</span>
</code></div><div class='line'><code><span class="s">            set theTabIndex to theTabIndex + 1</span>
</code></div><div class='line'><code><span class="s">            if theTab&#39;s URL contains &quot;$1&quot; then</span>
</code></div><div class='line'><code><span class="s">                set found to true</span>
</code></div><div class='line'><code><span class="s">                exit</span>
</code></div><div class='line'><code><span class="s">            end if</span>
</code></div><div class='line'><code><span class="s">        end repeat</span>
</code></div><div class='line'><code><span class="s">        </span>
</code></div><div class='line'><code><span class="s">        if found then</span>
</code></div><div class='line'><code><span class="s">            exit repeat</span>
</code></div><div class='line'><code><span class="s">        end if</span>
</code></div><div class='line'><code><span class="s">    end repeat</span>
</code></div><div class='line'><code><span class="s">    </span>
</code></div><div class='line'><code><span class="s">    if found then</span>
</code></div><div class='line'><code><span class="s">        tell theTab to reload</span>
</code></div><div class='line'><code><span class="s">        $L1</span>
</code></div><div class='line'><code><span class="s">    else</span>
</code></div><div class='line'><code><span class="s">        $L2</span>
</code></div><div class='line'><code><span class="s">    end if</span>
</code></div><div class='line'><code><span class="s">end tell</span>
</code></div><div class='line'><code><span class="s">EOF</span>
</code></div><div class='line'><code><span class="o">}</span>
</code></div><div class='line'><code>wrs <span class="s1">&#39;http://localhost:4000/&#39;</span> y
</code></div><div class='line'><code> </code></div><div class='line'><code> </code></div><div class='line'><code><span class="k">if</span> <span class="o">[[</span> <span class="nv">$1</span> <span class="o">=</span> delete <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
</code></div><div class='line'><code>    <span class="o">[[</span> -f <span class="nv">$2</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> rm -i <span class="nv">$2</span> <span class="o">&amp;&amp;</span> bundle <span class="nb">exec </span>rake generate <span class="o">&amp;&amp;</span> <span class="nb">exec</span> <span class="nv">$0</span> deploy
</code></div><div class='line'><code>    <span class="nb">exit </span>0
</code></div><div class='line'><code><span class="k">elif</span> <span class="o">[[</span> <span class="nv">$1</span> <span class="o">=</span> deploy <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
</code></div><div class='line'><code><span class="k">    </span>bundle <span class="nb">exec </span>rake deploy <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> wrs <span class="nv">$URL</span> y <span class="o">&amp;&amp;</span> sleep 5 <span class="o">&amp;&amp;</span> osascript ./.reload.scpt <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> rm -f ./.reload.scpt .timeref rake_preview.log <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> git add . <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> git commit -m <span class="s2">&quot;Site updated at &#x60;date -u +&quot;</span>%Y-%m-%d %H:%M:%S UTC<span class="s2">&quot;&#x60;&quot;</span> <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> git push
</code></div><div class='line'><code>    <span class="nb">exit </span>0
</code></div><div class='line'><code><span class="k">fi</span>
</code></div><div class='line'><code> </code></div><div class='line'><code><span class="o">[[</span> -f <span class="nv">$1</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> rm -f new_post.md <span class="o">&amp;&amp;</span> ln -s <span class="nv">$1</span> new_post.md
</code></div><div class='line'><code><span class="o">[[</span> -f <span class="nv">$1</span> <span class="o">]]</span> <span class="o">||</span> bundle <span class="nb">exec </span>rake <span class="s2">&quot;new_post[$1]&quot;</span>
</code></div><div class='line'><code> </code></div><div class='line'><code>touch -m .timeref
</code></div><div class='line'><code>ps x <span class="p">|</span> egrep <span class="s1">&#39;rake|rackup|jekyll|sass|compass&#39;</span> <span class="p">|</span> grep -v grep <span class="p">|</span> awk <span class="s1">&#39;{ print $1 }&#39;</span> <span class="p">|</span> xargs <span class="nb">kill</span>
</code></div><div class='line'><code>ps x <span class="p">|</span> egrep <span class="s1">&#39;rackup&#39;</span> <span class="p">|</span> grep -v grep <span class="p">|</span> awk <span class="s1">&#39;{ print $1 }&#39;</span> <span class="p">|</span> xargs <span class="nb">kill</span> -9
</code></div><div class='line'><code>sleep 0.15
</code></div><div class='line'><code>bundle <span class="nb">exec </span>rake preview &lt; /dev/zero &gt; rake_preview.log 2&gt;<span class="p">&amp;</span>1 <span class="p">&amp;</span>
</code></div><div class='line'><code>sleep 0.3
</code></div><div class='line'><code>osascript ./.reload.scpt
</code></div><div class='line'><code> </code></div><div class='line'><code>vim -c <span class="s1">&#39;set tw=80&#39;</span> -c <span class="s1">&#39;map &lt;C-G&gt; :w&lt;CR&gt;:!osascript ./.reload.scpt&lt;CR&gt;&lt;CR&gt;&#39;</span> <span class="se">&#92;</span>
</code></div><div class='line'><code>    -c <span class="s2">&quot;cd $ORIGDIR&quot;</span> + new_post.md
</code></div><div class='line'><code><span class="nv">VIM_STATUS</span><span class="o">=</span><span class="nv">$?</span>
</code></div><div class='line'><code><span class="o">[[</span> <span class="sb">&#x60;</span>readlink new_post.md<span class="sb">&#x60;</span> -nt .timeref <span class="o">]]</span> <span class="o">||</span> <span class="nv">VIM_STATUS</span><span class="o">=</span>1
</code></div><div class='line'><code><span class="o">[</span> <span class="nv">$VIM_STATUS</span> -eq 0 <span class="o">]</span> <span class="o">&amp;&amp;</span> osascript ./.reload.scpt <span class="o">&amp;&amp;</span> <span class="nb">exec</span> <span class="nv">$0</span> deploy <span class="o">&amp;&amp;</span> <span class="nb">exit </span>0
</code></div><div class='line'><code><span class="o">[</span> <span class="nv">$VIM_STATUS</span> -ne 0 <span class="o">]</span> <span class="o">&amp;&amp;</span> wrs <span class="s1">&#39;http://localhost:4000/&#39;</span> n <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> <span class="o">[</span> -f new_post.md <span class="o">]</span> <span class="o">&amp;&amp;</span> rm -i <span class="sb">&#x60;</span>readlink new_post.md<span class="sb">&#x60;</span> <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> git rm --ignore-unmatch new_post.md <span class="se">&#92;</span>
</code></div><div class='line'><code>    <span class="o">&amp;&amp;</span> sleep 0.4 <span class="o">&amp;&amp;</span> osascript ./.reload.scpt
</code></div></pre></td></tr></table></div></figure>


<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>All of the files for theming etc. are available <a href="https://github.com/davidad/davidad.github.io/tree/source">here</a>. I’ve spent way too much time tweaking the CSS, and fixing various peeves with the way Octopress renders – I could write an entire other blog post about that, but I probably won’t.<a href="#fnref1">↩</a></p></li>
<li id="fn2"><p>Or <code>:x</code>. My muscle memory has been <code>:wq</code> for many years and I haven’t yet made a serious effort to retrain.<a href="#fnref2">↩</a></p></li>
<li id="fn3"><p>One example where we don’t want these actions is if the blog post was aborted. Then there’s no sense in tabbing back to the preview just to show that it’s gone, but if the user is looking at the preview anyway, may as well refresh it to reflect the abort.<a href="#fnref3">↩</a></p></li>
<li id="fn4"><p>Chrome will even restore your scroll position once the refresh is finished.<a href="#fnref4">↩</a></p></li>
<li id="fn5"><p>You can also pass AppleScript on <code>osascript</code>’s command line using the <code>-e</code> option, but only one line of AppleScript at a time. And since there’s no statement separator in AppleScript, we can’t easily transform an arbitrary script into a one-liner (like we could in <code>bash</code>, or many other more sensible languages).<a href="#fnref5">↩</a></p></li>
<li id="fn6"><p>using a newly generated AppleScript which won’t cause Chrome to switch the active tab, in case the abort was related to something else having come up.<a href="#fnref6">↩</a></p></li>
</ol>
</section>]]></content>
  </entry>
  
</feed>
