There is only one way to send email with SMTP: Connect, send EHLO, MAIL FROM, RCPT TO, DATA and the message. There are variations, but they're small. Most protocols offer implementers much more choice, and many even demand much more.
This post is about a way to quantify the efficiency of the choices an implementer makes. It's a bit extremist, because extremism is simple and requires little thought: Bytes transmitted are defined to be useful if they're displayed on the user's screen before the use case ends, and to be waste in all other cases.
The wasted bytes may be avoidable waste, unavoidable waste, Schrödinger's waste or even useful waste — it doesn't matter. This method applies a simple and unfair definition. Because the test is simple you can apply it quickly, and usually you learn something from doing the analysis.
The method leads to three numbers and hopefully nonzero insight: the percentage of bytes used for display, the number of bytes downloaded per character displayed, and finally the number of bytes used per character displayed. You get the insight not from the three numbers, but as a side effect of computing them.
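The three numbers come straight from per-step byte counts. Here is a minimal Python sketch, assuming you have already split the traffic into steps and judged how many bytes of each step end up on-screen; the `Step` record and the `metrics` function are illustrative names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One chunk of network traffic in a use case (illustrative record type)."""
    name: str
    transferred: int  # bytes that crossed the network
    used: int         # bytes that contributed to something on-screen


def metrics(steps, chars_displayed):
    """Return the three numbers: percent of bytes used for display,
    bytes downloaded per character displayed, and bytes used per
    character displayed."""
    transferred = sum(s.transferred for s in steps)
    used = sum(s.used for s in steps)
    return (
        100.0 * used / transferred,
        transferred / chars_displayed,
        used / chars_displayed,
    )


if __name__ == "__main__":
    # Made-up toy numbers, only to show the call.
    toy = [Step("login", 2000, 0), Step("message", 8000, 5000)]
    print(metrics(toy, 2500))  # (50.0, 4.0, 2.0)
```

The hard part is filling in `used` for each step; that judgement is where the insight comes from, and the code only does the adding up.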
I'll elaborate with two longish examples. The first shows how a mail reader uses the network and a server for a particular use case, the second shows how a web page/site/app uses the network and browser for another use case.
Case 1: Reading my mail. The use case is me reading whatever new mail has arrived since lunchtime, using an unnamed IMAP client. I'll quote some of the network traffic, and do the sums as I go along.
(Brief pause while I read mail. There were 14 messages I could delete unseen, two I had to look at, none that needed a reply.)
The four commands a0000 STARTTLS, a0001 CAPABILITY, a0002 AUTHENTICATE and a0003 CAPABILITY open the connection. The total network traffic is on the order of 2k, of which about 80% is required (the a0003 command is not), but none of this is displayed on-screen by the client, so this is 0% efficient as measured by this method. 1600 bytes of unavoidable waste, 400 bytes of avoidable waste, waste is waste. The 1600 unavoidable bytes just show that perfect efficiency is not possible.
Next come a0004 LIST and a0005 MYRIGHTS, which are pointless and strange. They verify that I indeed have an inbox and can access it. They involve only about 100 bytes of traffic, of which nothing is displayed on-screen, so again, 0% efficiency.
a0006 SELECT opens my inbox. 200 bytes. At most a quarter of the information in those 200 bytes is actually used by the client to display something on screen, so I generously classify it as 25% efficient.
a0007 UID FETCH 1:* FLAGS retrieves the current flags for every message in my mailbox. The client uses the response to update its cache of the mailbox and to learn about new or deleted messages. The response is 67584 lines totalling 2.9Mbytes, of which 50 lines (about 2k) are used to display something on-screen, implying an efficiency of about 0.07%.
a0008 FETCH ... (HEADER FLAGS) retrieves the message headers for the dozen-odd newly arrived messages, plus the flags (which the previous command also fetched). The response was about 40k, of which perhaps 8-10k is at least partly displayed on-screen. This number is a bit uncertain, since the client applies some intelligence, duplicate elimination and so on, but the range is good enough: 20-25% efficiency.
a0009 UID FETCH BODY and a0010 UID FETCH BODY retrieve two full messages. The message bodies were displayed completely, the headers not at all (everything the client displayed was available from command a0008). 8k retrieved, 5k displayed, 60% efficiency.
a0011 UID STORE, a0012 UID STORE and a0013 UID STORE update three flags on sixteen messages (the two I read and the fourteen I deleted unread). The server replies with the new flag values for the 16 messages, but the client is exiting and displays nothing at all. These commands cause slightly less than 1kbyte of response from the server, 0% of which is displayed on-screen, so although the commands are useful, this is pure waste. It's a perfect example of the useful waste I mentioned above, and shows why higher efficiency is not always better.
a0014 CLOSE and a0015 LOGOUT close the connection. 0%.
Adding it up: The client downloads about 2.9Mbytes, of which it uses about 16kbytes for on-screen display, giving a network efficiency of 0.6%. 99.4% of the network traffic is not used to display anything on the user's screen.
But you knew that. You noticed the problem when you read 0.07% and 2.9Mbytes in the a0007 paragraph. During the analysis, you saw that the cache performs terribly in this use case, and that's the kind of insight you can expect. Usually it's not that blunt. (I really wonder about that cache. As you can see, it leads to terrible efficiency for big mailboxes. It performs much better for small mailboxes, but a cache design that only works well for small data sets is... strange.)
I'm not going to name the client. The point of this isn't to say that xyz has just 0.6% overall efficiency. That would be flamewar fodder, and I'm looking to learn something, not to start a flamewar. The point of this analysis is to show how easy it is to look at TCP connections, add up displayed bytes and other bytes, and learn something from the process.
I ran the program in a window large enough to display about 4000 characters (50 lines by 80 characters). It displayed three screens of information. One screen contained some information from the cache and some downloaded; I count that as 4000 characters. Two screens contained a message each plus some whitespace; I count those as 2×3000 characters. This lets me calculate two more interesting metrics:
The client downloaded 2.9Mbytes, or about 290 bytes for each character it displayed. It needed to download about 16kbytes, or about 1.6 bytes per character displayed. (1.6 is not a large number, but it could have been even smaller. In some use cases, a caching client can get far below 1.0, if it has a big cache, updates its cache efficiently, and keeps unavoidable protocol overhead to a minimum.)
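The per-character arithmetic is simple enough to do in your head, but for the record, here it is in a few lines of Python, using the rounded figures from this use case (the `per_char` name is illustrative).

```python
def per_char(transferred, used, chars_displayed):
    """Bytes downloaded and bytes needed per displayed character."""
    return transferred / chars_displayed, used / chars_displayed


# One mixed screen of about 4000 characters plus two message screens
# counted as about 3000 characters each.
chars = 4000 + 2 * 3000
downloaded, needed = per_char(2_900_000, 16_000, chars)
print(f"{downloaded:.0f} bytes/char downloaded, {needed:.1f} bytes/char needed")
# 290 bytes/char downloaded, 1.6 bytes/char needed
```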
When a single factor yanks these metrics down as much as the cache does in this case, I like to repeat the analysis without that and see if there's anything else. Do it on the numbers above if you want.
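Here is one way to do that sum, sketched in Python. The byte counts are the rounded figures from the walkthrough; for a0008 it takes the midpoint of the 8-10k range, and a0014/a0015 are left out because the walkthrough gives no size for them. The `STEPS` table and the `efficiency` helper are illustrative, not part of any tool.

```python
# (label, bytes transferred, bytes used for on-screen display),
# using the rounded figures from the walkthrough above.
STEPS = [
    ("a0000-a0003 connect/auth", 2_000, 0),
    ("a0004-a0005 LIST/MYRIGHTS", 100, 0),
    ("a0006 SELECT", 200, 50),
    ("a0007 UID FETCH 1:* FLAGS", 2_900_000, 2_000),
    ("a0008 FETCH headers", 40_000, 9_000),  # midpoint of 8-10k
    ("a0009-a0010 bodies", 8_000, 5_000),
    ("a0011-a0013 STORE", 1_000, 0),
]


def efficiency(steps, skip=()):
    """Percentage of transferred bytes used for display, ignoring
    any step whose label starts with a prefix in `skip`."""
    kept = [s for s in steps if not s[0].startswith(tuple(skip))]
    transferred = sum(t for _, t, _ in kept)
    used = sum(u for _, _, u in kept)
    return 100.0 * used / transferred


print(f"all steps:     {efficiency(STEPS):.1f}%")
print(f"without a0007: {efficiency(STEPS, skip=('a0007',)):.1f}%")
```

With these inputs the first line comes out at roughly half a percent, consistent with the 0.6% above given how coarsely the inputs are rounded, and the second at roughly 27%. Once the cache check is gone, the a0008 header fetch is the largest remaining item.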
You may have noticed that I rounded all the numbers. I do that on principle so as not to be sidetracked by details or waste time on insignificant precision. Whether the response is 2000 bytes or 2112 doesn't matter — the insight you get is the same. Collecting exact numbers can lead you to digress, but it does not teach you more.
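If you script any of these sums, the same principle can be built in by rounding to one or two significant figures before reporting anything. A small Python helper (the name `round_sig` is illustrative):

```python
from math import floor, log10


def round_sig(x, digits=1):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0
    # Position of the leading digit, e.g. 3 for 2112, -1 for 0.54.
    exponent = floor(log10(abs(x)))
    return round(x, digits - 1 - exponent)


print(round_sig(2112))     # 2000
print(round_sig(2112, 2))  # 2100
```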
Case 2: A web page. For this I made a reduced, simplified version of the home page of a pay-per-view film service as it was in late 2011 or early 2012. (The page was improved later.) The use case is a user who opens the page, clicks on the bottom right film to select and enlarge it, then clicks the same film again to start playing it. The user's window or display is large enough to show the entire page.
All of the HTML is displayed, so that part is 100% efficient. The HTML loads six assets, namely Google Analytics, jquery and four images.
Google Analytics doesn't cause anything to be displayed on-screen, so 0%. It does something for the site owner, and the cost of that utility shows up as a lower efficiency rating here. Jquery is used for a single line of code which has results on-screen. I guesstimate that that single line involves 1% of jquery's code, which makes jquery's download about 1% efficient. This is a matter of taste and judgment. The counting process points out that jquery might need discussion, and I resolve the discussion as 1%.
Both GA and jquery may be cached and/or compressed. I don't care. Caching and compression change the numbers at the end, but generally don't change what you learn along the way. I leave them out so there's less to talk about. This blog post is too long already.
One of the images is displayed at full resolution, so that's 100% efficient. Another is displayed scaled-down at first and in full resolution after the click, so 100% on that too. The last two images are only displayed at scaled-down resolution, so not 100%. By scientifically guessing what the file size needs to be for the scaled-down images, I derive an efficiency of 42% for those.
GA is 40k, jquery is 93k, the two 100% images are 44k each, the two 42% images are 44k each, and the page itself is 2k, i.e. the page downloads 311k in total while it needs to download 130k, for a total page efficiency of 42%. What a coincidence.
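The page-level sum can be sketched the same way. This Python snippet uses the sizes and judgement calls from the paragraphs above; the 1% for jquery and the 42% for the thumbnails are estimates, not measurements, and the asset labels are illustrative.

```python
# (asset, kilobytes downloaded, fraction judged useful), from the figures above.
ASSETS = [
    ("HTML page", 2, 1.00),
    ("Google Analytics", 40, 0.00),
    ("jquery", 93, 0.01),
    ("image, full size", 44, 1.00),
    ("image, enlarged on click", 44, 1.00),
    ("image, thumbnail only", 44, 0.42),
    ("image, thumbnail only", 44, 0.42),
]

downloaded = sum(kb for _, kb, _ in ASSETS)
needed = sum(kb * frac for _, kb, frac in ASSETS)
print(f"{downloaded}k downloaded, {needed:.0f}k needed, "
      f"{100 * needed / downloaded:.0f}% efficient")
```

With these inputs it reports 311k downloaded, about 128k needed and about 41%, a point below the 42% above, because the text rounds 128k up to 130k before dividing.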
Again, I also count the bytes per character. The screen area used to display the example document is plausibly about 40 characters wide and twelve lines plus the two image blocks high. The small images are about four lines high, the large one about six, so the total area is roughly 40×22 characters. At 311k, the example page downloads about 353 bytes per character displayed, of which it needs to download about 147 bytes per character. (The number 147 is the one people will tell you to shrink by running optipng, pngquant, convert -interlace jpeg and so on. They're probably right, but in my opinion it's better to analyse the problem at hand than to follow advice blindly. Even good advice.)
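The per-character figures follow directly from the estimated display area. A Python sketch of the arithmetic, using the rounded sizes above (the constant names are illustrative):

```python
# Rough display area for the example page: about 40 characters wide,
# 12 lines of text plus image blocks about 4 and 6 lines high.
WIDTH = 40
HEIGHT = 12 + 4 + 6
CHARS = WIDTH * HEIGHT  # 880 character cells

DOWNLOADED = 311_000  # bytes the page actually downloads
NEEDED = 130_000      # bytes it needs for what ends up on-screen

print(f"downloaded per char: {DOWNLOADED / CHARS:.1f}")
print(f"needed per char:     {NEEDED / CHARS:.1f}")
```

This gives about 353.4 and 147.7 bytes per character; the text above drops the decimals.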
Shallow-minded geniuses will have noticed that classic text-based protocols and text-only web pages can have much smaller bytes-per-character numbers than graphics-heavy web pages, and think that's all there is to it.
Not so fast. Comparing the weight of apples with that of oranges is valid in the sense that both are measured in kg, but the comparison does not yield insight. In this case, 353 bytes/character may be good or bad, there isn't any general threshold of acceptable efficiency. Perhaps you can compare the number 353 to the same use case for a competing streaming site, or to another page on the same site, or to last month's version of the same page. But you cannot compare it to a different page on another site, and even though some orange trees bear about 353 oranges per year, you cannot compare it to oranges. Or apples.
The insight comes from doing the analysis, not from the result. When you compute the partial size numbers and then add them up, you see clearly and quickly what drags down your efficiency in a particular use case.
Doing the sum shows you clearly that e.g. the cache check at the start is what drags down the mail reader's efficiency. The cache check lets page-down and page-up be very fast (they don't need partial cache refreshes), but it clearly drags down the network usage efficiency in the use case above. Or you can see that the snappy image replacement is the biggest efficiency problem for the film home page in the use case above, and consider, is that feature worth the cost? Or you can see that jquery is an unreasonable cost in all of your important use cases — or that jquery is valuable in some use cases and too inefficient in others.
This is a great tool for identifying misoptimisation. If it says you're efficient in use case 1 and terribly inefficient in use case 2, is that a problem? It depends. Maybe a feature helps case 1 and harms case 2, and case 1 is more important. That may be perfect. Or maybe case 1 isn't really important, and you've identified a misoptimisation.
It's also quite easy to compare your own efficiency with that of a competing site or implementation, to see what problems you have that the competitor avoids.
The numbers at the end aren't very valuable. I suppose they can be useful to keep track of your historical performance, so you won't blindly slide down, but I think it's better to graph the results of some fully automated performance test.