MIME, uuencode, etc.


MIME and uuencode are formats for representing non-ASCII characters or arbitrary binary data in messages such as e-mail or newsgroup postings.

Quoted-Printable

Occasionally, you'll see messages that contain sequences of the form "=XX", where XX is a two-digit hexadecimal character code (as in "=3D", which stands for "=" itself). These messages may also have long lines wrapped, with continuations indicated by an "=" at the end of each chopped line. If you check the headers for the message, you'll see lines similar to the following:

   Mime-Version: 1.0
   Content-Type: text/plain; charset=iso-8859-1
   Content-Transfer-Encoding: quoted-printable

Such a message has been encoded using the MIME (Multipurpose Internet Mail Extensions) "quoted-printable" format. This format is sometimes used for messages that contain characters that might not be acceptable to the net (e.g., characters outside the printable ASCII range 32-126), or characters from a character set other than ASCII. See below for a function named DEMIME, which translates quoted-printable messages to plain text.

Base64

Another MIME encoding is "base64", which is often paired with the content type "application/octet-stream". It is used for binary files (i.e., files in which little, if any, of the text is printable). A base64-encoded message will include lines similar to the following (not necessarily in the header):

   Content-Type: application/octet-stream; name="filename"
   Content-Transfer-Encoding: base64

The binary data will look something like this:

ICAgIOwgIFoGRCDxUVVBRENBU0UgTTtBO0oNClsxXSAgICCmQ29udmVydHMgdGhlIGNhc2Ug
b2YgcXVhZCBuYW1lcyBpbiBhcnJheSAXDQpbMl0gICAgpiBDb252ZXJ0cyBuYW1lcyB0byB1
cHBlcmNhc2UgaWYgRD0xLCB0byBsb3dlcmNhc2UgaWYgRD39MQ0KWzNdICAgIKYgSWYgRD0w
LCBjb252ZXJ0cyBuYW1lcyB0byB0aGUgcHJpbWFyeSBhbHBoYWJldCAo8UFMUEhTWzE7XSkN

An APL program named MIMEDECODE, which translates base64 to binary, is given below. However, if you have anything but a small quantity of base64 material to translate, you'll probably want to use a non-APL program to do the translation. I use the "mpack" and "munpack" programs to encode and decode base64. You can find these at:

ftp://ftp.andrew.cmu.edu/pub/mpack/

For more information about MIME, see the MIME FAQ at:

ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/mail/mime-faq/

Uuencode

Another common ASCII coding form is uuencode, which is used on Unix systems. Although uuencode is popular, it's not very well standardized: there are subtle differences in the format used by different uuencode programs. APL functions named UUENCODE and UUDECODE are given below.

Other

You'll also occasionally see mention of BinHex or StuffIt encoding. These are formats used on Macintosh computers. BinHex is just an ASCII coding for binary data, similar to base64, but it deals with the two-part nature of Macintosh files (which consist of a "data fork" and a "resource fork"). StuffIt is a compression program, similar to pkzip in the PC world.

MIME and APL

Various people have pointed out that MIME can be used to represent APL characters in messages. This is true, but there are some serious obstacles, including:

  1. There's no standard 8-bit APL character set, and there are reasons to suspect that there will never be one.

  2. APL symbols represented as "=XX" would be incomprehensible until you went to the trouble to pass them through a translator.

  3. New subscribers to APL-L and people casually visiting comp.lang.apl would have to set up software before they could read the APL. I think this "setup" effort would discourage people from joining in the APL discussions.

(See the "Discussion" section of my "APL-ASCII Workspace Transfer" paper for more information.)

These problems are what led me to develop the APLASCII {keyword} transliteration workspace. {keyword}-transliterated messages aren't as compact as MIME-coded messages, and they aren't MIME standard, but their meaning is fairly obvious even to the uninitiated. And automatic translators are available for all major APL systems. For more information, see:


Programs

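These programs were written for APL*PLUS, but except for references to #TCNL (the newline character), they should run fine on just about any APL system. DEMIME also needs the LATIN1 variable listed after it, and UUDECODE calls the VTOM utility given at the end.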

Here's the quoted-printable translator:

     {del} Z{<-}DEMIME A;B;I;N;X;#IO
[1]    @Translates Mime =XX codes and unwraps long lines in message {omega}
[2]    @ The argument should be a newline-delimited vector
[3]    #IO{<-}1
[4]    I{<-}I/{iota}{rho}I{<-}A='=' @ find the = signs
[5]    X{<-}A[I{jot}.+1 2]          @ the 2 chars following each =
[6]    N{<-}(B{<-}X[;1]=#TCNL)/I    @ indices of end-of-line =s (wrapped lines)
[7]    X{<-}(~B){slashbar}X         @ ignore the EOL =s for now
[8]    I{<-}(~B)/I
[9]    X{<-}16{basevalue}{neg}1+'0123456789ABCDEF'{iota}{transpose}X {+
                                 +} @ decimal code for each =XX char
[10]   A[I]{<-}LATIN1[1+X]          @replace the = with the char
[11]   B{<-}({rho}A){rho}1
[12]   B[(,N{jot}.+0 1),,I{jot}.+1 2]{<-}0 {+
                                 +} @ remove =<CR> and the XX following =
[13]   Z{<-}B/A
     {del}

{del}.    LATIN1{<-}'................................ !"{#}$%{&}''()*+,-./01{+
+}23456789:;<=>?{@}ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxy{+
+}z{leftbrace}|{rightbrace}~..................................{U+00A1}{cent}{+
+}{pounds}{pitimes}{yen}|S{each}ca{U+00AB}~-r{neg}{jot}{delta_}23`uP.,1{jot}{+
+}{U+00BB}???{U+00BF}AAAA{U+00C4}A?{U+00C7}{U+00C9}EEEIIII?{U+00D1}{U+00F3}O{+
+}OO{U+00D6}{signum}OUUU{U+00DC}Y?{U+00DF}{U+00E0}{U+00E1}{U+00E2}a{U+00E4}a{+
+}?{U+00E7}{U+00E8}{U+00E9}{U+00EA}{U+00EB}{U+00ED}i{U+00EE}{U+00EF}?{U+00F1{+
+}}o{U+00F3}{U+00F4}o{U+00F6}{divide}o{U+00F9}{U+00FA}{U+00FB}{U+00FC}y?y'

The LATIN1 variable is a 256-element vector containing the character set for the message. Its value should depend on the "charset=" declaration in the MIME headers, but I'm currently using a fixed table because the character set has usually been ISO-8859-1 for the MIME messages I've seen. The value above is appropriate for APL*PLUS systems, but it'll need some work on other systems. (The keywords {U+XXXX} are Unicode code points.) You'll need to find a listing of the ISO-8859-1 character set somewhere and type it in for your APL system. If a character doesn't occur in your character set, substitute something similar.
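For comparison, here is a minimal Python sketch of DEMIME's two steps: remove the soft line breaks, then turn each =XX into the byte it names. The name `demime` and its `charset` parameter belong to this sketch, not to the APL. Rather than a fixed LATIN1 table, it decodes the recovered bytes with whatever charset the header declares (defaulting to ISO-8859-1). It also accepts lowercase hex digits and CRLF line ends, and leaves an "=" that isn't followed by two hex digits untouched; these are assumptions beyond what DEMIME does.

```python
import re

HEX = "0123456789ABCDEF"

def demime(text: str, charset: str = "iso-8859-1") -> str:
    """Unwrap soft line breaks and decode =XX codes (sketch of DEMIME)."""
    # Step 1: drop soft line breaks, i.e. "=" right before a newline.
    # CRLF is accepted as well as LF (an assumption; DEMIME checks #TCNL only).
    text = re.sub(r"=\r?\n", "", text)
    raw = bytearray()
    i = 0
    while i < len(text):
        pair = text[i + 1:i + 3].upper()
        if text[i] == "=" and len(pair) == 2 and all(c in HEX for c in pair):
            raw.append(int(pair, 16))          # =XX -> byte with hex code XX
            i += 3
        else:
            raw += text[i].encode("latin-1")   # ordinary (ASCII) character
            i += 1
    # Step 2: map bytes to characters using the declared charset instead of
    # a fixed table; bytes the charset doesn't define become U+FFFD.
    return raw.decode(charset, errors="replace")
```

For example, `demime("caf=C3=A9", "utf-8")` returns "café". Python's standard `quopri.decodestring` performs the byte-level part of the same job.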


Here's a base64-to-binary translator. Note that you must manually extract the data block from your message and pass only that block to the function.

     {del} Z{<-}MIMEDECODE A;B;N;R;#IO
[1]    @Decodes Mime base64 transfer text {omega}
[2]    @ The argument is a character matrix or newline-delimited vector
[3]    @   containing the Mime data block, without the delimiter lines.
[4]    #IO{<-}0
[5]    A{<-}(~A{epsilon}#TCNL,' =')/A{<-},A
[6]    R{<-}'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
[7]    B{<-}{floor}3{times}({rho}A){divide}4 @ number of bytes encoded
[8]    N{<-}{ceiling}({rho}A){divide}4  @ groups, counting a short last one
[9]    A{<-}{transpose}(N,4){rho}(4{times}N){take}R{iota}A @ 4-digit groups, 0-filled
[10]   A{<-}64{basevalue}A              @ pack 'em
[11]   Z{<-}B{take}#AV[,{transpose}256 256 256{represent}A] {+
                                     +} @ unpack as groups of 3 bytes
     {del}
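The same arithmetic in Python, as a sketch for readers who want to check it in another language: four 6-bit digits are packed into a 24-bit number and unpacked as three bytes. The name `mime_decode` is only an illustrative counterpart to MIMEDECODE, and like the APL it assumes the data block has already been cut out of the message. It zero-fills a short final group and then keeps only the bytes actually encoded, so padded endings such as "QQ==" decode correctly. In practice, Python's standard `base64.b64decode` does the same job.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def mime_decode(text: str) -> bytes:
    """Decode a base64 data block to bytes (sketch of MIMEDECODE's steps)."""
    # Ignore newlines, blanks, and "=" padding.
    digits = [ALPHABET.index(c) for c in text if c not in "\r\n ="]
    nbytes = len(digits) * 3 // 4          # bytes actually encoded
    digits += [0] * (-len(digits) % 4)     # zero-fill a short last group
    out = bytearray()
    for k in range(0, len(digits), 4):
        # Pack four 6-bit digits into one 24-bit number...
        n = 0
        for d in digits[k:k + 4]:
            n = (n << 6) | d
        # ...then unpack it as three 8-bit bytes.
        out += bytes([n >> 16, (n >> 8) & 0xFF, n & 0xFF])
    return bytes(out[:nbytes])
```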

Here are the uuencode and uudecode translators:

     {del} Z{<-}UUENCODE A;B;E;F;I;J;Q;R;S;#IO
[1]    @Represents character vector {omega} in uuencode format
[2]    @ The argument is an arbitrary character vector
[3]    @ The result is a character matrix
[4]    #IO{<-}1
[5]    Q{<-}45@num input chars per line in the output
[6]    @ Coding alphabet, using back quote instead of space:
[7]    R{<-}'`!"{#}$%{&}''()*+,-./0123456789:;<=>?{@}ABCDEFGHIJKLMNOPQRSTUVW{+
   +}XYZ[\]^_'
[8]    S{<-}(Q{divide}3),3
[9]    F{<-}4{rho}64
[10]   Z{<-}(({ceiling}({rho}A){divide}Q),1+4{times}Q{divide}3){rho}' '
[11]   E{<-}Q{times}{floor}({rho}A){divide}Q
[12]   I{<-}-Q & J{<-}0
[13]  L1:{->}(E{<=}I{<-}I+Q)/L2@Loop for each full-sized block
[14]   B{<-}A[I+{iota}Q]@   the block
[15]   Z[J{<-}J+1;]{<-}R[1+Q,,{transpose}F{represent}256{basevalue}{+
   +}{transpose}{neg}1+#AV{iota}S{rho}B]@   translate
[16]   {->}L1@Endloop
[17]  L2:{->}(E={rho}A)/0@If there's a small block left over,
[18]   B{<-}I{drop}A@   the small block
[19]   Q{<-}3{times}{ceiling}({rho}B){divide}3@   length rounded up to whole triplets
[20]   S{<-}(Q{divide}3),3
[21]   Z[J+1;]{<-}(1{drop}{rho}Z){take}R[1+({rho}B),,{transpose}F{represent}{+
   +}256{basevalue}{transpose}{neg}1+#AV{iota}S{rho}Q{take}B]@   translate it
     {del}
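The same line layout in Python, as a sketch. Each line starts with a length character, and each 3-byte group is then spread over four 6-bit values written as characters offset from space, with 0 written as a back quote. `uuencode_lines` is a hypothetical name. Like UUENCODE, it emits only the body lines, not the `begin`/`end` lines that uudecode programs generally expect. One deliberate difference: it pads a short last group with zero bytes, where UUENCODE pads with blanks. Decoders go by the length character, so the padding value shouldn't matter.

```python
def uuencode_lines(data: bytes, per_line: int = 45) -> list[str]:
    """Encode bytes as uuencoded body lines (no begin/end lines)."""
    def enc(v: int) -> str:
        # 6-bit value -> printable char offset from space; 0 becomes back quote.
        return "`" if v == 0 else chr(32 + v)

    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        padded = chunk + bytes(-len(chunk) % 3)   # zero-fill to whole triplets
        out = [enc(len(chunk))]                   # length character first
        for k in range(0, len(padded), 3):
            n = (padded[k] << 16) | (padded[k + 1] << 8) | padded[k + 2]
            out += [enc((n >> s) & 0x3F) for s in (18, 12, 6, 0)]
        lines.append("".join(out))
    return lines
```

For a single line, Python's standard `binascii.b2a_uu(data, backtick=True)` produces the same encoding, with a trailing newline.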

     {del} Z{<-}UUDECODE A;B;E;I;J;N;R;W;#IO
[1]    @Converts uuencoded text {omega} to binary
[2]    @ The argument is either a character matrix or newline delimited vector
[3]    @ The result is a character vector
[4]    @ Allows space to be coded as either space or back quote
[5]    #IO{<-}1
[6]    {->}(2={rho}{rho}A)/L1
[7]    A{<-}VTOM((#TCNL{/=}1{take}A)/#TCNL),A@make A be a matrix
[8]    @ {first} #TCNL is APL*PLUS for newline
[9]   L1:R{<-}'` !"{#}$%{&}''()*+,-./0123456789:;<=>?{@}ABCDEFGHIJKLMNOPQRST{+
   +}UVWXYZ[\]^_'
[10]   @ {first} Coding alphabet with back quote at front
[11]   W{<-}0{max}{neg}2+R{iota}A[;1]@number of output chars for each input {+
   +}line
[12]   A{<-}0 1{drop}A@remove count column
[13]   Z{<-}(+/W){rho}' '
[14]   I{<-}J{<-}0 & E{<-}''{rho}{rho}A
[15]  L2:{->}(E<I{<-}I+1)/0@Loop for each input line
[16]   N{<-}{ceiling}W[I]{divide}3@   number of triples on the line
[17]   B{<-},{transpose}256 256 256{represent}64{basevalue}{transpose}(N,4){+
   +}{rho}0{max}{neg}2+R{iota}(N{times}4){take}A[I;]
[18]   B{<-}W[I]{take}#AV[1+B]
[19]   Z[J+{iota}{rho}B]{<-}B
[20]   J{<-}J+{rho}B
[21]   {->}L2
     {del}
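A matching Python sketch of the decoding side (`uudecode_lines` is a hypothetical name). It treats space and back quote alike by masking each character's offset from space to 6 bits. It pads a line whose trailing blanks were trimmed, and, like UUDECODE, it expects body lines only, without `begin`/`end`.

```python
def uudecode_lines(lines: list[str]) -> bytes:
    """Decode uuencoded body lines to bytes (no begin/end lines)."""
    def dec(c: str) -> int:
        # Offset from space, masked to 6 bits: space and back quote both give 0.
        return (ord(c) - 32) & 0x3F

    out = bytearray()
    for line in lines:
        if not line:
            continue                           # skip empty lines
        count = dec(line[0])                   # bytes encoded on this line
        ntriples = -(-count // 3)              # ceiling of count / 3
        body = line[1:].ljust(4 * ntriples)    # restore trimmed trailing blanks
        decoded = bytearray()
        for k in range(ntriples):
            n = 0
            for c in body[4 * k:4 * k + 4]:
                n = (n << 6) | dec(c)          # pack four 6-bit digits
            decoded += bytes([n >> 16, (n >> 8) & 0xFF, n & 0xFF])
        out += decoded[:count]                 # drop the padding bytes
    return bytes(out)
```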

     {del} Z{<-}VTOM V;E;F;I;M;W;#IO
[1]    @Converts delimited vector {omega} to matrix.  Delimiter is 1{take}{+
   +}{omega}.
[2]    #IO{<-}1
[3]    F{<-}V=1{take}V{<-},V
[4]    I{<-}(F/{iota}{rho}F),1+{rho}F
[5]    W{<-}(1{drop}I)-{neg}1{drop}I
[6]    E{<-}W{jot}.{>=}{iota}M{<-}0{max}{max}/W
[7]    Z{<-}(,E)\V
[8]    Z{<-}0 1{drop}(({rho}W),M){rho}Z
     {del}
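Outside APL, VTOM's effect amounts to a split on the leading delimiter followed by blank-padding every piece to the longest one. Here is a Python sketch; the name `vtom` and the use of a list of equal-length strings in place of a character matrix are assumptions of the sketch.

```python
def vtom(v: str) -> list[str]:
    """Split v on its first character and pad the pieces to equal width."""
    if not v:
        return []
    delim = v[0]                     # the delimiter is the first character
    rows = v[1:].split(delim)
    width = max(len(r) for r in rows)
    # Blank-pad each row to the longest one, like the rows of VTOM's matrix.
    return [r.ljust(width) for r in rows]
```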

