HomeUser Control Panel (unavailable in archive)ForumsTutorialsArt GalleryResourcesMaps

Decent MPQ decompression source code

07-29-2008, 01:06 AM#1
Strilanc
I've been hacking at the wave and huffman decompression code in libmpq for about a day now. The huffman stuff is in a particularly horrible state [ most evil line ever: ht->first = (tree_item*)&ht->first ].

I've managed to get a readable implementation working, but there are errors in the outputs. The wave files I decompress sound correct, except for some sort of chopping sound in the background.

I was wondering if anyone knew other projects with source code for these mpq decompression algos.
07-29-2008, 08:54 AM#2
PitzerMike
The recently posted Sfmpq library should have it all working I guess.
07-29-2008, 09:38 AM#3
Strilanc
I actually looked and couldn't find it anywhere in the source code. It looks like it delegates that to storm.dll.
07-29-2008, 08:45 PM#4
midiway
take a look at StormLib
07-30-2008, 04:57 AM#5
Strilanc
Excellent! The code in that library clearly came from the same source as what I was looking at, but had more work done on it. Now decompression is working like a charm!

-- Now it's time to start refactoring until the code looks much nicer.
07-30-2008, 08:14 PM#6
d07.RiV
SFmpq uses the same zezula sources with the evil huffman (most likely it was copied off asm code ><)

I guess you could try using some known huffman algorithms and see if they work.
I got tired of rewriting code so I just copied the zezula sources for pkzip huffman and wave into my library.
07-30-2008, 11:55 PM#7
Strilanc
Did you spend much time trying to understand the huffman decompression? It's insane!.

Setting a variable to its own address, cast to the variable's original type, should be illegal. I looked all over the code trying to find out why ht->last was changing, and it was because if ht->first = &ht->first then ht->first->prev points at ht->last . Changing the order of variables in the structs breaks the code because of that.

In any case I managed to make my own implementation of it. Once you start really taking out the cruft, it drops down to ~300 lines.
07-31-2008, 02:49 AM#8
xttocs
I'm interested in seeing your final result. A working readable implementation of MPQ decompression code would be fantastic. I mean, hell, there could even be comments in there too.
07-31-2008, 08:31 AM#9
Strilanc
I'll post it once I get it cleaned up. There are still little problems here and there. I can extract almost all the wave files from the war3 mpq, but some of them still don't work (eg. one of the hero lich responses).

I still don't do bzip, zlib.
07-31-2008, 09:20 AM#10
d07.RiV
Again, the zezula code is most likely just a quick analysis of storm.dll disassembly, and "quick analysis" implies that a lot of things will be guessed wrong and will look ugly.
I said I gave up trying to make it look nice.

zlib and bzip are easy because there are libraries for them, but I haven't figured out where to get a dll'less bzip2 library (the one I downloaded ended up requiring a dll).
08-04-2008, 06:51 AM#11
PandaMine
Quote:
Setting a variable to its own address, cast to the variable's original type, should be illegal. I looked all over the code trying to find out why ht->last was changing, and it was because if ht->first = &ht->first then ht->first->prev points at ht->last . Changing the order of variables in the structs breaks the code because of that.

Funny thing is we had a similar question in our computing exam for uni, 3/4 of the people got it wrong.

People that right that sought of code should be shot, its like trying to follow a cheetah on speed and rocket boosters
08-08-2008, 02:14 AM#12
Strilanc
Here's the class I use to decompress huffman. It doesn't stand on its own, but it's a hell of a lot clearer than what I based it on. Language is VB2005.net.

Code:
'''<summary>Exposes a plaintext IO.Stream around a huffman compressed substream</summary>
Public Class MPQHuffmanStream
    Inherits ConversionStream
    '''<summary>The frequency tables used to construct the initial huffman trees</summary>
    Public Shared ReadOnly frequencyTables()() As UInteger = { _
        array(Of UInteger)( _
            10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2 _
        ), _
        array(Of UInteger)( _
            84, 22, 22, 13, 12, 8, 6, 5, 6, 5, 6, 3, 4, 4, 3, 5, 14, 11, 20, 19, 19, 9, 11, 6, 5, 4, 3, 2, 3, 2, 2, 2, _
            13, 7, 9, 6, 6, 4, 3, 2, 4, 3, 3, 3, 3, 3, 2, 2, 9, 6, 4, 4, 4, 4, 3, 2, 3, 2, 2, 2, 2, 3, 2, 4, _
            8, 3, 4, 7, 9, 5, 3, 3, 3, 3, 2, 2, 2, 3, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, 2, _
            6, 10, 8, 8, 6, 7, 4, 3, 4, 4, 2, 2, 4, 2, 3, 3, 4, 3, 7, 7, 9, 6, 4, 3, 3, 2, 1, 2, 2, 2, 2, 2, _
            10, 2, 2, 3, 2, 2, 1, 1, 2, 2, 2, 6, 3, 5, 2, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1, 1, 1, _
            2, 1, 1, 1, 1, 1, 1, 2, 4, 4, 4, 7, 9, 8, 12, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 3, _
            4, 1, 2, 4, 5, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 4, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
            2, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 6, 75 _
        ), _
        array(Of UInteger)( _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 39, 0, 0, 35, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            255, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 6, 14, 16, 4, 6, 8, 5, 4, 4, 3, 3, 2, 2, 3, 3, 1, 1, 2, 1, 1, _
            1, 4, 2, 4, 2, 2, 2, 1, 1, 4, 1, 1, 2, 3, 3, 2, 3, 1, 3, 6, 4, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, _
            1, 41, 7, 22, 18, 64, 10, 10, 17, 37, 1, 3, 23, 16, 38, 42, 16, 1, 35, 35, 47, 16, 6, 7, 2, 9, 1, 1, 1, 1, 1 _
        ), _
        array(Of UInteger)( _
            255, 11, 7, 5, 11, 2, 2, 2, 6, 2, 2, 1, 4, 2, 1, 3, 9, 1, 1, 1, 3, 4, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, _
            5, 1, 1, 1, 13, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, _
            10, 4, 2, 1, 6, 3, 2, 1, 1, 1, 1, 1, 3, 1, 1, 1, 5, 2, 3, 4, 3, 3, 3, 2, 1, 1, 1, 2, 1, 2, 3, 3, _
            1, 3, 1, 1, 2, 5, 1, 1, 4, 3, 5, 1, 3, 1, 3, 3, 2, 1, 4, 3, 10, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
            2, 2, 1, 10, 2, 5, 1, 1, 2, 7, 2, 23, 1, 5, 1, 1, 14, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
            1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
            6, 2, 1, 4, 5, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
            1, 1, 1, 1, 1, 1, 1, 1, 7, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 17 _
        ), _
        array(Of UInteger)( _
            255, 251, 152, 154, 132, 133, 99, 100, 62, 62, 34, 34, 19, 19, 24, 23 _
        ), _
        array(Of UInteger)( _
            255, 241, 157, 158, 154, 155, 154, 151, 147, 147, 140, 142, 134, 136, 128, 130, _
            124, 124, 114, 115, 105, 107, 95, 96, 85, 86, 74, 75, 64, 65, 55, 55, _
            47, 47, 39, 39, 33, 33, 27, 28, 23, 23, 19, 19, 16, 16, 13, 13, _
            11, 11, 9, 9, 8, 8, 7, 7, 6, 5, 5, 4, 4, 4, 25, 24 _
        ), _
        array(Of UInteger)( _
            195, 203, 245, 65, 255, 123, 247, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            191, 204, 242, 64, 253, 124, 247, 34, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            122, 70 _
        ), _
        array(Of UInteger)( _
            195, 217, 239, 61, 249, 124, 233, 30, 253, 171, 241, 44, 252, 91, 254, 23, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            189, 217, 236, 61, 245, 125, 232, 29, 251, 174, 240, 44, 251, 92, 255, 24, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            112, 108 _
        ), _
        array(Of UInteger)( _
            186, 197, 218, 51, 227, 109, 216, 24, 229, 148, 218, 35, 223, 74, 209, 16, _
            238, 175, 228, 44, 234, 90, 222, 21, 244, 135, 233, 33, 246, 67, 252, 18, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            176, 199, 216, 51, 227, 107, 214, 24, 231, 149, 216, 35, 219, 73, 208, 17, _
            233, 178, 226, 43, 232, 92, 221, 21, 241, 135, 231, 32, 247, 68, 255, 19, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            95, 158 _
        ) _
    }

    Private Class HuffmanNode
        Public leftChild As HuffmanNode = Nothing
        Public rightChild As HuffmanNode = Nothing
        Public parent As HuffmanNode = Nothing
        Public val As Integer = -1
        Public freq As UInteger = 0
        Public Sub New(ByVal val As Integer, ByVal freq As UInteger)
            Me.val = val
            Me.freq = freq
        End Sub
        Public Sub New(ByVal leftChild As HuffmanNode, ByVal rightChild As HuffmanNode)
            Me.leftChild = leftChild
            Me.rightChild = rightChild
            leftChild.parent = Me
            rightChild.parent = Me
            Me.freq = leftChild.freq + rightChild.freq
        End Sub
    End Class

    '''<summary>A binary tree with the property that the sum over the leafs of depth*frequency is minimized</summary>
    Private Class HuffmanTree
        Public ReadOnly nodes As New List(Of HuffmanNode) 'sorted list of the nodes in the tree
        Public leafMap As New Dictionary(Of Integer, HuffmanNode) 'takes a value and gives the leaf containing that value

        '''<summary>Constructs the initial tree using the given frequency table</summary>
        Public Sub New(ByVal freqTable As UInteger())
            'leafs
            For i As Integer = 0 To freqTable.Length - 1
                If freqTable(i) > 0 Then
                    insert(New HuffmanNode(i, freqTable(i)))
                End If
            Next i
            'special leafs
            insert(New HuffmanNode(&H100, 1)) '[end of stream]
            insert(New HuffmanNode(&H101, 1)) '[new value follows]
            'tree
            For i As Integer = nodes.Count - 1 To 1 Step -1
                insert(New HuffmanNode(nodes(i), nodes(i - 1)))
            Next i 'decrementing by 1 is enough because the new node shifted the list
        End Sub

        '''<summary>Adds the node to the sorted list of nodes, and adds leafs to the value map</summary>
        Private Sub insert(ByVal n As HuffmanNode)
            If n.val <> -1 Then leafMap(n.val) = n
            For i As Integer = 0 To nodes.Count - 1
                If n.freq > nodes(i).freq Then
                    nodes.Insert(i, n)
                    Return
                End If
            Next i
            nodes.Add(n)
        End Sub

        '''<summary>Increases the frequency of the given value, and updates the tree to maintain optimality</summary>
        Public Sub increase(ByVal val As Integer)
            Dim n As HuffmanNode

            'Create a new node for the value if it isn't even in the tree
            If Not leafMap.ContainsKey(val) Then
                'add the new value node by pairing it with the lowest frequency node
                '[this transformation maintains the optimality of the tree]
                n = New HuffmanNode(val, 0)
                leafMap(val) = n
                Dim m As HuffmanNode = nodes(nodes.Count - 1) 'sibling
                Dim g As HuffmanNode = m.parent 'grandparent
                Dim p As New HuffmanNode(n, m) 'parent
                p.parent = g
                If g.rightChild Is m Then g.rightChild = p Else g.leftChild = p
                nodes(nodes.Count - 1) = p
                nodes.Add(m)
                nodes.Add(n)
            End If

            'Get the leaf with the value to increase
            n = leafMap(val)

            'Increase the frequency of the leaf and its ancestors and restructure the tree to match
            While n IsNot Nothing
                n.freq += CUInt(1)
                'find the new position the node must occupy in the ordered list
                Dim i As Integer = nodes.IndexOf(n) - 1
                While i >= 0 AndAlso nodes(i).freq < n.freq
                    i -= 1
                End While
                i += 1
                'if the node has to change positions, we need to update the tree and the list
                Dim m As HuffmanNode = nodes(i)
                If m IsNot n Then
                    'switch places in list
                    nodes(nodes.IndexOf(n)) = m
                    nodes(i) = n
                    'switch subtrees rooted at n and m
                    '[note that m has the same frequency as n used to have, so m's new ancestors will not need to be updated]
                    Dim t As HuffmanNode
                    If m.parent Is n.parent Then
                        t = m.parent.rightChild
                        m.parent.rightChild = m.parent.leftChild
                        m.parent.leftChild = t
                    Else
                        If m.parent IsNot Nothing Then If m.parent.rightChild Is m Then m.parent.rightChild = n Else m.parent.leftChild = n
                        If n.parent IsNot Nothing Then If n.parent.rightChild Is n Then n.parent.rightChild = m Else n.parent.leftChild = m
                    End If
                    t = m.parent
                    m.parent = n.parent
                    n.parent = t
                End If
                'repeat for ancestors
                n = n.parent
            End While
        End Sub
    End Class

    Private tree As HuffmanTree = Nothing
    Private isZeroTree As Boolean = False
    Private numBytesDecompressed As Integer = 0
    Private hitEnd As Boolean = False
    Private curNode As HuffmanNode = Nothing
    Private ReadOnly subBitBuffer As New BitBuffer() 'buffered coded bits
    Private ReadOnly overBitBuffer As New BitBuffer() 'buffered decoded bits

    Public Sub New(ByVal substream As IO.Stream, Optional ByVal mode As modes = modes.read)
        MyBase.New(substream, mode)
    End Sub

    Public Overrides Property Position() As Long
        Get
            Return numBytesDecompressed
        End Get
        Set(ByVal value As Long)
            Seek(value, IO.SeekOrigin.Begin)
        End Set
    End Property

    Public Overrides Function estimateInputSize(ByVal outputSize As Integer) As Integer
        estimateInputSize = CInt(outputSize * 0.8) + 1
        If tree Is Nothing Then estimateInputSize += 1
    End Function

    Public Overrides Sub convert(ByVal readBuffer() As Byte, ByVal readOffset As Integer, ByRef readCount As Integer, _
                                ByVal writeBuffer() As Byte, ByVal writeOffset As Integer, ByRef writeCount As Integer)
        Dim maxWriteCount As Integer = writeCount
        Dim maxReadCount As Integer = readCount
        writeCount = 0
        readCount = 0
        If hitEnd Then Return

        'first byte is the index of the frequency table used to build the tree
        If tree Is Nothing Then
            If readCount >= maxReadCount Then Return
            Dim frequencyTableIndex As Byte = readBuffer(readOffset + readCount)
            readCount += 1
            isZeroTree = frequencyTableIndex = 0
            If frequencyTableIndex >= frequencyTables.Length Then Throw New IO.IOException("Error decompressing Huffman Stream: invalid frequency index")
            tree = New HuffmanTree(frequencyTables(frequencyTableIndex))
        End If

        'decompress
        While writeCount < maxWriteCount
            Do While overBitBuffer.numBits >= 8
                writeBuffer(writeOffset + writeCount) = overBitBuffer.takeByte()
                writeCount += 1
                If writeCount >= maxWriteCount Then Exit While
            Loop

            'travel to leaf node
            If curNode Is Nothing Then curNode = tree.nodes(0) 'root
            Do While curNode.val = -1
                If subBitBuffer.numBits <= 0 Then
                    If readCount >= maxReadCount Then Exit While
                    subBitBuffer.queueByte(readBuffer(readOffset + readCount))
                    readCount += 1
                End If
                curNode = switch(subBitBuffer.takeBit(), curNode.rightChild, curNode.leftChild)
            Loop

            'interpret leaf value
            Select Case curNode.val
                Case &H100 'end of stream
                    hitEnd = True
                    Exit While
                Case &H101 'new value
                    If readCount >= maxReadCount Then Exit While
                    subBitBuffer.queueByte(readBuffer(readOffset + readCount))
                    readCount += 1
                    Dim b As Byte = subBitBuffer.takeByte()
                    overBitBuffer.queueByte(b)
                    tree.increase(b)
                    If Not isZeroTree Then tree.increase(b)
                Case Else
                    overBitBuffer.queueByte(CByte(curNode.val))
                    If isZeroTree Then tree.increase(curNode.val)
            End Select
            curNode = Nothing
        End While

        numBytesDecompressed += writeCount
    End Sub
End Class
08-08-2008, 02:29 AM#13
xttocs
Bah, VB... you should code in C. Looks good though.