Converting a string with Base64 and Uncompress

by davidski at 2012-11-03 12:01:37

I’m receiving a blob of compressed (.Z), Base64-encoded JSON from a CIF server (http://code.google.com/p/collective-intelligence-framework/wiki/API) and having a deuce of a time decoding it. I’m using PSCX to do a convertfrom-base64, followed by [System.Text.Encoding]::UTF8.GetString(), then passing the result through expand-archive. Sample code is:

$string = $result.data.feed.entry
$bytes = [System.Convert]::FromBase64String($string);
$decoded = [System.Text.Encoding]::UTF8.GetString($bytes)
$decoded | out-file c:\raw.Z
expand-archive c]

No matter how I tweak this, I can’t get a valid compressed archive out of this. I’ve tried to reverse this by taking a ps1 script over to a Unix box and running it through compress and base64 and confirmed that the problem exists even for this known good blob. Any thoughts as to what critical step I’m missing or otherwise flubbing?

David
by MattG at 2012-11-04 06:58:23
Hi David,

The only issue I see with your script is that, since you’re dealing with compressed binary data, you wouldn’t want to decode it as UTF8. You should leave it as is (a byte array) and decompress it accordingly.
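
To illustrate why the UTF8 step is harmful, here is a minimal sketch. The six sample bytes and the temp file name are made up for the demonstration and are not taken from the CIF feed. It shows that binary data does not survive a round trip through a UTF8 string, while writing the byte array directly does. Note also that in Windows PowerShell, Out-File writes UTF-16 text by default, which alters the data a second time.

```powershell
# Hypothetical sample: a zlib-style header followed by bytes that are not valid UTF-8.
$bytes = [byte[]](0x78, 0x9C, 0xFF, 0xFE, 0x00, 0x80)

# Decoding as UTF-8 substitutes U+FFFD for each invalid byte, so the round trip is lossy.
$asText     = [System.Text.Encoding]::UTF8.GetString($bytes)
$roundTrip  = [System.Text.Encoding]::UTF8.GetBytes($asText)
$isLossless = [System.Convert]::ToBase64String($bytes) -eq [System.Convert]::ToBase64String($roundTrip)

# Writing the byte array directly preserves the data exactly.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-sample.Z'   # hypothetical file name
[System.IO.File]::WriteAllBytes($path, $bytes)
$onDisk   = [System.IO.File]::ReadAllBytes($path)
$isIntact = [System.Convert]::ToBase64String($bytes) -eq [System.Convert]::ToBase64String($onDisk)

"Round trip through UTF8 string is lossless: $isLossless"
"Bytes written with WriteAllBytes are intact: $isIntact"
```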

As for the decompression step, I got it to work but it was no trivial matter. I used the base64 string from http://code.google.com/p/collective-intelligence-framework/wiki/API in the Feed Queries example. The string in that example is a non-standard base64 string though. It has ‘\n’ characters throughout it. I removed them but then I had to adjust the padding bytes accordingly. Ultimately, I removed both the padding bytes - ‘==’. The string then decoded successfully.
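
A small sketch of the clean-up step described above, with one difference that should be stated plainly: instead of dropping the padding, it rebuilds it so the length is again a multiple of four, which is what [Convert]::FromBase64String expects. The function name ConvertFrom-LooseBase64 is hypothetical, and the input is a short stand-in ("hello"), not the CIF sample.

```powershell
function ConvertFrom-LooseBase64 {
    param([Parameter(Mandatory = $true)][string]$Text)

    # Drop literal backslash-n sequences (two characters) and any real whitespace.
    $clean = ($Text -replace '\\n', '') -replace '\s', ''

    # Strip whatever padding is present, then rebuild it from the length.
    $clean = $clean.TrimEnd('=')
    $remainder = $clean.Length % 4
    if ($remainder -eq 1) { throw 'Length is not valid for Base64; the input is probably truncated.' }
    if ($remainder -gt 0) { $clean += '=' * (4 - $remainder) }

    # The leading comma returns the result as one byte[] instead of unrolling it.
    , [System.Convert]::FromBase64String($clean)
}

# Stand-in input: "hello" encoded, with a literal \n in the middle and the padding missing.
$bytes = ConvertFrom-LooseBase64 'aGVs\nbG8'
$text  = [System.Text.Encoding]::ASCII.GetString($bytes)
$text
```

If the remainder is 1, no amount of padding makes the string valid, which usually means characters were lost rather than added.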

After inspecting the binary data, it appeared to me to be a bare Zlib stream with no archive or file-format wrapper around it, which most decompression utilities will have issues with. I ended up using the DotNetZip library to get the job done (http://dotnetzip.codeplex.com/downloads/get/258012).
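
One way to make that inspection repeatable is to test the first two bytes against the zlib header rules from RFC 1950: the low nibble of the first byte is 8 (deflate), and the two bytes read as a 16-bit number are a multiple of 31. The sketch below is an illustration, with Test-ZlibHeader as a hypothetical helper name. It uses only the first four characters of the sample string ('eJzt'), which decode to 0x78 0x9C 0xED, and also checks for the 0x1F 0x9D signature that a Unix compress (.Z) file would start with.

```powershell
function Test-ZlibHeader {
    param([byte[]]$Bytes)

    if ($null -eq $Bytes -or $Bytes.Length -lt 2) { return $false }

    $cmf = [int]$Bytes[0]   # compression method and flags
    $flg = [int]$Bytes[1]   # additional flags

    $isDeflate = ($cmf -band 0x0F) -eq 8
    $checksOut = (($cmf * 256 + $flg) % 31) -eq 0
    return ($isDeflate -and $checksOut)
}

# First four Base64 characters of the sample string decode to three bytes.
$head = [System.Convert]::FromBase64String('eJzt')

$looksLikeZlib     = Test-ZlibHeader $head
$looksLikeCompress = ($head[0] -eq 0x1F -and $head[1] -eq 0x9D)   # Unix compress (.Z) signature

"zlib header: $looksLikeZlib; compress (.Z) header: $looksLikeCompress"
```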

Here’s the code I wrote to get the job done:
# Base64 String taken from http://code.google.com/p/collective-intelligence-framework/wiki/API (Feed Queries example)
# The string was modified so that it decodes properly: all '\n' sequences and the base64 padding were removed. I have no idea why they put invalid chars ('\n') into the stream
$b64Str = 'eJztmW1v20YSgP8KwQ/3yUvv+4uA4sCzdVcDil1EDlqkKA77MmsTkUiBpOykQf77DWUnsA5pA6WtcWcHEESR3Jmdmd19dmb18/vy7XrVDrO3Q1POyutx3MyOj29vb6tbUXX91TGnlB3/9GKxjNew9qRph9G3Ecqj8gb6oelalGIVxfuzNjYJ2rGcvS/r1Qh968fmBs5OpwcfX97dxa4ddy13HQ7Y46+wHcbexzfQVz5sB6ji9fG6a5ux66vN9ebv190wfmdpZXmlZWXY35r0nQYGwHx22dmclfI6Mc+1co5Zp6VBo3pAtU0c7wzdbMOqieWHD0dlPQwwDOt7e8/WGx/HfdNCN7YwFtt+hXoGQHeb8d1kcnN1XaKGk67Nk08YjD05ySs19Yzet1d4327XKHrX62bbb7oBBcp1MzZXfmfXUTm/QdFTP/pJ0z9X3e10Xb4bRlhPv867tOujTgn9GVD6QSAmU/BFM6nyqzslP78v0/huM/UzuY9mHD0wsG82aHXEZ2vw7Z2RfdOjot8VO1/sSaCCLwhwR40o5if15XxB6mVRL3dPjop59COsinMYb7v+zZ5WP7RfUvvJd3rM5Z7wpofcvC0//HJULqG/ae5Gptn8e9N3Yxe7FbbRKPFD14+rZtgNlhQ4MNPQ/NYMBU+zlVYRkzInilJKrDCaJCo4o9yB1xZ1tn4NO+OSMsl7khREImKMJBhmCWMQE9UZvPDTiJ3CCHG8bHZCnDJOKCNcXlI6231el1ObIeJY3c/daYUUcZpzUzj+e8R/P2LRaiko2syZJ8IERqyh6JHU3lhtmAt5L5BX2yZNVu6vnhYgkbEjb1qcoEflS1jhMKYaX9/sVsb+Ki8ZR4etCcQwpokyOhPLKBDHndO4dL13ercoED2zYYeXRRf9fWfbvp01MObZxvd+jXxar4b7VrMG10MmE3WmqfJ/xC/HKs5EZSulJ37xnGWMkjvmpVN45S6bxL0LIQbJ9NPl14NAPBq/XtUH8ospzXVxdrlAduF3cdJhlNt3h/Lqk6+PwyvFBU4oGYgNKhLlKCfBWoa8iiJrlhko+LN4pZ4Sr5LRKSQfCLcgibJJkpAzI5ra5GngWRj7HHllpu2WSTERS2B6FYwPPmqcNeBShKiEj1wzw6MPT5xYH0PxaMx6eXEoswSnrqgvL16Q7y+Wl2fn/yrqsVsX3+NwYtti+XLxVfzaeY4AE389wHiykbvkiAaRiAKeiTdJkCAVOJ+ideZPS7ieFMAMM8IyTklOGohKxhIPGUjkiWsvdUzhOSZcXFfWVszZCWBgmVXcSG8NEz6DCDxIG0VINnjcNp82wD6F4tEAdjo/EGCCCcmL0zk5WdTnF5cXP5zV5PTVcqofT1a+RaxsGo/pGLkHTrEYU/U1SLuLBSKNP0INmWJIEoEjI7UEN0xBvFKBuCipj7gyGf2GtM8hTYNJModAqMqGKIWBs4xF4mgCnZ0PoP0zQxozsmLcVVywyqndMRjjxmmgFLdHDeC4CToohmFTxhrBni7T9mNxONQ8Pjm4jFwempJJzWxRv6hfX5yTer4s6rX/tWur2K2PCpwTB9Pro9cU6bU7JPiD9JLmC/iSTBsJBgjVXmFJKShxxgHBFSgsYkFRQ7/h6zP4ctS7oJKakrGAJaUEgvsAEMcwDQnesiTZs8OXrTi1lTGVZZCCYYBqIOEgLVLFgvVPBcATNJYWoG2binTK8HoXi8kvLVoRmZkkwVP87/Ub8+Wyzq4kcI/nWzWvlivu27DWr62pP8BwF4nLMxidujnPBlrI2Yh2nMw3TSxGmRZIDAsmXfQPYZkGVKA2DNRCRoTGBDpMRnobC+dNllCtRY9dxAZiUWERWXU0ElJ5JRn7WLQerEFAdqWaAWVGDeK6zM9VMm2V4s/kfzMI64scX5xclHXBUXG+h3Pg/FCbaB/uvSsXvn+fSPJHN/PcWwInI8GEsoVkFEaWOIjTJPiZmyWGo65fU3in3uhB+
Dw53lRHlLiWJYgQchHKERsw5wKQj44+nYL/8BIC6idA'
$Unencoded = [Convert]::FromBase64String($b64Str)

# Load the Zlib library. Downloaded from http://dotnetzip.codeplex.com/downloads/get/258012
[Reflection.Assembly]::LoadFrom('C:\Users\Test\Desktop\DotNetZipLib-DevKit-v1.9\zip-v1.9\Release\Ionic.Zip.dll')
$MemStream = New-Object IO.MemoryStream($Unencoded, 0, ($Unencoded.Length-1))

# Create a ZlibStream
$ZlibStream = New-Object Ionic.Zlib.ZlibStream($MemStream, 'Decompress')
# Start with an empty byte array; '+=' in the loop below appends to it
$Uncompressed = New-Object Byte[](0)
$Byte = $ZlibStream.ReadByte()

# Read all bytes until the end of the stream is reached
while ($Byte -ne -1)
{
$Uncompressed += $Byte
$Byte = $ZlibStream.ReadByte()
}

# Write contents to disk
[IO.File]::WriteAllBytes('C:\Users\Test\Desktop\uncompressed.txt', $Uncompressed)
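
If the payload really is a zlib stream, a possible alternative that needs no external library is the built-in System.IO.Compression.DeflateStream. This is a swapped-in technique, not the DotNetZip approach used above: DeflateStream reads raw deflate data, so the sketch skips the two-byte zlib header first and lets the stream stop before the four-byte Adler-32 trailer, which is therefore not verified. The sample bytes are a known zlib encoding of the ASCII text "hello", used here as a stand-in for the CIF data.

```powershell
# Stand-in data: zlib stream for "hello" (2-byte header, deflate data, 4-byte Adler-32).
$zlibBytes = [byte[]](0x78, 0x9C, 0xCB, 0x48, 0xCD, 0xC9, 0xC9, 0x07, 0x00, 0x06, 0x2C, 0x02, 0x15)

# Skip the 2-byte zlib header so DeflateStream sees raw deflate data.
$inStream  = New-Object System.IO.MemoryStream($zlibBytes, 2, ($zlibBytes.Length - 2))
$deflate   = New-Object System.IO.Compression.DeflateStream($inStream, [System.IO.Compression.CompressionMode]::Decompress)
$outStream = New-Object System.IO.MemoryStream

try {
    # CopyTo reads in blocks until the deflate stream ends.
    $deflate.CopyTo($outStream)
    $inflated = $outStream.ToArray()
}
finally {
    $deflate.Dispose()
    $outStream.Dispose()
}

$text = [System.Text.Encoding]::ASCII.GetString($inflated)
$text
```

Stream.CopyTo requires .NET 4 or later, so this assumes PowerShell 3.0 or newer.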

If you look at the uncompressed text, you’ll notice that a small portion of it is garbled. I attribute this to a bad base64 string in the example. Here’s the uncompressed data I got for your comparison:
[{"xmlns:xsi":"http://www.w3.org/2001/XMLSchema-instance","version":"1.0","Incident":{"AlternativeID":{"IncidentID":{"content":"https://zeustracker.abuse.ch/monitor.php?host=80.82.64.71&id=6e1ee1af9f98ff55a6d1a26599189647","restriction":"public"}},"Assessment":{"Impact":{"content":"botnet url","severity":"high"},"Confidence":{"content":"42.5","rating":"numeric"}},"purpose":"mitigation","EventData":{"Flow":{"System":{"Node":{"Address":"80.82.64.71"},"AdditionalData":[{"dtype":"string","content":"ripencc","meaning":"rir"},{"dtype":"string","content":"NL","meaning":"cc"},{"dtype":"string","content":"29073 ECATEL-AS AS29073, Ecatel Network","meaning":"asn"},{"dtype":"string","content":"80.82.64.0/24","meaning":"prefix"}],"Service":{"ip_protocol":"6","Portlist":"443"}}}},"IncidentID":{"content":"ea0f8485-7df2-5000-8376-d0321029ea68","name":"80d57daa-d5ec-3ccc-b718-11ecd06fea3a"},"DetectTime":"2012-01-24T00:00:00Z","Description":"zeus config","AdditionalData":{"dtype":"string","content":"8c864306-d21a-37b1-8705-746a786719bf","meaning":"guid"},"restriction":"need-to-know","RelatedActivity":{"IncidentID":"12d0687b-7116-576f-810e-929966e1aa96"}},"xsi:schemaLocation":"urn:ietf:params:xmls:schema:iodef-1.0"},{"xmlns:xsi":"http://www.w3.org/2001/XMLSchema-instance","version":"1.0","Incident":{"AlternativeID":{"IncidentID":{"content":"https://zeustracker.abuse.ch/monitor.php?host=91.213.8.56&id=2ff4cc4291a49542929f7d2a9bbcb416","restriction":"public"}},"Assessment":{"Impact":{"content":"botnet url","severity":"high"},"Confidence":{"content":"42.5","rating":"numeric"}},"purpose":"mitigation","EventData":{"Flow":{"System":{"Node":{"Address":"91.213.8.56"},"AdditionalData":[{"dtype":"string","content":"ripencc","meaning":"rir"},{"dtype":"string","content":"UA","meaning":"cc"},{"dtype":"string","content":"15626 ITLAS ITL 
Company","meaning":"asn"},{"dtype":"string","content":"91.213.8.0/24","meaning":"prefix"}],"Service":{"ip_protocol":"6","Portlist":"443"}}}},"IncidentID":{"content":"5232914b-8b5c-5902-b881-d0c3f61f1e5e","name":"80d57daa-d5ec-3ccc-b718-11ecd06fea3a"},"DetectTime":"2012-01-25T00:00:00Z","Description":"zeus config","AdditionalData":{"dtype":"string","content":"8c864306-d21a-37b1-8705-746a786719bf","meaning":"guid"},"restriction":"need-to-know","RelatedActivity":{"IncidentID":"d76dbdab-28e4-58d4-bff1-608da0b2f378"}},"xsi:schemaLocation":"urn:ietf:params:xmls:schema:iodef-1.0"},{"xmlns:xsi":"http://www.w3.org/2001/XMLSchema-instance","version":"1.0","Incident":{"AlternativeID":{"IncidentID":{"content":"https://zeustracker.abuse.ch/monitor.php?host=91.217.82.143&id=3a26b7abac67dae9dcec53ac26172cab","restriction":"public"}},"Assessment":{"Impact":{"content":"botnet url","severity":"high"},"Confidence":{"content":"42.5","rating":"numeric"}},"purpose":"mitigation","EventData":{"Flow":{"System":{"Node":{"Address":"91.217.82.143"},"AdditionalData":[{"dtype":"string","content":"ripencc","meaning":"rir"},{"dtype":"string","content":"RO","meaning":"cc"},{"dtype":"string","content":"13209 ATOM-HOSTING Atom Hosting SRL","meaning":"asn"},{"dtype":"string","content":"91.217.82.0/23","meaning":"prefix"}],"Service":{"ip_protocol":"6","Portlist":"443"}}}},"IncidentID":{"content":"2d8c29d9-6e3d-5e2f-a7d3-b45e9adc8978","name":"80d57daa-d5ec-3ccc-b718-11ecd06fea3a"},"DetectTime":"2012-01-25T00:00:00Z","Description":"zeus 
config","AdditionalData":{"dtype":"string","content":"8c864306-d21a-37b1-8705-746a786719bf","meaning":"guid"},"restriction":"need-to-know","RelatedActivity":{"IncidentID":"71738120-fd6e-5d78-aefe-c2d26a46cdb6"}},"xsi:schemaLocation":"urn:ietf:params:xmls:schema:iodef-1.0"},{"xmlns:xsi":"http://www.w3.org/2001/XMLSchema-instance","version":"1.0","Incident":{"AlternativeID":{"IncidentID":{"content":"https://zeustracker.abuse.ch/monitor.php?host=91.226.88.198&id=e8185274a8713afe3b2b48c3bd8ba914","restriction":"public"}},"Assessment":{"Impact":{"content":"botnet url","severity":"high"},"Confidence":{"content":"42.5","rating":"numeric"}},"purpose":"mitigation","EventData":{"Flow":{"System":{"Node":{"Address":"91.226.88.198"},"AdditionalData":[{"dtype":"string","content":"ripencc","meaning":"rir"},{"dtype":"string","content":"DE","meaning":"cc"},{"dtype":"string","content":"31342 DE-CLANOTOPIA-DUS-AS Clanotopia IT-Service Ltd.","meaning":"asn"},{"dtype":"string","content":"91.226.88.0/22","meaning":"prefix"}],"Service":{"ip_protocol":"6","Portlist":"443"}}}},"IncidentID":{"content":"edcbd401-4c08-53a3-a55b-9c40ac46c108","name":"80d57daa-d5ec-3ccc-b718-11ecd06fea3a"},"DetectTime":"2012-01-25T00:00:00Z","Description":"zeus config","AdditionalData":{"dtype":"string","content":"8c864306-d21a-37b1-8705-746a786719bf","meaning":"guid"},"restriction":"need-to-know","RelatedActivity":{"IncidentID":"6e7d4fbb-05f7-55a3-811c-90de6f9abe6a"}},"xsi:schemaLocation":"urn:ietf:params:xmls:schema:iodef-1.0"},{"xmlns:xsi":"http://www.w3.org/2001/XMLSchema-instance","version":"1.0","Incident":{"AlternativeID":{"IncidentID":{"content":"https://zeustracker.abuse.ch/monitor.php?host=174.129.231.95&id=612796e00e9a6ee927b6b51fbb578731","restriction":"public"}},"Assessment":{"Impact":{"content":"botnet 
url","severity":"high"},"Confidence":{"content":"42.5","rating":"numeric"}},"purpose":"mitigation","EventData":{"Flow":{"System":{"Node":{"Address":"174.129.231.95"},"AdditionalData":[{"dtype":"string","content":"arin","meaning":"rir"},{"dtype":"string","content":"US","meaning":"cc"},{"dtype":"string","content":"14618 AMAZON-AES Amazon.com, Inc.","meaning":"asn"},{"dtype":"string","content":"174.129.0.0/16","meaning":"prefix"}],"Service":{"ip_protocol":"6","Portlist":"4447"}}}},"IncidentID":{"content":"41674e7e-06a5-5930-979e-6f9388645070","name":"80d57daa-d5ec-3ccc-b718-11ecd06fea3a"},"DetectTime":"2012-01-25T00:00:00Z","Description":"zeus config","AdditionalData":{"dtype":"string","content":"8c864306-d21a-37b1-8705-746a786719bf","meaning":"guid"},"restriction":"need-to-know","RelatedActivity":{"IncidentID":"90a9b5d5-5d7b-584e-bd4e-91818ba81d41"}},"xsi:schemaLocation":"urn:ietf:params:xmls:schema:iodef-1.0"},{"xmlns:xsi":"http://www.w3.org/2001/XMLSchema-instance","version":"1.0","Incident":{"AlternativeID":{"IncidentID":{"content":"https://zeustracker.abuse.ch/monitor.php?host=178.208.77.81edb71eentedt":Ld.1b8a"http://www.w3.org/2001/XMLSchema-instance","version":"1.0","Incident":{"AlternativeID":{"IncidentID":{"content":"https://zeustracker.abuse.ch/monitor.php?host=174.129.231.95&id=612796e00e9a6ee927b6b5,"coS8eenni.-o572Aonito/www63n3c17d=61ea0 cker.abuse.ch/monitor.php?host=174.129.231.95&id=612796e00e9a6ee927b6b51fbb578731","restriction":"public"}},"Assessment":{"Impact":{"content":"botnet url","severity":"high"},"Confidence":{"content":"42.5","rating":"numeric"}},"purpose":"mitigation","EventDat4SAf9"-e,"contew5E.Sl2anbm6Z87A}eshost=1TIet95vNf7eneff73ntact"7ic t4dL"De8e7dOrntent":"174.ds7A}5p1e5L4cdit"Incn":r8c 9a7f4e.m3rA}eciec .-3c.o0c–.a5411t":"443"A}5p1e5L4cdit"Incn":r8c 9a7f4e.m3rA}eciec .-3c.o03ef9"5:e0175s3h"f4cdit"Incn":r8c 9a7f4e.m3rA}eciec .-3c.o03ef9"5:e0175s3h"f4cdit"Incn":r8c 9a7f4e.m3rA}eciec .-3c.o03ef9"5:e0175s3h"f4cdit"Incn":r8c 
9a7f4e.m3rA}eciec .-3c.o03ef9"5:e0175s3h"f4cdit"Incn":r8c 9a7f4e.m3rA}eciec .-3c.o03ef9"5:e0175s3h"f4cdit"Incn":r8c du552:"high"},"Confidence":{"content":"42.5","rating":"numeric"}},"purpose":"mitigation","EventData":{"Flow":{"System":{"Node":{"Address":"80.82.64.71"},"AdditionalData":[{"dtype":"string","content":"ripencc","meaning":"rir"},{"dtype":"ebmOfcMCO"A 2onaf{"xmlnsb22em":{"Node":{"Address":"80.82.64.71"},"Additiona0rbentData"Lar{d-da,vEfr8-eCt-":{ned7552:-9ecciec[27SntDAAZ0ey5k6s3afdresi.,0-9n1decebf b"tZ0e199be-":"strinams:xmls:schema:iodef-1.0"},{"xmlns:xsi":"http://www.w3.org/2001/XMLSchema-instance","version":"1.0","Incident":{"AlternativeID":{"IncidentID":{"content":"https://zeustracker.abuse.ch/monitor.php?host=9":{"IncidentID":"6e7d4fbb-05f7-55a3-811c-90de6f9abeA
Looks like the good ol' Zeus botnet has been wreaking havoc again. ;D

Hope this helps!
by MattG at 2012-11-04 07:11:11
Update: In the example, a newline escape ‘\n’ is inserted into the base64 string after every 76th character with the exception of one line in the middle which has 78 characters. Weird. Again, I attribute improper encoding to a bad sample.
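
For context, wrapping Base64 at 76 characters per line is the MIME convention, and .NET can both produce and consume it as long as the separators are real line-break characters rather than the two-character text '\n'. The sketch below uses made-up data (the bytes 0 through 99) to show this. Whether the '\n' sequences in the wiki example are JSON-escaped line breaks is an assumption, not something the thread confirms.

```powershell
# Made-up payload: 100 bytes, which encode to 136 Base64 characters.
$data = [byte[]](0..99)

# InsertLineBreaks adds a CRLF after every 76 characters.
$wrapped = [System.Convert]::ToBase64String($data, [System.Base64FormattingOptions]::InsertLineBreaks)
$lines   = $wrapped -split "`r`n"

# FromBase64String ignores real whitespace, so the wrapped text decodes unchanged.
$decoded   = [System.Convert]::FromBase64String($wrapped)
$roundTrip = [System.Convert]::ToBase64String($decoded) -eq [System.Convert]::ToBase64String($data)

"Lines: $($lines.Count); first line length: $($lines[0].Length); round trip ok: $roundTrip"
```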
by davidski at 2012-11-04 14:18:21
Thanks for all the research on this, Matt!

I’m wondering how you determined the Base64 operation was successful? When I drop the UTF8 decode step, the Base64 decoding seems to work fine, but the loop reading from the Zlibstream runs and runs (-1 is never returned). Letting the process spin for a long time before terminating gives a partial response in $uncompressed.
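
One possible contributor to the loop appearing to run forever, offered as a guess rather than a diagnosis: PowerShell arrays are fixed-size, so each '+=' copies the whole array, and reading one byte at a time makes a large result very slow. A sketch of a block-wise read with an upper bound follows. Read-AllBytes and its MaxBytes parameter are hypothetical names, and a MemoryStream stands in for the decompression stream so the example runs on its own.

```powershell
function Read-AllBytes {
    param(
        [System.IO.Stream]$Stream,
        [int]$MaxBytes = 64MB    # hypothetical safety limit
    )

    $buffer = New-Object byte[] 8192
    $out    = New-Object System.IO.MemoryStream

    # Read returns 0 at the end of the stream.
    while (($read = $Stream.Read($buffer, 0, $buffer.Length)) -gt 0) {
        $out.Write($buffer, 0, $read)
        if ($out.Length -gt $MaxBytes) { throw "Output exceeded $MaxBytes bytes; aborting." }
    }

    # The leading comma returns one byte[] instead of unrolling it.
    , $out.ToArray()
}

# Stand-in source stream; in practice this would be the decompression stream.
$sample = [byte[]](1..20)
$source = New-Object System.IO.MemoryStream($sample, 0, $sample.Length)
$result = Read-AllBytes -Stream $source -MaxBytes 1KB
"Read $($result.Length) bytes"
```

The limit turns a runaway read into an error that can be investigated, instead of a process that has to be killed.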

Very strange.

David
by MattG at 2012-11-04 14:33:34
My pleasure.

The only basis for success I was using in the base64 decode step is that it didn’t return any errors.

Other than that, I couldn’t tell you why it’s not returning the full decompressed contents. Again, my suspicion is that the compression and the base64 encoding are being done in a non-standard way. The inconsistent newlines in the base64 string are a perfect example.