Encrypted Images Vulnerable To New Attack 155
rifles only writes "A German techie has found a remarkably simple way to discern some of the content of encrypted volumes containing images. The encrypted images don't reveal themselves totally, but in many cases do let an attacker see the outline of a high-contrast image. The attack works regardless of the encryption algorithm used (the widely-used AES for instance), and affects all utilities that use single symmetric keys. More significant to police around the world struggling with criminal and terrorist use of encryption, the attack also breaks the ability of users to 'hide' separate encrypted volumes inside already encrypted volumes, whose existence can now for the first time be revealed." The discoverer of this attack works for a company making full-disk encryption software; their product, TurboCrypt, has already been enhanced to defeat the attack. Other on-the-fly encryption products will probably be similarly enhanced, as the discoverer asserts: "To our knowledge is the described method free of patents and the author can confirm that he hasn't applied for protection."
Only works on uncompressed bitmaps (Score:5, Informative)
Based on comparison of two versions of the volume (Score:2, Informative)
It was always risky to have two versions of a block-structured file encrypted with the same key. You can see the changes. That may tell an attacker things about the encrypted data (filesystem, size of files, etc.). If you back up encrypted volumes, either put them in another container or decrypt the files and store them in another container with a different key. Never keep different volumes with the same key. The attack is of significance for law enforcement, which may be able to enter the premises of a suspect and get copies of an encrypted volume at two different points in time. (A hint for TrueCrypt users: changing the passphrase does not change the key.)
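The attacker's side of this is trivial and needs no key material, only the two snapshots. A minimal sketch (hypothetical data, 512-byte sectors assumed):

```python
import os

SECTOR = 512

def changed_sectors(snap_a: bytes, snap_b: bytes):
    """Sector numbers where the ciphertext differs between two snapshots."""
    assert len(snap_a) == len(snap_b)
    return [i // SECTOR for i in range(0, len(snap_a), SECTOR)
            if snap_a[i:i + SECTOR] != snap_b[i:i + SECTOR]]

# Two snapshots of the same encrypted volume, taken before and after an edit:
before = os.urandom(8 * SECTOR)
after = bytearray(before)
after[3 * SECTOR:3 * SECTOR + 16] = os.urandom(16)   # sector 3 was rewritten
print(changed_sectors(before, bytes(after)))          # → [3]
```

Which sectors changed, and how often, already says something about file sizes and write patterns, even though no plaintext is recovered.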
It seems (Score:5, Informative)
The summary title and write-up are a little ambiguous.
Re:Watermark? (Score:5, Informative)
The cause is similar to the watermarking attack, but the idea is used backwards. The watermarking attack reveals the presence of maliciously constructed decoy plaintext encrypted by the user, whereas this attack reveals information about the change in an unknown plaintext.
In both attacks the issue is that the salt, as you call it, is constant for a given disk block. If that salt can be predicted, a decoy plaintext can be revealed in the ciphertext. If data is changed while using the same salt, sections of identical plaintext before and after the change can be identified.
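A deliberately weak toy cipher (illustration only, not any real product's scheme) makes the constant-salt leak concrete: if the keystream depends only on the key and the static sector number, re-encrypting a sector leaks the XOR of old and new plaintext.

```python
import hashlib

def toy_encrypt(key: bytes, sector: int, plaintext: bytes) -> bytes:
    # Toy cipher: keystream derived from key + sector number only.
    # Because the per-sector "salt" never changes, the keystream is
    # reused every time the same sector is written.
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + sector.to_bytes(8, "big")
                                 + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))

key = b"k" * 16
p1 = b"secret sector data!!"
p2 = b"SECRET sector data!!"
c1 = toy_encrypt(key, 5, p1)
c2 = toy_encrypt(key, 5, p2)
# Keystream reuse: XOR of the ciphertexts equals XOR of the plaintexts.
leak = bytes(a ^ b for a, b in zip(c1, c2))
```

Real sector-level modes (CBC with sector-number IV, or sector-number tweaks) don't leak this badly, but the same root cause — a fixed per-sector value — is what lets identical before/after regions be identified.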
Old news for TrueCrypt (Score:5, Informative)
Not only is the sensationalist article/summary pertinent only to uncompressed bitmaps; TrueCrypt has also long warned its users about backing up hidden volumes (source [truecrypt.org]). In fact, it's the first precaution in how to keep your hidden volume secure.
So people worrying about steganography on TrueCrypt volumes shouldn't: the developers have been telling you how to keep these volumes secure all along.
Re:Compressed images (Score:4, Informative)
Run-length encoding doesn't give you all that high entropy, and neither JPG nor PNG uses it.
Re:What about crypto modes? Never heard of CBC, CT (Score:5, Informative)
You don't use any of those modes on disk images, because you need fast random access.
http://en.wikipedia.org/wiki/Disk_encryption_theory [wikipedia.org]
Re:Watermark? (Score:2, Informative)
The problem is that many CBC and other disk-encryption modes use an IV based on the disk sector number. So when that sector changes, the new contents are still encrypted with the same IV and key.
Re:Watermark? (Score:5, Informative)
By breaking the disk up into sectors, each of which starts a new chain. The problem is that the IV used to "start" each CBC chain is static even as the underlying plaintext changes. So new data at the same point on the HD gets encrypted with the same IV.
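A minimal sketch of the resulting equal-prefix leak, using a hypothetical hash-based stand-in for the block cipher (not decryptable, demo only): if a sector is rewritten with the same IV and its first blocks are unchanged, the first ciphertext blocks are unchanged too.

```python
import hashlib, os

BLOCK = 16

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for a 128-bit block cipher; a hash suffices to demonstrate
    # ciphertext equality (it is one-way, so this is illustration only).
    return hashlib.sha256(key + block).digest()[:BLOCK]

def cbc_encrypt_sector(key: bytes, sector_no: int, pt: bytes) -> bytes:
    prev = sector_no.to_bytes(BLOCK, "big")   # static IV = sector number
    out = b""
    for i in range(0, len(pt), BLOCK):
        prev = toy_block_encrypt(key, bytes(a ^ b for a, b
                                            in zip(pt[i:i + BLOCK], prev)))
        out += prev
    return out

key = os.urandom(16)
base = os.urandom(64)
# Same sector written twice; only the third 16-byte block was edited.
pt_old = base
pt_new = base[:32] + bytes(b ^ 0xFF for b in base[32:48]) + base[48:]
c_old = cbc_encrypt_sector(key, 7, pt_old)
c_new = cbc_encrypt_sector(key, 7, pt_new)
```

The two ciphertexts agree exactly up to the first changed block, so an attacker with both versions learns where within the sector the edit happened.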
Re:What about crypto modes? Never heard of CBC, CT (Score:3, Informative)
It doesn't? What about the part in TFA that reads:
Re:Compressed images (Score:3, Informative)
LZ77 is not run-length encoding. Run-length encoding only encodes repeated sequences of same letters, while LZ77 encodes repeated sequences as pointers to the previous instance of the same sequence.
Re:Watermark? (Score:5, Informative)
It actually makes me happy to see that some people are starting to get the point. I have been pointing out these weaknesses for years.
Some of them are actually even worse. If the IV is just the sector number, then the difference between two neighboring sectors is known, and you can construct a file that cancels out that difference so the two sectors get the same ciphertext. I constructed a file [kasperd.net] some years ago that demonstrated the problem. At that time TrueCrypt was vulnerable to this attack. TrueCrypt did apply some whitening after the encryption, but that didn't do much to hide the pattern. Put the file I mentioned on a TrueCrypt volume encrypted in CBC mode, and somewhere in the encrypted image there will be two neighboring sectors that can be XORed together to cancel out all the data, leaving only the whitening pattern, which is easily recognizable because it repeats many times throughout the sector.
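The cancellation trick can be sketched with a toy CBC construction (illustration only; not TrueCrypt's actual cipher or whitening). With IV = sector number, setting the neighbor sector's first plaintext block to P XOR IV_n XOR IV_n+1, with the rest identical, yields byte-for-byte identical ciphertext:

```python
import hashlib, os

BLOCK = 16

def E(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for a 128-bit block cipher (one-way hash; demo only).
    return hashlib.sha256(key + block).digest()[:BLOCK]

def cbc_sector(key: bytes, sector_no: int, pt: bytes) -> bytes:
    prev = sector_no.to_bytes(BLOCK, "big")   # IV = sector number
    out = b""
    for i in range(0, len(pt), BLOCK):
        prev = E(key, bytes(a ^ b for a, b in zip(pt[i:i + BLOCK], prev)))
        out += prev
    return out

key = os.urandom(16)
n = 1000
iv_a = n.to_bytes(BLOCK, "big")
iv_b = (n + 1).to_bytes(BLOCK, "big")
first = os.urandom(BLOCK)
tail = os.urandom(3 * BLOCK)
# Craft the neighbor sector's first block to cancel the IV difference:
crafted = bytes(a ^ b ^ c for a, b, c in zip(first, iv_a, iv_b))
same = cbc_sector(key, n, first + tail) == cbc_sector(key, n + 1, crafted + tail)
```

Since the first cipher blocks collide, the chaining values collide too, and every later block matches — exactly the colliding-neighbor-sectors pattern described above.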
Encrypting the IV is better, but still vulnerable to the problem you describe. In fact, the problem you describe applies one way or another to almost every disk encryption scheme in existence. They all need some nonce or randomness, and since it doesn't fit in the sector, they cut a corner and use the sector number, which doesn't change when the sector is overwritten. (I have seen one scheme that gained extra space by mapping 32 logical sectors to 33 physical sectors, but that encryption had other problems, including a weak pseudo-random number generator and potential data loss caused by the need to update two sectors non-atomically.)
Recent TrueCrypt versions are no longer vulnerable to the attack I described above. They now use tweakable block ciphers. But just like CBC needs a unique IV each time you encrypt, tweakable block ciphers need a unique tweak. TrueCrypt uses the sector number as the tweak, so if a sector is overwritten, you have the same problem again. In fact it is even worse, because there is no longer any chaining, just a tweak for each 16-byte block, which means changing a byte in a sector confines the ciphertext changes to that 16-byte block. I didn't verify this in practice, I just read the specification. I mentioned this problem to the authors a long time ago, but they didn't consider it a problem.
IV problem and flash disk (Score:4, Informative)
The problem is the IV for CBC never changes for a given sector - mainly because there is no provision to atomically write both a 512 byte sector and its 48+ bit IV. I *have* read about a disk designed for full disk encryption which provides 520 byte sectors instead of 512 byte sectors. That completely solves the problem.
Some disk encryption uses non-atomic sector writes (store IVs in a separate physical sector). This risks data loss should one get updated but not the other.
I will note that the problem is more easily completely solved for flash media - where it is easier to (atomically) tag sectors with additional data.
RLE in images (Score:5, Informative)
Plain JPEG uses run-length and Huffman encoding [wikipedia.org] on the quantized matrix obtained after the DCT.
This is relatively efficient because the matrix contains a lot of long runs of repeated values, mostly zeros (see the Wikipedia article).
(This is why some applications such as StuffIt have been able to non-destructively increase the compression of JPEGs and gain a few percent, by using better algorithms to store the quantized results.)
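The coefficient coding described above can be sketched as a toy run-length pass (a simplification of the real JPEG scheme, which also packs category bits and Huffman codes): each nonzero coefficient is stored with the count of zeros preceding it, and a trailing all-zero run collapses to an end-of-block marker.

```python
def rle_zeros(coeffs):
    """Toy JPEG-style run-length coding of zig-zag-ordered coefficients:
    emit (zero_run, value) per nonzero coefficient; a trailing run of
    zeros becomes a single end-of-block ("EOB") marker."""
    out, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            out.append((run, c))
            run = 0
    if run:
        out.append("EOB")
    return out

# A typical quantized block: a few nonzero values, then a long tail of zeros.
tokens = rle_zeros([12, 0, 0, 5, 0, 0, 0, 0])   # → [(0, 12), (2, 5), 'EOB']
```

The long zero tails are why this stage compresses so well, and why the resulting bytes have far higher entropy than the raw pixels.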
In lossless mode, JPEG instead uses predictive coding (with Huffman or arithmetic coding) on the raw pixel values; no DCT or quantization is involved.
PNG uses LZ77 (either on the raw pixels or on a delta), which is an entirely different beast. As pointed out by other /.ers, it's *dictionary* compression, which replaces repeated parts with a pointer to where they first occurred:
"ABACDABA" becomes "ABACD{go back 5 letters and copy 3 letters}"
It doesn't need a separate RLE mode, because the dictionary can be abused as follows:
"ACCCCCC" in RLE is "A{repeat 6xC}" and in LZ77 is "AC{go back 1 and copy 5 letter}"
Re:Only works on uncompressed bitmaps (Score:5, Informative)
The article uses images encrypted in ECB mode (a well-known weakness) as a visual analogy to the backup-file problem.
The backup-file problem is that when you have two volumes encrypted with the same key (not the same password, the same internal encryption key), the difference between those two volumes can reveal some information about the encrypted data. Perhaps all you can determine is what parts of the volume have changed, but that's more than nothing, and therefore unacceptable.
This is a "backup-file" problem because you NEVER have two volumes encrypted with the same internal key unless one starts out as a "backup" copy of the other.
The product mentioned in the article "fixes" this problem by providing an explicit "backup" function. This function creates a new volume containing the same data as the original, but which is encrypted using a different internal key. The hope is that because such an option exists, users will be dissuaded from simply storing bit-for-bit backups of their encrypted volume.
Nothing about this is ground-breaking or even novel, but the concepts at play are important for consumers of encryption products, so the attention is worthwhile.
This is not really a "ciphertext only" attack (Score:3, Informative)
This attack only affects uncompressed images, because compression raises the entropy of the data: once the colors change, the compressed bytes are entirely different.
Finding a pair of encrypted twin images like this is nearly impossible. How can they even tell where an image starts on disk when the filesystem itself is encrypted? Not to mention the very strict (and unlikely) requirements on the images themselves. The odds of actually being able to exploit this on a live system are very low IMHO.
Re:Watermark? (Score:3, Informative)
I have seen one that used extra space by mapping 32 logical sectors to 33 physical sectors, but that encryption had other problems including a weak pseudo random number generator, and potential data loss caused by the need to update two sectors which isn't done atomically
That would be GBDE [wikipedia.org]. Shouldn't it be relatively easy to replace the PRNG?
Re:Watermark? (Score:5, Informative)
Correct.
Depends on whether you are worried about breaking compatibility. There is nothing you can do for existing encrypted volumes. If you want to improve security for your existing volumes, a complete re-encryption of the volume is needed. If you want to protect new volumes while remaining compatible with the existing implementation, a small change to the way the master key is generated would help. There is a 256-byte lookup table, in which having two identical bytes is a weakness. The table is generated randomly, which means master keys vary in quality. The best master keys are those where all 256 bytes are different, but the chance of that happening at random is negligible. If the key was instead generated by taking an array containing all the values from 0 to 255 and permuting it randomly, you would always get keys that are resistant to the known attack.
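The two generation strategies can be sketched side by side (a standalone illustration, not GBDE's actual key-generation code; use a CSPRNG, not `random`, in anything real):

```python
import random

def random_table():
    # Naive approach: 256 independent random bytes. By the birthday bound,
    # a repeated byte is all but certain, which weakens the master key.
    return [random.randrange(256) for _ in range(256)]

def permuted_table():
    # Fix sketched above: a random permutation of 0..255, so every byte
    # is distinct by construction.
    table = list(range(256))
    random.shuffle(table)
    return table
```

Both tables are uniformly random in their respective spaces; the permutation simply excludes the weak (repeated-byte) keys instead of hoping to avoid them.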
If you don't need compatibility with the existing code, you can do even better. The PRNG makes use of MD5. Before anybody starts talking about MD5 being broken, keep in mind that the known attacks against MD5 do not apply to the way GBDE uses it. The input that GBDE passes to MD5 is 24 bytes, which has 128 bits of entropy (with the best keys). Obviously the output will have no more than 128 bits of entropy, but it could have less. Though the input is just 24 bytes, MD5 is going to add a length field and pad the result to a multiple of 64 bytes. So you could actually double or triple the key size and pay no extra performance cost on the MD5 operation.
Since three-byte quantities are awkward to work with, I'd suggest changing the lookup table from 256 8-bit values to 256 16-bit values. The risk of having two identical values when picking them randomly is then much lower, but generating the table in a way that guarantees uniqueness is still better. This would produce a 40-byte input to MD5. If you wanted to make use of the full MD5 block size, you could append another random key to the 40 bytes to make the input as large as what fits in a single MD5 block. Doing all of this would increase the key size from 272 bytes (16+256) to 584 bytes (56+16+512), and would not spend any additional time on the cryptographic operations in the critical path.
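The birthday-bound intuition here can be checked numerically (a standalone sketch, not GBDE code):

```python
def collision_prob(draws: int, space: int) -> float:
    """Probability that `draws` uniform samples from a space of `space`
    values contain at least one repeat (exact birthday computation)."""
    p_unique = 1.0
    for k in range(draws):
        p_unique *= (space - k) / space
    return 1.0 - p_unique

p8 = collision_prob(256, 2 ** 8)    # 256 random bytes: repeat is ~certain
p16 = collision_prob(256, 2 ** 16)  # 256 random 16-bit values: ~39% chance
```

So widening the entries helps a lot, but roughly two in five randomly generated tables would still contain a repeat, which is why generating the table as a guaranteed-distinct set remains the better option.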
You could also ditch GBDE's custom PRNG altogether and use a standard one. But that would mean a significant performance hit, so you'd have to reduce the size of the master key. So I am not sure that would really make it any stronger than fixing the known vulnerabilities in the current PRNG.
As far as the potential data loss goes, I suppose you could live with it and just make sure to have a good backup strategy. When doing that, of course, you have to keep in mind that you shouldn't make backups by copying the encrypted container. The correct way would be to copy the files from the encrypted container to a different encrypted container. Or you could copy the files to an encrypted archive that does not support random access, like tar + gpg. Or you could back up your container by creating an encrypted copy of the container; if you use asymmetric keys, you could create a gpg-encrypted copy of the encrypted container even while it is not mounted.
The problem is that the only way to detect that corruption has happened is by trying to open the files with applications that understand the format and seeing them barf. Fixing the potential corruption, however, is tricky. If I find a good solution, I am probably going to write my own storage encryption layer some day.