October 07, 2013

Descriptive Entropy 4 fun.

What I'm going to describe here is an experiment I've had in mind for about 4 months, and only today did I finally have the occasion (= time + desire) to do it.

It all started with this article [1]. In the first pages, the author introduces the concept of Descriptive Entropy.
For those of you who don't know what entropy is: entropy is a measure of the quantity of information in a text.
Descriptive Entropy is more or less the same, but instead of counting the frequency of each symbol in the text, the algorithm counts the occurrences of all its substrings.
We can say that classical entropy is just a particular case of Descriptive Entropy (i.e. we stop counting at substrings of length 1).
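To make that concrete, here is a minimal sketch of how I read the definition: count every substring (up to some length), then take the Shannon entropy of the resulting frequency distribution. The function names are mine, and the exact formula in the Phrack article may differ in details.

```python
from collections import Counter
from math import log2

def substring_counts(text, max_len=None):
    """Count the occurrences of every substring of `text`,
    up to length `max_len` (all lengths if None)."""
    n = len(text)
    if max_len is None:
        max_len = n
    counts = Counter()
    for length in range(1, max_len + 1):
        for i in range(n - length + 1):
            counts[text[i:i + length]] += 1
    return counts

def entropy(counts):
    """Shannon entropy (in bits) of a frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def descriptive_entropy(text, max_len=None):
    return entropy(substring_counts(text, max_len))

# With max_len=1 this reduces to the classical (per-symbol) entropy:
print(descriptive_entropy("abab", max_len=1))
# The full version also counts "ab", "ba", "aba", "abab", ...
print(descriptive_entropy("abab"))
```

Note the quadratic blow-up: a text of length n has O(n^2) substrings, so in practice you want to cap `max_len`.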

Another thing you have to know: a good cipher should implement the "confusion and diffusion" principle [2]. What I wanted to measure is a concept quite close to "diffusion", but not the same.

Long story short, I just wanted to see if there is a correlation between the descriptive entropy of the plaintext and the descriptive entropy of the ciphertext. For this I wrote this script: [3]. (Warning: I'm not a coder.)
I used AES in CBC mode with a random IV.

As you can see from the picture (I haven't used mathematical methods to check for correlation, just this plot):

there's no direct correlation between the descriptive entropy of the plaintext and the descriptive entropy of the ciphertext. :(

Further work:
- study more ciphers, maybe the weakest ones, just for the lulz (what about stream ciphers?)
- find a better way to generate texts with increasing descriptive entropy
- beer?

[1] http://www.phrack.org/issues.html?issue=68&id=15#article
[2] http://en.wikipedia.org/wiki/Confusion_and_diffusion
[3] https://github.com/Scinawa/descEntropy/blob/master/AESvsEntroDesc.py