Probabilistic Considerations for Computation of Shannon Entropy in Network Traffic
I have a dump file (pcap format) of a network traffic capture made with tcpdump on Debian. Up to a certain point it is attack-free traffic; then a series of TCP SYN flooding attacks begins. My goal is to calculate the entropy of each traffic interval (with and without attacks) and compare them.
I’m using the Python code:
```python
import collections

import numpy as np

sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]

# Count occurrences of each source IP.
C = collections.Counter(sample_ips)
counts = np.array(list(C.values()), dtype=float)

# Empirical probability of each IP, then Shannon entropy in bits.
prob = counts / counts.sum()
shannon_entropy = (-prob * np.log2(prob)).sum()
print(shannon_entropy)
```
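For context, here is how I intend to apply this per traffic interval: split the stream of source IPs into consecutive windows and compute one entropy value per window (the window size and the IPs below are made-up placeholders, not my real capture):

```python
import collections

import numpy as np

def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = np.array(list(collections.Counter(values).values()), dtype=float)
    prob = counts / counts.sum()
    return float(-(prob * np.log2(prob)).sum())

def windowed_entropy(ips, window_size):
    """Entropy of each consecutive, non-overlapping window of source IPs."""
    return [shannon_entropy(ips[i:i + window_size])
            for i in range(0, len(ips) - window_size + 1, window_size)]

# Hypothetical traffic: diverse sources first, then one flooding source.
normal = [f"10.0.0.{n}" for n in range(40)]
attack = ["131.84.1.31"] * 40
print(windowed_entropy(normal + attack, window_size=40))
```

With 40 distinct sources the first window's entropy is log2(40) ≈ 5.32 bits; a window dominated by a single source drops toward 0, which is the effect I hope to exploit.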
When calculating this way, some doubts arise:
1. How can I justify this approach? I do not know what the underlying distribution is …
2. How can I validate the experiment? I am thinking of a hypothesis test with the following null hypothesis: "The entropy value allows the attack to be detected." Is this coherent? What would be a good hypothesis test for this case (the sample space is about 40)?
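For instance, I considered a two-sample permutation test on the per-window entropies, with the null hypothesis that attack-free and attack windows have the same mean entropy. A sketch, assuming entropies have already been computed per window (the arrays below are made-up placeholders, not measured values):

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_test(a, b, n_perm=10_000):
    """Two-sided permutation test for a difference in means between a and b."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = abs(perm[:len(a)].mean() - perm[len(a):].mean())
        if diff >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (count + 1) / (n_perm + 1)

# Placeholder per-window entropies (replace with measured values).
entropy_clean = [5.1, 5.3, 5.0, 5.2, 5.4, 5.2, 5.1, 5.3]
entropy_attack = [1.2, 0.9, 1.5, 1.1, 0.8, 1.3, 1.0, 1.2]
print(permutation_test(entropy_clean, entropy_attack))
```

A permutation test makes no assumption about the distribution of the entropy values, which seems relevant given doubt 1, but I am unsure whether it is appropriate with so few windows.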
1) If you reach the same conclusion for the probability distributions of samples taken over different time intervals on different days, then the answer is yes.

2) An experiment must be designed before it is run: describe every step on paper, omitting none, so that someone who doubts your results can redo it. Then repeat it, again and again, with different data; everyone should reach the same conclusion.

Reading more about scientific methodology will help you.
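The repeatability check in point (1) can be sketched as follows, using made-up synthetic "days" of traffic rather than real captures: the conclusion ("attack windows have lower entropy") should hold on every independently sampled dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Hypothetical check: on several independently sampled "days", does the
# attack window always show lower entropy than the attack-free window?
conclusions = []
for day in range(5):
    clean = rng.integers(0, 200, size=100)   # many distinct sources
    attack = rng.integers(0, 3, size=100)    # traffic dominated by a few sources
    conclusions.append(entropy(attack) < entropy(clean))
print(all(conclusions))
```

If the conclusion fails on some days, the entropy criterion is not a reliable detector and the experiment, as point (2) says, needs to be rethought.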