# Probabilistic Considerations on the Calculation of Shannon’s Entropy in Network Traffic


#### Question:

Probabilistic Considerations for Computation of Shannon Entropy in Network Traffic

I have a dump file (CAP format) of a network traffic capture made with tcpdump on Debian. Up to a certain point the traffic is attack-free; then a series of TCP SYN flood attacks begins. My goal is to calculate the entropy of each traffic period (with and without attacks) and compare them.

I’m using the following Python code:

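```python
import collections

import numpy as np

# Sample source IPs, kept as strings (only their frequencies matter here)
sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]

# Count how often each distinct IP occurs
C = collections.Counter(sample_ips)
# list(...) is needed on Python 3, where dict.values() returns a view
counts = np.array(list(C.values()), dtype=float)
# Relative frequencies used as probability estimates
prob = counts / counts.sum()
# Shannon entropy in bits
shannon_entropy = (-prob * np.log2(prob)).sum()
print(shannon_entropy)
```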
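To compare traffic periods, the same calculation could be applied to one time window at a time. The following is only a minimal sketch: it assumes the capture has already been reduced to `(timestamp, source IP)` pairs by some other tool, and the names `windowed_entropy`, `window_seconds`, and the `records` values are hypothetical, chosen for illustration.

```python
import collections
import math


def shannon_entropy(items):
    """Shannon entropy (bits) of the relative frequencies of items."""
    counts = collections.Counter(items)
    total = sum(counts.values())
    # p * log2(1 / p) is the same quantity as -p * log2(p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def windowed_entropy(records, window_seconds):
    """Group (timestamp, ip) records into fixed windows.

    Returns a dict mapping window index to the entropy of that window.
    """
    windows = collections.defaultdict(list)
    for timestamp, ip in records:
        windows[int(timestamp // window_seconds)].append(ip)
    return {w: shannon_entropy(ips) for w, ips in sorted(windows.items())}


# Hypothetical records: (seconds since capture start, source IP)
records = [
    (0.5, "10.0.0.1"), (1.2, "10.0.0.2"), (3.9, "10.0.0.1"), (4.1, "10.0.0.3"),
    (10.3, "10.0.0.9"), (11.0, "10.0.0.9"), (12.7, "10.0.0.9"), (14.8, "10.0.0.9"),
]
entropy_by_window = windowed_entropy(records, window_seconds=10)
print(entropy_by_window)
```

Each window then yields one entropy value, so the attack-free and attack periods become two groups of numbers that can be compared directly.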

When calculating it this way, some questions arise:

• I am treating this as a discrete probability distribution over an equiprobable sample space. Is this reasonable? How can I justify it? I do not know how the distribution is …
• How can I validate the experiment? I am thinking of a hypothesis test with the following null hypothesis: “The entropy value allows you to detect the attack.” Is this coherent? What would be a good hypothesis test for this case (the sample space is about 40)?
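One way the second question might be approached is a permutation test, which makes no assumption about the shape of the distribution and can be used with small samples. In the usual convention the null hypothesis states "no difference" (here: entropy values from attack-free and attack windows come from the same distribution), and "entropy distinguishes the attack" would be the alternative. The sketch below assumes one entropy value per time window; `permutation_test`, `normal_windows`, and `attack_windows` are hypothetical names, and the numbers are made up for illustration, not measured.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns an estimated p-value under the null hypothesis that both
    groups come from the same distribution.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to the two groups
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # Small tolerance guards against floating-point rounding
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical per-window entropy values in bits (illustrative only)
normal_windows = [3.1, 3.3, 3.0, 3.2, 3.4, 3.1]
attack_windows = [1.2, 0.9, 1.5, 1.1, 1.3, 1.0]
p_value = permutation_test(normal_windows, attack_windows)
print(p_value)
```

A small p-value would be evidence against the "no difference" null hypothesis. Note that windows taken from a single capture may be correlated in time, which this sketch does not account for.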