Details
This section of exercises allows you to explore the use of SiLK with a NetFlow repository rather than using files generated from packet capture files. Using SiLK with packet captures is very useful during an incident response if a NetFlow repository isn’t available, but during normal day-to-day operations, you would typically use SiLK with a repository.
In our collective experience, even though NetFlow is generally already supported by the switches, routers, and other network devices that enterprises have installed, it is rare to find that an enterprise has a NetFlow repository configured unless it has a fairly knowledgeable network engineering staff. It is rarer still to find that it is being used for any type of security analytics or to identify potential indicators of compromise. A NetFlow repository, therefore, is one of the easiest and least expensive changes that can be made to a network infrastructure: it immediately provides greater insight into how the network is used and helps identify anomalous behavior.
Exercise 1
SiLK Repository
When using SiLK with a repository, you have the ability to retrieve results covering long periods of time, from specific sensors, and more. SiLK relies on configuration files in the /data directory to determine what the names of the sensors are, how data will be collected, etc., in addition to which fields are displayed by default when using rwcut. There is absolutely no need for you to make any changes or directly work with the files in this directory, but you are welcome to explore.
1. Please query the repository stored on the class VM to determine the total number of flows seen between October 1, 2018, and October 15, 2018. How many flows are there?
I am going to use the rwfilter tool. With it, I can specify selection criteria, such as the type of data to retrieve, the sensors of interest, and the time range of interest. I can also specify partitioning criteria, such as the protocols of interest. In addition, the --print-statistics option prints a summary that includes the number of flows.

There are 4143675 flows in this time range.
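A query of roughly the following shape would produce this kind of count. It is only a sketch: count_flows is a hypothetical helper name, and the --type=all and --proto=0-255 values are assumptions (all flow types, with a pass-everything partitioning test), not necessarily the exact switches used in class.

```shell
# Hypothetical helper: count flows in the repository for a date range.
# Assumptions: --type=all selects every flow type, and --proto=0-255 acts as a
# pass-everything partitioning test so that every record is counted.
count_flows() {
    # $1 = start date, $2 = end date (YYYY/MM/DD), $3 = protocol number or range
    rwfilter --start-date="$1" --end-date="$2" --type=all \
             --proto="$3" --print-statistics
}

# Run the query only where SiLK is installed.
if command -v rwfilter >/dev/null 2>&1; then
    count_flows 2018/10/01 2018/10/15 0-255
fi
```

The summary line reports the records read, passed, and failed; the number of records that pass is the flow count for the query.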
2. Please query the repository for flows occurring between October 1, 2018, and October 15, 2018. How many TCP flows were logged?
I can use the same command as above to answer. All I have to modify is the --proto option, since TCP is protocol number 6.

There are 2997358 TCP flows logged.
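As a sketch of that change, the protocol can be made the only variable part of the query. The helper name count_flows_for_proto is hypothetical, and --type=all is again an assumption about how the repository was queried.

```shell
# Hypothetical helper: flow count for one IP protocol over the exercise dates.
# IANA protocol numbers: 1 = ICMP, 6 = TCP, 17 = UDP.
count_flows_for_proto() {
    # $1 = IP protocol number
    rwfilter --start-date=2018/10/01 --end-date=2018/10/15 --type=all \
             --proto="$1" --print-statistics
}

# Run the query only where SiLK is installed.
if command -v rwfilter >/dev/null 2>&1; then
    count_flows_for_proto 6
fi
```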
Exercise 2
- Please query the repository to find all of the hosts that are seen establishing a connection to destination port 60000 between October 1, 2018, and October 31, 2018. How many unique source hosts are seen?
Since I am only interested in hosts seen establishing a connection, I must select all of the flows that begin with a SYN. I can use the rwfilter option called --flags-initial for this.

Three lines of the output are not part of the listed flows. Therefore, using wc -l and subtracting those three lines, I can quickly figure out how many unique source hosts are seen.

We have 48 unique source hosts.
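One possible way to get the count in a single pipeline is sketched below. It takes a slightly different route from the one described above: rwuniq is asked to suppress its title line with --no-titles, so that wc -l needs no correction. The helper name unique_syn_sources is hypothetical, and the S/SA flag mask (SYN set, ACK clear on the first packet) is an assumption about how "begins with a SYN" was expressed.

```shell
# Hypothetical helper: count distinct source IPs that opened a TCP connection
# to a given destination port within a date range.
# Assumption: --flags-initial=S/SA (SYN set, ACK clear in the first packet)
# selects the client side of each connection.
unique_syn_sources() {
    # $1 = start date, $2 = end date (YYYY/MM/DD), $3 = destination port
    rwfilter --start-date="$1" --end-date="$2" --type=all \
             --proto=6 --dport="$3" --flags-initial=S/SA --pass=stdout |
        rwuniq --fields=sip --no-titles |
        wc -l
}

# Run the query only where SiLK is installed.
if command -v rwfilter >/dev/null 2>&1; then
    unique_syn_sources 2018/10/01 2018/10/31 60000
fi
```

Because rwuniq emits one row per distinct source address, the line count is the number of unique source hosts.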
Exercise 3
Let’s switch to the repository data that does not have all of the flags data present. While it isn’t convenient to work with this data, it is not unusual to have sensors that will not properly populate these fields. This makes it important to have familiarity with working with this type of data.
The data of interest covers dates from February 8, 2022 through July 3, 2022.
- Please query this new repository and identify all of the flows where only SYN, with or without ECN bits, was present. Examine the first 20 flows that are displayed, especially noting the source, destination, ports, number of packets, and flags fields.

Notice the behavior of the source hosts and source ports, in addition to the number of packets seen. How would you characterize this? Do these appear to be “real” connection attempts, or some type of spoofed scanning behavior?
I observed that the source IP address 172.28.30.4 appears to be initiating connections to six different destination hosts—first targeting port 9573, then port 10001. The presence of packets with only the SYN flag set strongly suggests these are the initial steps of a TCP three-way handshake. Additionally, I noticed that the source port changes with each destination, which indicates the packets are likely not spoofed but instead generated by a legitimate IP stack initiating connections.
Each flow consists of two or three packets, which is important. If there were only one packet per flow, it might suggest scanning behavior. However, two to three packets usually point to actual connection attempts, possibly with some retries involved. Taking all of this into account, the evidence supports that 172.28.30.4 is most likely making genuine TCP connection attempts rather than performing a spoofed scan.
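The "only SYN, with or without ECN bits" condition at the start of this exercise can be expressed as a HIGH/MASK flag test. The sketch below first illustrates the mask logic with plain shell arithmetic, then shows one plausible rwfilter form. The helper names flags_match and syn_only_flows are hypothetical, the choice of --flags-all=S/SAFRPU is an assumption rather than the confirmed class solution, and the second repository is assumed to be selected already (for example through the SILK_DATA_ROOTDIR environment variable).

```shell
# TCP flag bit values as laid out in the TCP header.
FIN=1; SYN=2; RST=4; PSH=8; ACK=16; URG=32; ECE=64; CWR=128

# flags_match FLAGS HIGH MASK: succeed when, among the bits in MASK,
# exactly the HIGH bits are set. Bits outside MASK are ignored.
flags_match() {
    [ $(( $1 & $3 )) -eq "$2" ]
}

# S/SAFRPU: SYN must be set; ACK, FIN, RST, PSH, URG must be clear.
# ECE and CWR are left out of the mask, so they may be set or clear.
mask=$(( SYN | ACK | FIN | RST | PSH | URG ))

for f in "$SYN" $(( SYN | ECE | CWR )) $(( SYN | ACK )); do
    if flags_match "$f" "$SYN" "$mask"; then
        echo "$f: match"
    else
        echo "$f: no match"
    fi
done

# Assumed rwfilter equivalent, limited to the first 20 records.
syn_only_flows() {
    rwfilter --start-date=2022/02/08 --end-date=2022/07/03 --type=all \
             --proto=6 --flags-all=S/SAFRPU --pass=stdout |
        rwcut --fields=sip,sport,dip,dport,packets,flags --num-recs=20
}

# Run the query only where SiLK is installed.
if command -v rwfilter >/dev/null 2>&1; then
    syn_only_flows
fi
```

The loop reports a match for a bare SYN (2) and for SYN with both ECN bits (194), and no match for SYN+ACK (18), which is the behavior the exercise asks for.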
2. Extract all of the records from this data that involve hosts 172.28.30.5 and 192.225.158.2 and examine the sip, sport, dip, dport, flags, and packets fields.

Using the --any-address option and chaining two rwfilter commands together allows me to extract the records involving just these two hosts.
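The chaining described above could look roughly like the following sketch. The helper name flows_between is hypothetical, --type=all is an assumption, and the repository is assumed to be selected already. The first rwfilter keeps records in which either endpoint is the first host; the second reads those records from standard input and keeps the ones in which either endpoint is the second host.

```shell
# Hypothetical helper: show flows in which both given hosts take part.
flows_between() {
    # $1, $2 = the two host addresses
    rwfilter --start-date=2022/02/08 --end-date=2022/07/03 --type=all \
             --any-address="$1" --pass=stdout |
        rwfilter --any-address="$2" --pass=stdout stdin |
        rwcut --fields=sip,sport,dip,dport,flags,packets
}

# Run the query only where SiLK is installed.
if command -v rwfilter >/dev/null 2>&1; then
    flows_between 172.28.30.5 192.225.158.2
fi
```

Only the first rwfilter carries the date and type selection switches, because the second one works on the stream it receives rather than on the repository.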
3. Examine all of the flows between 192.225.158.2 and 172.28.30.5 and explain why two of the flows have no flags set.
If I add the protocol field to my rwcut command, I can see that the two flows with the SYN flag are TCP flows (protocol 6), whereas the flows without flags are UDP flows (protocol 17). UDP has no flags, so the flags field is empty for those flows.
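A sketch of the same extraction with the protocol column added is shown below. As before, flows_with_protocol is a hypothetical helper name, --type=all is an assumption, and the repository is assumed to be selected already.

```shell
# Hypothetical helper: flows between two hosts, with the protocol shown
# next to the flags so that flagless flows can be explained.
flows_with_protocol() {
    # $1, $2 = the two host addresses
    rwfilter --start-date=2022/02/08 --end-date=2022/07/03 --type=all \
             --any-address="$1" --pass=stdout |
        rwfilter --any-address="$2" --pass=stdout stdin |
        rwcut --fields=sip,sport,dip,dport,protocol,flags,packets
}

# Run the query only where SiLK is installed.
if command -v rwfilter >/dev/null 2>&1; then
    flows_with_protocol 192.225.158.2 172.28.30.5
fi
```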
