




For my cybersecurity class project, I decided to set up a honeypot on a Raspberry Pi 5. This document chronicles my entire journey – the successes, the mistakes, and everything I learned along the way. Honestly, it was more challenging than I expected, but also really rewarding when everything finally worked!
What I Used for This Project
Hardware: Raspberry Pi 5 with 8GB RAM (borrowed from the lab)
OS: Raspberry Pi OS Lite (64-bit) – went with the lite version to save resources
Storage: 32GB Class 10 MicroSD Card
Network: My home network with an AT&T BGW320-500 router
Setting Up the Raspberry Pi
I followed this YouTube video: “CanaKit Raspberry Pi 5 8GB Starter Kit [Turbine] – Setup Guide” because I had never set up a Pi before and wanted to make sure I didn’t break anything.
Preparing the SD Card
Since I needed a completely fresh setup for this project, I started from scratch:
Downloaded the Raspberry Pi Imager from the official website
Installed it on my laptop and got everything ready
Put the MicroSD card into the USB reader and connected it
Used the imager to set everything up:
Selected “Raspberry Pi 5” as my device
Chose “Raspberry Pi OS Lite (64-bit)” since I didn’t need the desktop
Selected my SD card for storage
Hit “Write” and waited – it took about 15 minutes.
I finally ejected the card when it was done.
Installing the Heat Sinks
This part made me nervous because I’d never done hardware stuff like this before.
What I learned: Take your time with this step! The adhesive is really strong and you only get one shot.
I cleaned the main CPU chip with isopropyl alcohol (used a cotton swab)
I carefully peeled off the adhesive backing from the heat sinks
I placed the biggest heat sink on the main chip – held my breath the whole time
Pressed down firmly for about 15 seconds
Putting Together the Case
This was actually the easiest part. The case design is really well thought out.
I put the Pi board in the bottom piece, making sure everything lined up
I connected the cooling fan to the fan header on the Pi.
I positioned the fan in the top part of the case
I snapped everything together. No screws were needed.
SD Card Installation
My mistake: I put the SD card in upside down the first time and was trying to force it.
The SD card slot is on the bottom of the Pi. I flipped it over, made sure the label was facing up, and gently pushed it in until it clicked.
First Boot
I connected my keyboard and my monitor, plugged in the power supply, and connected everything to my router with an Ethernet cable. When I powered it on, the boot screen came up pretty quickly – faster than I expected.
The setup wizard was straightforward. I picked my country, created my username and password, and skipped the WiFi setup since I was using Ethernet. Then I ran the update commands:
sudo apt update
sudo apt -uy dist-upgrade
This took about 10 minutes on my connection. After that, I rebooted with sudo reboot and was ready for the honeypot installation.
Installing the Honeypot
For this part, I followed Dr. Ulrich’s YouTube video starting from the “First Connect to Pi” section. This was where things got really interesting (and challenging).
Getting the System Ready
First, I needed to make sure I could use the whole SD card and had all the tools I needed:
sudo raspi-config --expand-rootfs
Then I realized Git wasn’t installed by default, so I had to add it:
sudo apt -y install git
I created a directory called “Install” and went into it to start the real work.
Setting Up DShield
This is where the actual honeypot magic happens:
Cloned the DShield repository:
git clone https://github.com/DShield-ISC/dshield.git
Ran the installation script:
cd dshield/bin
sudo ./install.sh
Went through a bunch of dialog boxes – I just followed the video recommendations
Cowrie got installed automatically.
Connecting to ISC
I had to create an account on the ISC website to get an API key. Once I had that, I used my email and the key to authenticate my honeypot to their system. Then I went through checking all the configuration parameters to make sure everything was set up correctly.
I was then instructed to run a status command to make sure that everything was working properly, and this is when I ran into a couple of problems:
Problem #1: ISC-Server Wouldn’t Start
When I tested the honeypot status, the isc-server showed as “not running” and I had no idea why.
I was pretty frustrated at this point, but I found Guy Bruneau’s GitHub troubleshooting guide which saved me. The issue was that a log file was missing:
sudo touch /var/log/dshield.log
sudo chown syslog:adm /var/log/dshield.log
Then I checked if the service was running:
sudo systemctl status isc-agent
It still wasn’t working, so I manually started it:
bashsudo systemctl start isc-agent
This finally got the service running properly.
Problem #2: Nobody Could See My Honeypot
The honeypot was running, but it wasn’t exposed to the internet so no one could find it.
The people on the Slack channel told me I needed to set up port forwarding on my router. This was totally new to me, but I figured it out using the help of Claude:
I logged into my AT&T router’s web interface
I found the “NAT Gaming” section (took me a while to find this)
I set up port forwarding rules to redirect these ports to port 8000:
Port 80 (for web traffic)
Port 8080 (alternative web port)
Port 7547 (for CWMP)
Port 5555 (for personal agent)
Port 9000 (for SonarQube)
I applied all the changes and crossed my fingers
I waited for a couple of hours for everything to start working properly. This was the hardest part – just waiting and hoping I did it right.
After the waiting period, all the logs started populating and everything was working perfectly.
The SEC450 CTF network consisted of a simulated mixed Windows Active Directory and Linux server environment. There were 3 subnets with machines:
10.0.1.0/24 – Internal Servers (Active Directory Domain Controller, File share server)
10.0.2.0/24 – User devices (5 User laptops)
10.0.3.0/24 – DMZ (one Linux web server)
DNS Concepts 1

If the IP address is an IPv4 address, then this is an A query.
DNS Concepts 2

For a PTR record lookup of 8.8.4.4, the hostname used in the query would be:
4.4.8.8.in-addr.arpa.
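The PTR query name can be derived mechanically: reverse the four octets and append in-addr.arpa. A small shell sketch (the function name is mine):

```shell
# Build the reverse-DNS (PTR) query name for an IPv4 address:
# reverse the octets and append ".in-addr.arpa."
ptr_name() {
  local IFS=.
  set -- $1                      # split "8.8.4.4" into 8 8 4 4
  echo "$4.$3.$2.$1.in-addr.arpa."
}
ptr_name 8.8.4.4                 # prints 4.4.8.8.in-addr.arpa.
```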
DNS Concepts 3

That’s an SRV record (Service Record) query, where _sip is the symbolic name of the service, _tcp is the transport protocol, and mycompany.com is the domain name.
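The underscore-prefixed labels make SRV names easy to decompose with plain parameter expansion; a quick sketch:

```shell
# Split an SRV query name into its service, protocol, and domain labels.
name="_sip._tcp.mycompany.com"
service=${name%%.*}   # _sip  (service, with leading underscore)
rest=${name#*.}
proto=${rest%%.*}     # _tcp  (transport protocol)
domain=${rest#*.}     # mycompany.com
echo "$service $proto $domain"
```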
DNS Concepts 4

We can use the below command:

We find the below answers:

DNS Concepts 5

To find SPF records for admin@mail.sec450.com, I need to query the TXT records for the domain part of the email address (mail.sec450.com):

DNS Logs 1

We can filter the Bro-DNS dashboard for A record requests only:

DNS Logs 2

Using the same dashboard as above, we can easily find the client within the SEC450 domain that was the source of the highest count of DNS requests:

DNS Logs 3

We filter out DNS query types that are not A or CNAME queries and we also filter out the DC IP address:

We find the following list of external DNS servers that were queried:

DNS Logs 4

IDN domains use Punycode encoding – they start with xn-- when encoded. Filtering for queries starting with xn--, we find the following query:
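Outside the dashboard, the same filter is just a prefix match on the first label; a quick sketch (the sample domains are made up):

```shell
# IDN labels are encoded with Punycode and carry the "xn--" ACE prefix.
for q in example.com xn--fake-label.com mail.sec450.com; do
  case "$q" in
    xn--*) echo "$q: Punycode/IDN label" ;;
    *)     echo "$q: plain ASCII label" ;;
  esac
done
```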

HTTP Interpretation 1

The response code from the server is 200 OK, which indicates a successful request.
HTTP Interpretation 2

The answer can be found in the User-Agent section of the Request.

The User-Agent string shows “Firefox/101.0” which indicates Firefox version 101.0. The rest of the string provides additional system information – it’s running on Windows NT 10.0 (Windows 10) on a 64-bit architecture, with the Gecko rendering engine version 67.0.
HTTP Interpretation 3


The Server header displays the web server software used to provide the response, along with its version number:

Here, the software name and version number are not shared, which is unusual – this header typically reports something like Apache or nginx with a version number.
HTTP Logs 1.1

Using the Bro-HTTP dashboard in Opensearch, we can quickly filter for all the source IP addresses in the sec450.com domain. We can then look for the common User-Agent used by these addresses:

HTTP Logs 1.2

The CIDR for the DMZ subnet is 10.0.3.0/24
I can search for it in the Bro-HTTP dashboard:

I can then filter for it as the destination IP address:

We can filter for POST requests only:

We then find the result we were looking for:

HTTP Logs 1.3

Web scanning often creates a high volume of requests.
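That intuition can also be checked offline; a hedged sketch that counts requests per source IP from a simplified, invented http.log extract (real Zeek logs have many more columns):

```shell
# Count HTTP requests per source IP; scanners dominate the top of the list.
printf '%s\n' \
  '192.165.1.156 /cgi-bin/test' \
  '192.165.1.156 /admin/' \
  '192.165.1.156 /phpmyadmin/' \
  '10.0.2.5 /index.html' |
awk '{ n[$1]++ } END { for (ip in n) print n[ip], ip }' | sort -rn
# top line: 3 192.165.1.156
```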


The IP address doing the scanning is 192.165.1.156 and it is using a tool called Nikto which is a popular open-source tool used for web server scanning to identify potential security vulnerabilities, misconfigurations, and dangerous files/programs.
HTTP Logs 1.4

OpenSearch has a NIDS dashboard that may be useful. We can filter for the source IP address that was scanning the DMZ server in the last question:

We see that there are 86,312 alerts tied to this address.
The name of the most common alert is:

HTTP Logs 1.5

A brute force attack typically involves automated attempts to guess credentials by systematically trying different username/password combinations. We expect to see hundreds or thousands of requests to login endpoints in a short timeframe, requests coming from the same IP address (or a few IPs), different username/password combinations being attempted and a high frequency of HTTP 401/403 responses (authentication failures).
The primary HTTP method used for login attempts is the POST method.
Based on these general observations, we are going to use our Bro-HTTP dashboard and filter for POST requests that generated an “Unauthorized” status message:
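As a rough command-line analogue of that dashboard filter, assuming a simplified extract with source IP, method, and status columns (the sample data is invented):

```shell
# Count POST requests that drew a 401 response, per source IP.
printf '%s\n' \
  '10.0.2.5 POST 401' \
  '10.0.2.5 POST 401' \
  '10.0.3.7 GET 200' |
awk '$2 == "POST" && $3 == 401 { c[$1]++ } END { for (ip in c) print ip, c[ip] }'
# prints: 10.0.2.5 2
```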

The source IP address generating these messages is:

HTTP Logs 1.6

This IP address never generated anything but “Unauthorized” status messages from the web server, so we can safely assume the attacker never guessed their way into the site:

HTTP Phishing 1

Using the Visualization tab and the NIDS-Alert Summary visualization, we quickly find the below alert:

HTTP Phishing 2

To find the hostname, I used a DHCP visualization where I was able to map the source IP to a hostname (LPT05)

To map this hostname to a username, I used the Sysmon-logs dashboard, where I sorted by hostname and looked for the most frequent username associated with the hostname (link):

HTTP Phishing 3

We can use the Bro-HTTP dashboard and filter for POST methods that resulted in a successful connection:

There is only one connection associated with this filter, and we can easily retrieve the domain name and URI for it in the dashboard:


HTTP Phishing 4

Using Wireshark, we can filter for the POST request:

Looking in the HTML section of this packet, we can find the username and password used to authenticate:

HTTP Phishing 5

Looking at the first GET request of this HTTP session, we find the lookalike domain that led to the phishing site:

HTTP Phishing 6

Based on the question, I am looking for DNS queries that occur between the initial page load (ducussign.com) and the POST submission. More specifically I am looking for common redirect services: bit.ly, tinyurl.com, rebrandly.com, goo.gl, t.co, ow.ly, etc.
In Wireshark, I used a filter for DNS A records and PTR records that might reveal service domains. I looked specifically at the packets between the initial page load and the POST request:

I find a DNS request for bit.ly pretty quickly.
HTTP Phishing 7

I used a filter in Wireshark that searches through HTTP traffic for specific text (bit.ly):

Looking at the packet content, I quickly find the link:

HTTP Phishing 8

We already know that the IP address for the phishing site is 199.192.19.138. We use it as a filter in the NIDS dashboard:

3 IDS alerts are associated with this address:

Looking at the logs, we can find the SID for each one of these alerts:



TLS 1.3

I used the SSL dashboard to solve this question and applied the TLSv1.3 filter:

Looking at the logs, the organization these SSL connections are all associated with is Google:

Let’s Encrypt!

We used the X.509 – Certificate Subject dashboard and filtered for the Let’s Encrypt service.
We see six unique domain names that were visited from the sec450.com network:

IDS meets encryption

We are looking for encrypted traffic to a web-based service therefore it makes sense to filter our NIDS dashboard using port 443:

There are only 3 total alerts generated using this filter, and they are all under the same alert name:

How much data?

Using the Bro-Connections dashboard, we filter it using the source IP address found in the last question:

We see 4 connections and looking at the detailed logs for each one of them we can find the total number of bytes that was transferred:


We find a total of 28KB, which is not a number that points to the exfiltration of a large database.
Email Analysis 1.1

Looking at the content of this email:

We find that the email title is:

Email Analysis 1.2

To find this IP address, we have to look at the headers and remember they are listed in reverse chronological order, from newest to oldest. We see that the address that passed this email to the Gmail infrastructure is:

Email Analysis 1.3

I mentioned in the last question that email headers are in reverse chronological order; therefore, the solution to this question is located in the first “Received: from” headers at the bottom.
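Because each mail server prepends its own Received: header, the top line is the hop nearest the recipient and the bottom line is the origin. A tiny illustration with invented headers:

```shell
# Newest hop first, origin hop last.
headers='Received: from mx.google.com (209.85.1.1)
Received: from relay.example.net (198.51.100.7)
Received: from sender-pc (192.0.2.9)'
printf '%s\n' "$headers" | head -n1   # hop closest to the recipient
printf '%s\n' "$headers" | tail -n1   # first hop, closest to the sender
```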

Email Analysis 1.4

We can easily find this information in the “From” header:

Email Analysis 1.5

To answer this question, we have to look for the Return-Path header:

Email Analysis 1.6

Looking at the headers, we can see that both SPF and DKIM passed but there is no line explicitly showing that the DMARC check passed:

Email Analysis 1.7

Nothing appears to be malicious about this email based on what we have seen so far in our analysis from questions 1.1 to 1.7.
Email Analysis 2.1

We are moving on to analyze a different email:


Email Analysis 2.2


Email Analysis 2.3


Email Analysis 2.4

The SPF check failed:

Neither the DKIM nor the DMARC check results are shown in the headers.
Email Analysis 2.5

The SPF check failed, therefore this email does appear to have been spoofed. The domain of info@mail.com does not designate the IP address 170.210.54.131 as a permitted sender, so we can safely conclude that this email address was spoofed.
Email Analysis 2.6

Starting from the bottom of the headers, we find three hostnames with DNS entries that the email has passed through:


Email Analysis 2.7

Similar to the last question, we start at the bottom and write down all the IP addresses the mail passed through, ignoring the X-Received line:
10.7.155.185, 129.205.112.156, 172.20.4.2, 170.210.54.131, 2002:a6b:5a15:0:0:0:0:0
Email Analysis 2.8

I went to VirusTotal to solve this question. The country is Nigeria.

Rogue Device

Which logs would report the hostnames seen on the network? I used the DHCP dashboard, as it reports all the hostnames and associated IP addresses seen on the network. I then did a long-tail analysis of all these hostnames, and one of them stood out: it only had two log counts, which was odd and worth investigating:

Looking at the DHCP process for this address, we can see that it looks very odd compared to how the other addresses on the network interacted with the DHCP server.
We see two connections made to the server over a two-minute span:

The second one is the odd one, as the Offer and Ack steps normally emanate from the DHCP server itself, not from a system trying to get an IP address assigned. Looking at the logs, we find this system's hostname and IP address:

SSH Outbound

Looking at the SSH dashboard, we quickly find an SSH connection made to port 2222:

Filtering for this one connection, we quickly find the source and destination IP addresses:

An Attempt Was Made…

We know that the rogue device IP address is 10.0.2.18. Looking at our Connections dashboard, we are going to filter for this specific address. We then filter the dashboard for connections made to port 445. We see a total of 8 connections, all made to the same destination IP address over port 445:

This dashboard does not indicate the hostname or the username used to connect, though. We see that NTLM was the authentication protocol used, therefore it's probably a good idea to look at the NTLM dashboard hoping to find more info there. We quickly find the username in these NTLM logs:

Looking at the NTLM logs we can also find the name of the asset they tried to connect to:

Remote Administration

I first looked at the RDP dashboard, but there was no connection logged. I then looked at the Connections dashboard for connections to port 5985 (WinRM over HTTP) and found 6 connections:

These connections are all between the same two IP addresses:

A Virus You Say?

We know that the event number in Windows Defender corresponding to a malware detection is 1116. Using the Beats dashboard and filtering by this Event ID, we find the hostname as well as the threat name:


USB Device

Windows event 6416 is a plug-and-play event: it gets triggered every time a plug-and-play device is inserted into the system. Still using the Beats dashboard and filtering by this event number, we can see that this event was triggered 51 times.

We see that plug-and-play devices were inserted into 6 different machines.

We are looking for the insertion of a mass storage device. Each log gives the specific description and ID of the device that was inserted. We find our culprit in the second log:

File Sharing is Caring

Still using the Beats Dashboard, we filter for Event ID 4624 (successful logins) to the file share SRV02. There are 159 successful connections to this file share:

Looking at the logs, we find 3 usernames connected to these connections:
Mario, Dkong and Kirby.
Let Me In!

What Is It?

Looking at the USB Drive question log, we find the following line with the Vendor ID and Product ID for this USB device:

Searching for this VID and PID on the internet, we find that it is a flash drive made by Silicon Motion.
Love Letter 1

First, we generate the MD5 hash:

We then enter this file hash in VirusTotal and we look for the name given to it by Symantec:

Love Letter 2

We are looking for an executable file therefore we can assume that the string .exe will show up somewhere in our text file. We use the below command to find it:

We quickly find the URL using this technique.
Love Letter 3

Looking at the malware code, I can identify the non-HTTP protocol by examining the infectfiles() subroutine.
The code specifically checks for mIRC-related files:

When it finds these files, it creates a script.ini file that automatically executes IRC commands:

This script automatically sends the malicious file to anyone who joins an IRC channel where the infected user is present.
mIRC is an Internet Relay Chat client, and IRC has historically been a popular protocol for botnet command and control because it allows real-time communication with multiple infected machines through chat channels.
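For reference, the dropped script.ini looks roughly like this (reconstructed from public write-ups of the ILOVEYOU worm; treat the exact lines as illustrative, not as the file recovered in this lab):

```ini
[script]
n0=on 1:JOIN:#:{
n1=  /if ( $nick == $me ) { halt }
n2=  /.dcc send $nick LOVE-LETTER-FOR-YOU.HTM
n3=}
```

The on JOIN handler fires whenever someone enters a channel the infected user is in and DCC-sends them the malicious HTML file.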
Love Letter 4


Secure Document 1

We use the strings command to scrape out the URL without actually opening the file, and we grep for http to quickly find it.

Secure Document 2

We can use a simple command to find the metadata for this file:

Secure Document 3

The data was already contained in our previous answer:

Secure Document 4

Using the website viewdns.info/iphistory, I entered the domain name globalsmedical.com and searched for the IP address that this domain was pointing to at the time the document claimed to be made (January 26th, 2017):

Objectives
This exercise introduces you to a machine learning/AI pipeline solution that pushes data from Zeek through an AI model to produce alerts about network activity.
Details
This lab will require us to start several SSH connections to the virtual machine. One of these will run Zeek, monitoring the loopback interface. Another will run a Python script that will load a trained AI model and use it to generate alerts regarding network protocols that are present. The last connection will be used to replay packets over loopback so that Zeek has something to look at.
To begin with, we need to get three separate command lines established to the VM.
1. Use tmux to split your current connection into at least three panes. Use the cd command to change into the /sec503/Exercises/Day5/ai directory in each session.
2. The lab directory contains a classify.zeek script. This script will push the first bytes in every network stream into a Broker channel named /sec503/content. To use it, we need to start Zeek, ask it to listen on the lo or loopback interface, and configure it to run this script. When we run Zeek on loopback, we will also see warnings related to checksums. While we would never do so in production, we will tell Zeek to ignore checksums while running in this lab. Please execute the following command as root in one of the sessions:
3. Now that Zeek is running, we can use another one of our sessions to connect the AI classifier to the Broker channel. We will do this using the classify.py script in the lab directory. Please run it as follows:

4. Our final task is to send data over the loopback interface so that Zeek can see it, relay the sessions to the classifier, and the classifier can report what it is seeing. To do this, you will use your third session. This session must be running as root.

5. Observe the output in the session that is running the classification script:

While there are some protocols being misclassified, overall this tool is doing an excellent job identifying known protocols.
During class we discovered that there was unusual activity on January 2, 2021. This leads to several important questions. Which way was the data moving? What does the data appear to be? Is this likely data exfiltration? We will answer these questions in this lab.
In the course book, we saw that the anomalous activity occurs on Saturday, January 2, 2021.

If I scroll down to the January 2, 2021 line, I can easily see that there was an abnormal number of bytes transferred on this day, confirming our assumption:

2. Now that you have confirmed what the tool reported, it’s time to drill into January 2, 2021. Identify when the greatest amount of data is seen moving on the network. During which hour does the greatest amount of data move?
I just need to modify my last command by changing the start and end dates in rwfilter and the bin size in rwcount:

3. If you review the data that you find in step 2, you should see that the large amounts of data are being sent very early in the day. Please check the 24-hour period from 12:00 PM January 1 through 12:00 PM January 2 to check if the data flows begin on January 1.
I can specify a starting and ending hour on top of my dates in rwfilter:

This confirms that the large amount of data being sent started right after midnight on January 2nd.
4. Satisfied that the large data transfer(s) are occurring on January 2, it’s time to examine what’s happening on January 2.
Examine the data between 00:00 and 12:00 on January 2. Which IP protocols are present and which of those transfers the greatest number of bytes?

Out of the 4 protocols transferring data on that day, TCP (protocol 6) is the one transferring the greatest number of bytes.
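For readability, the protocol numbers reported by SiLK can be mapped back to names (a tiny lookup covering common IANA values; the helper name is mine):

```shell
# Translate IANA IP protocol numbers into familiar names.
proto_name() {
  case "$1" in
    1)  echo ICMP ;;
    6)  echo TCP ;;
    17) echo UDP ;;
    *)  echo "protocol $1" ;;
  esac
}
proto_name 6    # prints TCP
```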
5. The next task is to find the connection or connections involved. Using SiLK, determine which connection or connections are most likely creating this massive spike. You might find it most interesting to look at the top 20 connections.

6. Examine the connections between 172.217.10.74 and 192.168.2.163. Also examine the records between 172.217.10.10 and 192.168.2.163. Answer the following questions:


The first thing that I noticed was that the communication between these hosts – more than 95% and 92% of it, respectively – is over port 443.
• Is the data moving in or out of the enterprise?
• What seems to be the purpose of the communication?
whois lookups on 172.217.10.74 and 172.217.10.10 show that the addresses are assigned to Google. It could mean that somebody is streaming YouTube videos from inside the network, but it could also be some type of Google Drive synchronization.
• Does this appear to be something automated or something human driven?
Details
In this section you will experiment with the rwcount, rwstats, and rwuniq tools. The goal is to understand how these tools function and examine how they can be used to answer important questions that an analyst will ask when researching a network, investigating network activities, or engaging in threat hunting.
Exercise 1
The rwstats tool is used to aggregate information about a collection of flows according to a user specified aggregation criteria. Working with the data in this way can take some getting used to and there are some pitfalls to watch out for. All of the questions in this exercise make use of the SiLK repository on the VM in the date range from January 1, 2022 through July 8, 2022.

I begin with the rwfilter tool to query the repository for all flows in the date range. All of the flows that pass this filter are then processed by rwstats. Using the --fields argument, I can specify that the flows should be aggregated into bins for each unique source IP address. The IP address we are looking for is 10.200.223.4.
2. In the last question, it appears that an internal address appears in the greatest number of flows. While this is interesting, a more important question is likely, “Which source address sends the greatest number of bytes?” In fact, see if you can answer that question now: Which source IP address sends the greatest number of bytes and how many bytes does it send?
By default, rwstats reports the number of flows. Using the --bytes option, I can override this, forcing it to aggregate and report based on the number of bytes.

172.28.10.5 sends the greatest number of bytes.
3. The last answer reveals that an internal host appears to be the source of the greatest number of bytes sent. Where were those bytes sent?

4. Let’s narrow this data down even further. We now know which source sends the most data and to which destination that data is sent. Which protocols are used? Which port numbers?
Use the SiLK tools to identify the top 10 destination ports and protocols, based on the number of bytes, used in flows originating from 172.28.10.5 and going to 10.200.223.6.

5. Consider the output of the last solution. Is 172.28.10.5 likely to be the client or the server in these flows? Since the destination port on 10.200.223.6 is around the 44,000 range, it is most likely an ephemeral port. Since this is true, it is most likely that 172.28.10.5 is the server, not the client. This should make you wonder what the source port for these flows are and, possibly, what the top ten ports and protocols looks like when we view 172.28.10.5 as a server.
Since we are sure you are as curious about this as we are, please adjust your SiLK commands so that you are examining the top ten (based on bytes) source ports and protocols used by the source host 172.28.10.5 when 10.200.223.6 is the destination. What do you find?
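The port-range reasoning above can be captured in a one-line heuristic (the 32768 cutoff is the default Linux ephemeral-range floor; an assumption, since other stacks use different ranges):

```shell
# Crude classifier: ports at or above the Linux default ephemeral floor.
is_ephemeral() {
  if [ "$1" -ge 32768 ]; then echo ephemeral; else echo "well-known/registered"; fi
}
is_ephemeral 44012   # prints ephemeral
is_ephemeral 22      # prints well-known/registered
```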

6. Look carefully at the output from the commands in parts 4 and 5. What conclusion or conclusions can you draw about the port 22 activity? You may wish to run additional queries to identify the duration of one or more of these connections.
The host 172.28.10.5 is running a service on TCP port 22, probably an SSH service. The host 10.200.223.6 uses this service to establish a number of connections and transfer a large amount of data.
It is important that we are able to reconstruct a complete session out of NetFlow data. We know that a single session may be comprised of a series of NetFlow records. In this exercise we will examine how to find and reassemble these pieces.
1. Please begin by using rwfilter and rwstats to find the top ten (by bytes) outbound TCP connections from the 172.16.0.0/16 network between May 1, 2019 and May 4, 2019. Which source host sends the greatest number of bytes to which external destination host? (An external host, in this case, should not have an address in the 10/8, 192.168/16, or 172.16/12 networks)

2. We can see that the largest number of bytes transferred is between the internal host 172.16.20.14 and the external host 52.223.227.117. Extract the TCP flow records between these two hosts using 172.16.20.14 as the source and 52.223.227.117 as the destination within the same time range, displaying the source IP, destination IP, source port, destination port, and flow duration.

3. Consider the output from the last step. Please notice the first seven rows. All of these have the same source and destination port, in addition to having the same duration. Looking at the duration, we can derive that the NetFlow sensor that is generating this flow information in the repository is most likely configured to use a refresh interval of 1,800 seconds. Seeing that all but the last flow are right at this threshold and that the source and destination ports do not change, it seems reasonable that these are all flows from the same session.
Please extract all of the flow records related to this specific connection. Your output should include the source, destination, session flags, initial flags, and the number of bytes transferred.

4. According to the NetFlow repository, how many bytes, in total, were transferred between these two hosts in this session?
The first record indicates that host 172.16.20.14 initiated the connection, sending the initial SYN. We can see that this connection is seen in seven time intervals. The last record has the FIN session flag set. This implies that we have all of the information about this entire session.
To determine the total number of bytes transferred, I just need to sum the bytes column:
4513670 + 4542623 + 4510972 + 4512661 + 4536906 + 4534209 + 4272324 = 31,423,365 bytes
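To avoid arithmetic slips, the column sum can be recomputed with shell arithmetic:

```shell
# Total bytes across the seven flow records of this session.
echo $(( 4513670 + 4542623 + 4510972 + 4512661 + 4536906 + 4534209 + 4272324 ))
```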
Details
This section of exercises allows you to explore the use of SiLK with a NetFlow repository rather than using files generated from packet capture files. Using SiLK with packet captures is very useful during an incident response if a NetFlow repository isn’t available, but during normal day-to-day operations, you would typically use SiLK with a repository.
In our collective experience, even though NetFlow is generally already supported by the switches, routers, and other network devices that enterprises have installed, it is rare to find that an enterprise has a NetFlow repository configured unless they have a fairly knowledgeable network engineering staff. It is even more rare to find that it is being used for any type of security analytics or to identify potential indicators of compromise. A NetFlow repository, therefore, is one of the easiest and least expensive changes that can be made to a network infrastructure that will immediately provide greater insight into how the network is used and assist to identify anomalous behavior.
Exercise 1
SiLK Repository
When using SiLK with a repository, you have the ability to retrieve results covering long periods of time, from specific sensors, and more. SiLK relies on configuration files in the /data directory to determine what the names of the sensors are, how data will be collected, etc., in addition to which fields are displayed by default when using rwcut. There is absolutely no need for you to make any changes or directly work with the files in this directory, but you are welcome to explore.
1. Please query the repository stored on the class VM to determine the total number of flows seen between October 1, 2018, and October 15, 2018. How many flows are there?
I am going to use the rwfilter tool. With it, I can specify partitioning criteria, such as the type of data to retrieve, sensors of interest, and time ranges of interest. I can also specify query criteria, such as the protocols of interest.
On top of that, I can leverage options like --print-statistics, which will give me the number of flows:

There are 4143675 flows in this time range.
2. Please query the repository for flows occurring between October 1, 2018, and October 15, 2018. How many TCP flows were logged?
I can use the same command as above to answer. All I have to modify is the --proto option, as TCP is protocol number 6:

There are 2997358 TCP flows logged.
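The values passed to --proto come from the IANA protocol-number registry; a quick reference for the ones that come up in these exercises:

```python
# Common IP protocol numbers (IANA "Assigned Internet Protocol Numbers"),
# as used with rwfilter's --proto option.
PROTO = {"icmp": 1, "tcp": 6, "udp": 17}

print(PROTO["tcp"])   # 6  -- the value used above for TCP flows
print(PROTO["udp"])   # 17
```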
Exercise 2
Since I am only interested in hosts seen establishing a connection, I must select all of the flows that begin with a SYN. I can use the rwfilter option called --flags-initial:

The output contains 3 lines that are not part of the listed flows. Accounting for those, wc -l quickly shows how many unique source hosts are seen:

We have 48 unique source hosts.
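The dedup-and-count step can be sketched in stdlib Python; the rwcut-style sample output below is hypothetical, standing in for the real pipeline:

```python
# Counting unique source hosts from rwcut-style output: skip the header
# line, strip the column delimiter, and deduplicate with a set.
output = """\
            sip|
     10.0.0.1|
     10.0.0.2|
     10.0.0.1|
"""
lines = output.splitlines()
hosts = {line.strip().rstrip("|").strip() for line in lines[1:] if line.strip()}
print(len(hosts))  # 2 unique source hosts in this sample
```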
Exercise 3
Let’s switch to the repository data that does not have all of the flags data present. While it isn’t convenient to work with this data, it is not unusual to have sensors that will not properly populate these fields. This makes it important to have familiarity with working with this type of data.
The data of interest covers dates from February 8, 2022 through July 3, 2022.

Notice the behavior of the source hosts and source ports, in addition to the number of packets seen. How would you characterize this? Do these appear to be “real” connection attempts, or some type of spoofed scanning behavior?
I observed that the source IP address 172.28.30.4 appears to be initiating connections to six different destination hosts, first targeting port 9573, then port 10001. The presence of packets with only the SYN flag set strongly suggests these are the initial steps of a TCP three-way handshake. Additionally, I noticed that the source port changes with each destination, which indicates the packets are likely not spoofed but instead generated by a legitimate IP stack initiating connections.
Each flow consists of two or three packets, which is important. If there were only one packet per flow, it might suggest scanning behavior. However, two to three packets usually point to actual connection attempts, possibly with some retries involved. Taking all of this into account, the evidence supports that 172.28.30.4 is most likely making genuine TCP connection attempts rather than performing a spoofed scan.
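The packet-count heuristic above can be sketched as a small classifier; the thresholds are my reading of the analysis, not a SiLK feature:

```python
def characterize(packets_per_flow: int) -> str:
    """Heuristic from the analysis: single-packet flows look like scanning,
    while two or three packets suggest real connection attempts (a SYN plus
    retries from a genuine IP stack)."""
    if packets_per_flow == 1:
        return "possible scan"
    if packets_per_flow in (2, 3):
        return "likely real connection attempt"
    return "established or other"

print(characterize(1))  # possible scan
print(characterize(3))  # likely real connection attempt
```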
2. Extract all of the records from this data that involve hosts 172.28.30.5 and 192.225.158.2 and examine the sip, sport, dip, dport, flags, and packets fields.

Using the --any-address option and chaining two rwfilter commands together allows me to extract the records involving just these two hosts.
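The effect of chaining two --any-address passes can be sketched as set logic over flow endpoints; the sample records below are hypothetical:

```python
# Keep only flows where BOTH hosts of interest appear as an endpoint,
# mirroring rwfilter --any-address=A | rwfilter --any-address=B.
records = [
    ("172.28.30.5", "192.225.158.2"),
    ("172.28.30.5", "10.0.0.9"),
    ("192.225.158.2", "172.28.30.5"),
]
wanted = {"172.28.30.5", "192.225.158.2"}
matches = [r for r in records if wanted <= set(r)]
print(len(matches))  # 2 of the 3 sample records involve both hosts
```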
3. Examine all of the flows between 192.225.158.2 and 172.28.30.5 and explain why two of the flows have no flags set.
If I add the protocol field to my rwcut command, I can see that the two flows with the SYN flag are TCP flows (protocol 6) whereas the flows without flags are UDP flows (protocol 17):

Exercise 1
The frame specifications are: source MAC aa:bb:cc:dd:ee:ff, destination MAC ff:ff:ff:ff:ff:ff, source IP 192.168.1.1, destination IP 192.168.1.2, and ICMP sequence number 234. I am going to use a tool called Scapy to complete this lab:

The first thing that I need to do is to create an Ethernet header and an IP header, assigning each to a variable:

Let’s now create the ICMP sequence number:

Now that all the required headers have been built, I can assemble the frame:

The ICMP echo request is now crafted.
2. Display the frame you just created.

3. Write the frame you created to the output pcap file named /tmp/icmp.pcap.

4. Use ssh to connect to the virtual machine in a second terminal window. In the new terminal, use tcpdump to examine the packet in /tmp/icmp.pcap to make sure that the frame you crafted matches the specifications detailed. With tcpdump, use either the -XX, -X, or -v option to show the link layer.

Exercise 2
1. Read the /tmp/icmp.pcap file that you just created in the previous exercise into a Scapy session, alter the ICMP sequence number, and write the result to /tmp/icmp2.pcap. Examine /tmp/icmp2.pcap in a different terminal (new or from the previous exercise) using tcpdump, supplying it the -vv option to verify that you crafted a valid record.
We read /tmp/icmp.pcap into a list named r:

Next, I extract the only record in the list (r[0]) and assign it a name of echoreq:

I assign a sequence number value of 4321 to the ICMP layer of echoreq and display it:

Scapy displays the ICMP sequence number in hex, so I can validate that 0x10e1 is equivalent to decimal 4321:
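The hex-to-decimal check can be confirmed directly in Python:

```python
# Scapy displays the sequence number in hex; confirm the conversion.
print(0x10e1)      # 4321
print(hex(4321))   # 0x10e1
```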

Next, I use wrpcap() to write echoreq to /tmp/icmp2.pcap and use tcpdump in verbose mode to read the record.


2. When you view the resulting packet in the new /tmp/icmp2.pcap file with tcpdump, you should be able to identify an obvious problem with the packet. What is it?
The checksum is corrupted.
3. Why did this happen?
I altered the ICMP sequence number value but did not have Scapy recompute the checksum afterward. The checksum is only recalculated when the frame is sent or stored to a pcap file, and only if the existing checksum value has been deleted first; since the record read from the pcap still carried its original checksum, the stale value was written out.
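The checksum Scapy computes here is the standard Internet checksum from RFC 1071. As a stdlib-only sketch (not part of the lab; the sample header bytes are illustrative), it works like this:

```python
def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: ones'-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length data
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Sample ICMP echo request header with the checksum field zeroed:
# type=8, code=0, checksum=0x0000, id=10, seq=100
header = b"\x08\x00\x00\x00\x00\x0a\x00\x64"
csum = inet_checksum(header)

# Property check: a header carrying its correct checksum verifies to 0,
# which is exactly what changing the seq field without recomputing breaks.
patched = header[:2] + csum.to_bytes(2, "big") + header[4:]
print(hex(csum), inet_checksum(patched))
```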
4. Correct the issue by altering the record that still exists in your Scapy interactive session and writing it out again to /tmp/icmp2.pcap.
I need to delete the checksum value from the ICMP header:

Now, I can write it out again

5. Rerun tcpdump to make sure the error was corrected

Exercise 3
Description: This exercise requires you to craft and send traffic using Scapy. Specifically, you craft an ICMP echo request in one Scapy interactive session, listen for it in another Scapy interactive session, and respond with a crafted ICMP echo reply from the second session.
You need to open three different ssh connections to the virtual machine for this. If you still have Scapy running from the previous exercises, using sudo scapy, this can be the first ssh connection.
In a second terminal, use tcpdump to sniff for the traffic you will craft and send from the Scapy sessions in the other two terminals. Unlike simply reading a pcap as we have been doing, sniffing traffic with tcpdump requires elevated privileges. As with Scapy, use sudo to elevate your privileges when running tcpdump to sniff traffic off an interface. The tcpdump command below disables DNS name resolution with the -n option, prints raw (epoch) timestamps with the -tt option, shows the ASCII payload with the -A option, and filters for ICMP traffic only. You do not need to specify the interface to sniff on if you are sniffing on the first Ethernet interface.

In the third ssh session, invoke a second Scapy interactive interface and prepare Scapy to sniff an ICMP echo request that you will send from the first Scapy session.
Scapy's sniff() function listens on a given interface for packets, and you can add BPF filters with the filter option. Run the below command in Scapy:


This is what I see in the tcpdump window:

2. Return to the Scapy interface that sniffed the packet. Display the received ICMP echo request to find the ICMP ID value of 10, displayed as 0xa, and the ICMP sequence number of 100, displayed as 0x64.

3. Continuing in the Scapy session, craft and send an appropriate ICMP reply. Make use of the ICMP echo request that Scapy captured, modifying fields as necessary. You should build a new IP header, but reuse the ICMP header and payload from the captured packet.
First, I need to create a new IP header and stack it with the captured ICMP request and payload:

Next, I need to set the source of this new IP packet to be whatever the destination address was in the request. I also need to set the destination address for this new IP packet to be the source of the captured request.

Finally, since I want to send an echo-reply, I need to set the ICMP type to be 0. I also need to delete the ICMP checksum value, which was copied from the original packet. I want Scapy to automatically recalculate this value so that a checksum error does not get generated.
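The three modifications above can be sketched without Scapy, using a plain dict as a stand-in for the captured packet (the field names are illustrative, not Scapy attribute names):

```python
# Stand-in for the captured echo request; values are illustrative.
request = {"src": "192.168.1.1", "dst": "192.168.1.2",
           "icmp_type": 8, "icmp_chksum": 0xF791}

reply = dict(request)
# Swap addresses: the reply goes back to whoever sent the request.
reply["src"], reply["dst"] = request["dst"], request["src"]
# Echo reply is ICMP type 0 (the request is type 8).
reply["icmp_type"] = 0
# Drop the stale checksum so it would be recomputed on send.
reply["icmp_chksum"] = None

print(reply["src"], reply["dst"], reply["icmp_type"])  # 192.168.1.2 192.168.1.1 0
```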

Now, I can send my packet

4. Verify that your crafted echo reply was properly sent by checking the tcpdump output from the other window.

Description:
Examine the TCP session between hosts 192.168.1.103 and 192.168.1.104. There is something that is nonstandard about this session. What is it, and why might it cause an IDS evasion?

In packet 64, the client at 192.168.1.104 tried to establish a connection with the server at 192.168.1.103. Instead of acknowledging the connection, host 192.168.1.103 sent a TCP packet with a SYN flag to the host at 192.168.1.104. In packet 66, the client responded with a SYN/ACK packet. This packet is flagged as a retransmission because there was a lapse of 30 seconds between packets 65 and 66. What seems to have happened is that, having received no response to its SYN, the client retransmitted it while at the same time acknowledging the SYN sent by host 192.168.1.103. The server then sent an ACK in packet 67 to complete the handshake, and the session was established. This is called a "four-way handshake," and it might lead to IDS/IPS evasion because the session will not be tracked, since it is not a conventional three-way handshake.
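The evasion can be illustrated with a toy state machine; this is a sketch of the idea, not how any particular IDS is implemented:

```python
def tracks_session(flags_sequence):
    """A naive session tracker that only recognizes the canonical
    three-way handshake in strict order: SYN -> SYN/ACK -> ACK."""
    return flags_sequence[:3] == ["SYN", "SYN/ACK", "ACK"]

# Canonical handshake: the tracker marks the session established.
print(tracks_session(["SYN", "SYN/ACK", "ACK"]))         # True
# The four-way handshake from the capture: never matches, so the
# session goes untracked even though the endpoints establish it.
print(tracks_session(["SYN", "SYN", "SYN/ACK", "ACK"]))  # False
```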
2. Can Snort find the malicious content?
One of the connections that is present in the evade.pcap file has content that looks like this:
21:56:47.400000 IP 184.168.221.65.52342 > 10.1.15.80: Flags [P.], seq 143:463, ack 1, win 8192, length 421
HTTP: GET /EVILSTUFF HTTP/1.1..Host: example.com..User-Agent: curl/7.35.0..Accept: */*....
We can clearly see GET /EVILSTUFF HTTP/1.1 in the packet. Let’s see if we can alert on that using this alert:
alert http (msg:"Evil 1 in URI"; content:"EVIL"; sid:10000005; rev:1;)

Let’s run Snort and see how it does.

The alert was not triggered.
3. Can Zeek find it?
Let’s create a Zeek signature specifically designed to find that EVIL URL request. Please create a file named evil.sig that contains:
signature Evil {
ip-proto == tcp
dst-port == 80
payload /EVIL/
event "EVIL URL!"
}

Let’s run Zeek against the evade.pcap file and see if Zeek finds the known signature:

Zeek also fails to find the malicious content.
Exercise 2
Description: Look at the HTTP traffic between hosts 10.246.50.2 and 10.246.50.6.
Examine the HTTP headers on the GET request. What type of attack is this, and what does the code instruct the HTTP server to do? Was the attack successful? How do you know?

The User Agent looks abnormal. Normally it indicates the browser version used by the client, but in this case, it looks like an empty function followed by a ping command:
User-Agent: () { :;}; /bin/ping -c1 10.246.50.2
Searching for this kind of exploit online, I found that this is a Shellshock attack delivered via the User-Agent HTTP header value, which works because the web server passes the User-Agent value to Bash as an environment variable.
If the attack was successful, I should be able to find a ping request sent from the server (10.246.50.6) to the client (10.246.50.2):

The attack was successful.
Description:
Look at the traffic between hosts 192.168.1.105 and 192.168.1.103. The fourth record in the exchange between the hosts is a RST from the client 192.168.1.105 to the server 192.168.1.103. However, as you can observe, 192.168.1.105 continues to send traffic and 192.168.1.103 acknowledges it. Explain the reason why traffic is sent and acknowledged after the RST and why it might cause an IDS evasion.

The fourth packet has a bad TCP checksum, meaning that the receiving host 192.168.1.103 will silently drop it, which is why the subsequent packets are still sent and acknowledged. Some IDS/IPS systems do not validate the TCP checksum and may therefore stop tracking the session when they see the RST. This causes an evasion because the session actually continues, and the destination host receives the malicious traffic without the IDS/IPS being aware of it.
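The mismatch between the endpoint and a checksum-blind IDS can be sketched as two predicates; this is an illustration of the evasion, not any real IDS's logic:

```python
def endpoint_accepts(segment):
    """A real TCP stack validates the checksum and drops bad segments."""
    return segment["checksum_ok"]

def naive_ids_acts_on_rst(segment):
    """A naive IDS that skips checksum validation honors every RST."""
    return segment["flags"] == "RST"

rst_bad_csum = {"flags": "RST", "checksum_ok": False}

# The endpoint drops the RST, so the session continues...
print(endpoint_accepts(rst_bad_csum))       # False
# ...but the naive IDS tears down its session state: evasion.
print(naive_ids_acts_on_rst(rst_bad_csum))  # True
```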
In this lab, I will be developing another useful script that is a little more advanced than the ones created in parts 1 and 2.
Exercise – HTTP Exfiltration?
Description:
In this exercise, we will create a script that locates anomalous outbound data transfers based on the idea that, generally, we would expect to find that web connections have more data coming from the server to the client. This will potentially allow us to identify data exfiltrations.
Using the Zeek documentation, write a Zeek script that prints a message any time a connection involving TCP port 80 ends and the amount of data sent by the client was greater than that sent by the server.
The first problem is determining which event to subscribe to. Again, I need to review the Zeek documentation to find the events relevant to the problem I am trying to solve. I am looking for an event that fires when a connection ends, the counterpart to the new_connection() event: this is the connection_finished(c: connection) event.
I am interested in HTTP connections, and I want to limit my view to only connections involving TCP port 80. I first tried implementing the following condition in my script:
if(c$id$resp_p != 80) { return; }
but I received a 'type clash' error when I tried running a script using this condition. Zeek exposes the idea of a port as its own data type, which requires a number and a protocol name separated by a slash; typical HTTP would be 80/tcp. My condition should then be modified to if(c$id$resp_p != 80/tcp) { return; }
I also need to look at the number of bytes sent by the server and the client. The c$orig and c$resp sections of the connection record passed to connection_finished have useful data in them. There are fields that give the total number of IP bytes, but there are also size attributes, which report the total number of bytes of payload sent by either side of the connection.
Using these fields, I would then need to add some logic that will compare the number of bytes sent by the originator to the number of bytes sent by the respondent. If the originator (client) sent more than the respondent (server), I want to print a message.
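Putting the pieces above together, the script might look like the following sketch; the event, field names, and 80/tcp comparison are from the reasoning above, while the exact message text is my own:

```zeek
event connection_finished(c: connection)
    {
    # Only consider connections involving TCP port 80.
    if ( c$id$resp_p != 80/tcp )
        return;

    # Flag connections where the client sent more payload than the server.
    if ( c$orig$size > c$resp$size )
        print fmt("Possible exfiltration: %s sent %d bytes, received %d",
                  c$id$orig_h, c$orig$size, c$resp$size);
    }
```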

Let’s test this script out:
