TryHackMe: Advent of Cyber 2023 (Day 15) Jingle Bell SPAM: Machine Learning Saves the Day!
“Over the past few weeks, Best Festival Company employees have been receiving an excessive number of spam emails. These emails are trying to lure users into the trap of clicking on links and providing credentials. Spam emails are somehow ending up in the mailing box. It looks like the spam detector in place since before the merger has been disabled/damaged deliberately. Suspicion is on McGreedy, who is not so happy with the merger.”
Q1: What is the key first step in the Machine Learning pipeline?
The first step is Data Collection.
Q2: Which data preprocessing feature is used to create new features or modify existing ones to improve model performance?
This is Feature Engineering.
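To make that concrete, here is a minimal sketch of feature engineering on a raw email body. The function name `engineer_features` and the specific features (word count, link count, and so on) are made up for illustration and are not taken from the room's notebook.

```python
import re

def engineer_features(message: str) -> dict:
    """Derive a few numeric features from raw email text (illustrative only)."""
    words = message.split()
    return {
        # How long the message is, in whitespace-separated tokens
        "num_words": len(words),
        # Number of http/https links in the body
        "num_links": len(re.findall(r"https?://\S+", message)),
        # Spam often leans on exclamation marks
        "num_exclamations": message.count("!"),
        # Share of tokens written entirely in upper case
        "upper_ratio": sum(w.isupper() for w in words) / max(len(words), 1),
    }

# Hypothetical message, not from the room's dataset
example = "WIN a FREE prize now! Click http://example.com/claim"
features = engineer_features(example)
print(features)
```

Each new column gives the model a signal that the raw text alone does not expose directly.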
Q3: During the data splitting step, 20% of the dataset was split for testing. What is the percentage weightage avg of precision of spam detection?
Most of today’s exercise is just running each cell in order as you go through the notebook. All of the code is already written for you, so the answer comes from the evaluation output once you reach that step.
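If you are wondering what a weighted average of precision actually means, the sketch below computes it by hand: per-class precision, weighted by how many true examples each class has. The labels and predictions are invented for illustration, so the number it prints is not the answer to the question.

```python
from collections import Counter

def weighted_precision(y_true, y_pred):
    """Average per-class precision, weighted by each class's true count (support)."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        # How many times the model predicted this label
        predicted = sum(1 for p in y_pred if p == label)
        # How many of those predictions were right
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        precision = correct / predicted if predicted else 0.0
        total += precision * n
    return total / len(y_true)

# Hypothetical test labels and predictions, not the room's data
y_true = ["ham", "ham", "ham", "spam", "spam"]
y_pred = ["ham", "ham", "spam", "spam", "spam"]
score = weighted_precision(y_true, y_pred)
print(f"weighted avg precision: {score:.2f}")
```

Here ham has precision 2/2 with support 3 and spam has precision 2/3 with support 2, so the weighted average is (3 × 1 + 2 × 2/3) / 5.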
Q4: How many of the test emails are marked as spam?
I updated my code on lines 25 and 26. On the first run, two emails came back as spam and the rest as ham, but that answer was incorrect.
The first message was obviously spam, yet the model never picked it up. The classifier is not 100% accurate, so I ran it a few more times, and on the third run it finally registered that message correctly as spam.
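Results that change between runs usually point to randomness somewhere in training, for example a train/test split that is not seeded. Whether that is what happens in this notebook is an assumption, not something confirmed here. The sketch below uses a hypothetical `split` helper and fake email names to show how a fixed seed makes the split repeatable.

```python
import random

def split(rows, test_fraction=0.2, seed=None):
    """Shuffle rows and hold out a fraction for testing (hypothetical helper)."""
    rows = list(rows)
    # seed=None draws fresh randomness each run; a fixed seed repeats the shuffle
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

# Placeholder data standing in for the email dataset
emails = [f"email_{i}" for i in range(20)]

train_a, test_a = split(emails, seed=1)
train_b, test_b = split(emails, seed=1)
print(test_a == test_b)  # same seed, same held-out emails

train_c, test_c = split(emails)  # unseeded: can differ on every run
print(len(train_c), len(test_c))
```

With a different training set on each run, the model learns slightly different word statistics, which can flip a borderline email between ham and spam.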
Q5: One of the emails that is detected as spam contains a secret code. What is the code?
If you manually open the test email dataset, you can see the secret code on line 6.
You can’t copy and paste it, but I got you → I_HaTe_BesT_FestiVal
❤