Web Scraping with Python: How to Collect Real Data Using Requests and BeautifulSoup
Learn how to scrape real data from any website using Python, Requests, and BeautifulSoup. Collect quotes, weather, and stock data and save it for analysis.
Eventually, we all wonder where to get data that isn't just Kaggle or another well-worn CSV file. We often look to APIs for this fresh data, and this is where web scraping also comes into the picture.
Since I am doing a whole mini series here on automation in Python, it's only right to talk about one of the most useful and sought-after types of automation.
Up to a certain stage, you rely on datasets someone else prepared for you: cleaned, structured, and formatted for convenience. But once you learn how to collect data straight from the web, you begin working with the real world. You gain freedom.
Every week you'll be introduced to a new topic in Python. Think of this as a mini starter course: a structured roadmap that actually builds toward a solid foundation in Python. Join us today!
Web scraping is the process of extracting data from websites. Not through manual copying and pasting, but through Python code. This lets you gather hundreds, thousands, or even millions of data points automatically.
It’s a powerful skill used in business intelligence, finance, research, journalism, marketing analytics, and many other fields. People pay a lot for skills like this.
In this article, we are going to learn the fundamentals of web scraping using two of the most popular Python tools: the requests library for downloading web pages and BeautifulSoup for parsing and navigating the HTML to extract data.
I do a quick breakdown of the basics so you all can extract quotes, weather information, and stock data on your own.
Then I’ll show you how to save your results into a CSV file so that you can analyze them later in Excel, Pandas, or other analytics tools.
We are not building a complicated scraper or a massive automated pipeline today. Instead, we are building a foundation. Yes, there are stronger web crawlers out there, like Scrapy, that we can use for bigger projects.
This is just about understanding the basics to set the stage for everything that comes later, such as scraping dynamic sites, dealing with pagination, rotating proxies, and automating recurring scrapes.
But all of that starts with one core idea: a request goes out, HTML comes in, and you learn how to read it.
Thank you guys for allowing me to do work that I find meaningful. This is my full-time job so I hope you will support my work by joining as a premium reader today.
If you’re already a premium reader, thank you from the bottom of my heart! You can leave feedback and recommend topics and projects at the bottom of all my articles.
You can get started with Python today with the goal of landing a job in the next few months - Join the Masterclass Here.
👉 I genuinely hope you get value from these articles, if you do, please help me out, leave it a ❤️, and share it with others who would enjoy this. Thank you so much!
The Webpage as Data
Before writing any code, it helps to understand what we are scraping. Every public webpage is built on HTML. When you open a website in your browser, your browser retrieves the HTML and then formats and styles it visually for you.
But underneath all the design, animations, colors, and layouts, everything on the page is structured inside HTML tags: <div>, <p>, <span>, <ul>, and so on.
When you scrape data, your goal is to look past the visual surface and retrieve the structured information inside the HTML.
It helps to think of a website like a library bookshelf. Your browser is the librarian who turns those book barcode labels into something you can understand. But when you scrape, you’re going directly to the shelves and taking the book yourself.
You need to know where it is located, how it is organized, and how to read it from the inside rather than judging by the cover.
This is really where BeautifulSoup helps us out a lot. It lets your Python code step into that HTML structure and search, navigate, and extract information from the tags.
Getting Started: Installing the Libraries
We will need just two external libraries:
pip3 install requests
pip3 install beautifulsoup4

Since we are getting you started here today, we don't actually need anything too crazy. These two libraries will do more than enough for us. As I mentioned earlier, there are more powerful tools for in-depth crawling, but for plain HTML work this is all we need.
requests handles downloading web pages. BeautifulSoup helps us parse and navigate through the HTML.
Once installed, it’s time to begin all the fun stuff.
Learn Python. Build Projects. Get Confident!
Most people get stuck before they even start… But that doesn’t have to be you!
The Python Masterclass is designed to take you from “I don’t know where to start” to “I can build real-world Python projects” — in less than 90 days.
👉 I’m giving you my exact system that’s been proven and tested by over 1,500 students over the last 4+ years!
My masterclass is designed so you see your first win in less than 7 days — you’ll build your first working Python scripts in week one and finish projects in your first month.
The sooner you start, the sooner you’ll have projects you can actually show to employers or clients.
Imagine where you’ll be 90 days from now if you start today.
👉 Ready to get started?
P.S. — Get 20% off your First Month with the code: save20now. Use it at checkout!
Be your Own Poet
There is a website designed specifically for web scraping practice: http://quotes.toscrape.com.
If you've ever read about web scraping or watched a tutorial, chances are it used this site too. We will use it because it is stable, accessible, and simply structured.
Here is a basic example that retrieves quotes and authors:
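The sketch below assumes the site's current markup: each quote sits inside a div with class "quote", the quote text in a span with class "text", and the author in a small tag with class "author" (you can confirm all of this in your browser's inspector).

import requests
from bs4 import BeautifulSoup

# Download the page HTML
url = "http://quotes.toscrape.com"
response = requests.get(url)
response.raise_for_status()

# Parse the HTML into a searchable tree
soup = BeautifulSoup(response.text, "html.parser")

# Each quote lives inside a <div class="quote"> block
for quote_div in soup.find_all("div", class_="quote"):
    text = quote_div.find("span", class_="text").get_text(strip=True)
    author = quote_div.find("small", class_="author").get_text(strip=True)
    print(f"{text} — {author}")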
When you run this, you’ll see output like:
“The world as we have created it is a process of our thinking.” — Albert Einstein
“It is our choices, Harry, that show what we truly are.” — J.K. Rowling

Let me break down a little more of what is happening here for you guys.
We send a request to the website using requests.get(). The website responds with the HTML source code. We then pass the HTML response into BeautifulSoup, which creates an object that lets us search the HTML almost like a tree.
Then we look for pieces of HTML that match the pattern for quotes. Each quote is inside a <div class="quote"> section. We extract the text and the author from inside that structure.
This is the core skill in web scraping: identifying patterns in the HTML structure of a page, then locating and extracting the data inside those patterns.
Circling back to the library: it's similar to scanning a shelf of books for a specific author while ignoring all the irrelevant ones.
Be your Own Weather Man
Weather websites are a popular target for scraping. For instance, imagine you want to gather today’s temperature for your city from a weather report page.
We will use the website: https://forecast.weather.gov, which is run by the US National Weather Service and is public data.
For weather results I often recommend the OpenWeather API, but that is the route you'd take when building apps. For scraping practice we don't actually need an API, so the weather.gov site is just fine.
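Here is a minimal sketch for a point-forecast page (New York City in this case, picked via lat/lon in the URL). It assumes the headline temperature still sits in a p tag with class "myforecast-current-lrg"; inspect the page in your browser first, since layouts do change over time.

import requests
from bs4 import BeautifulSoup

# Point-forecast page; the lat/lon parameters pick the location
url = "https://forecast.weather.gov/MapClick.php?lat=40.7128&lon=-74.006"
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Assumed selector: the big current-temperature element on the page
temp = soup.find("p", class_="myforecast-current-lrg")
if temp:
    print("Current temperature:", temp.get_text(strip=True))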
This touches another important principle: the layout might vary across websites, but the core method never changes. You examine the page structure, find the tags containing the data, and extract it.
Be your Own Broker
Now let’s look at financial data. Yahoo Finance provides stock price pages in a format that is relatively fixed. Suppose we want the current price of Apple stock:
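A hedged sketch follows. It assumes Yahoo still renders the live price in a fin-streamer tag keyed by data-field="regularMarketPrice"; verify in your inspector, since this markup changes from time to time, and note that Yahoo tends to reject requests without a browser-like User-Agent header.

import requests
from bs4 import BeautifulSoup

# Yahoo Finance quote page for Apple
url = "https://finance.yahoo.com/quote/AAPL"
headers = {"User-Agent": "Mozilla/5.0"}  # bare requests are often blocked
response = requests.get(url, headers=headers)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Assumed selector: the live price element
price = soup.find("fin-streamer", attrs={"data-field": "regularMarketPrice", "data-symbol": "AAPL"})
if price:
    print("AAPL price:", price.get_text(strip=True))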
When you scrape financial websites, you must be careful to check the terms of use. Many allow viewing for personal study but place limits on high-volume or automated scraping.
This is why in professional environments, analysts sometimes use paid APIs instead of scraping. Still, for learning and personal exploration, knowing how this works is extremely valuable.
Cleaning and Preparing Data
Raw scraped data usually needs cleaning. The text may contain whitespace, special characters, or formatting symbols. For example, if the weather output includes degrees symbols or the stock price includes commas, you may want to remove or convert them before analysis.
This is where Python’s string manipulation skills come in handy, such as .strip(), .replace(), or converting to numeric values with float().
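For example, a quick sketch with made-up raw values:

# Hypothetical raw strings, as they might come out of a scrape
raw_temp = "72°F\n"
raw_price = "1,234.56"

# Strip whitespace, drop symbols, then convert to numbers
temp = float(raw_temp.strip().replace("°F", ""))
price = float(raw_price.replace(",", ""))

print(temp, price)  # 72.0 1234.56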
However, do not overthink this step initially. Scrape first. Clean later.
Saving to CSV
Cleaned data becomes significantly more useful once you start storing it. CSV files are one of the simplest storage formats because they are readable by Excel, Google Sheets, pandas, and SQL import tools.
Here is an example that takes scraped quotes and saves them to a CSV file:
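This is a minimal version using Python's built-in csv module, reusing the same quote selectors from earlier:

import csv
import requests
from bs4 import BeautifulSoup

url = "http://quotes.toscrape.com"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["quote", "author"])  # header row
    for quote_div in soup.find_all("div", class_="quote"):
        text = quote_div.find("span", class_="text").get_text(strip=True)
        author = quote_div.find("small", class_="author").get_text(strip=True)
        writer.writerow([text, author])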
Run this code and open quotes.csv in Excel. You will see two clean columns: one with quotes and one with authors. This is data you can now analyze, search, or visualize.
The CSV storage step is a pivotal milestone because it transforms scraped data into something reusable. Data analysts often scrape periodically and append new data each day to track changes over time. Stock prices, for example, become more useful when studied across weeks rather than looked at only once.
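Appending is a one-line change: open the file in append mode ("a") instead of write mode. A tiny sketch, with a placeholder price standing in for a freshly scraped value:

import csv
from datetime import date

# Append one row per run; 234.56 is a placeholder for a scraped price
with open("prices.csv", "a", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow([date.today().isoformat(), "AAPL", 234.56])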
Wrapping it up
Learning to scrape the web opens the door to a new phase of your capability with data. You are no longer limited to the datasets others choose to share. You can extract your own. You can gather the exact information that matters to your goals.
I just touched on the pure basics here today; these are common practice scenarios for people just starting out with web scraping. But if you want to get a bit more advanced, here is a video on how to scrape Reddit.
The combination of requests and BeautifulSoup is enough to give you foundational scraping power that applies to thousands of websites and millions of possible data sources.
At its core, web scraping is nothing more than retrieving HTML and learning how to navigate it. But the real shift is in mindset. Once you see websites as structured data sources rather than just visual pages, you begin to understand how the web really works at its foundation.
Today, you scraped quotes, weather data, and stock information. You parsed HTML tags, extracted meaningful text, and stored it in a CSV file. From here, you can explore pagination, automate repeated scraping tasks, scrape multiple URLs, or combine scraping with pandas to generate insight directly.
The next step is practice. Choose any site that displays regularly updated information. Inspect the HTML. Find the tags containing the data. Write code that extracts it. Save it. Process it. Analyze it.
This is where the learning solidifies. Not by reading, but by doing.
Hope you all have an amazing week nerds ~ Josh (Chief Nerd Officer 🤓)
👉 If you’ve been enjoying these lessons, consider subscribing to the premium version. You’ll get full access to all my past and future articles, all the code examples, extra Python projects, and more.