pip install bs4
Travel Iowa
Brief Description
Travel Iowa is the official tourism website for the state of Iowa, offering a wealth of information for those looking to explore. It includes details on new events and attractions, birding spots, stargazing locations, nostalgic soda fountains, and more.
Source
Source: Travel Iowa
Sample code for extracting event data from Travel Iowa is shown below. The complete code is located in the html file …… The process below is specific to the HTML format we were able to retrieve from the Travel Iowa website.
html_doc = """
<div id="top">
<div id="ctl00_phMainContent_div3" class="listControls">
<div class="results">
</div>
</div>
<div ID="pnlControls" style="display:none;">
<div class="listControls" id="div4">
<div class="pager">
<ul>
<li>
<div id="pnlListingPager">
Page:
<span id="dpListing">
<span class="activePage">1</span>
<a class="previous" data-pager-page="0">previous</a>
<a class="next" data-pager-page="0">next</a>
</span>
of 1
</div>
</li>
</ul>
</div>
</div>
</div>
<div class="grid">
<div class="item">
<div class="item-text">
<div class="item-date">Jul <span>4 - 7</span></div>
<h3 class="item-title">
<a href="/calendar/flea-market-under-the-bridge/1647514">Flea Market Under the Bridge</a>
</h3>
<div class="item-city">Marquette</div>
<div class="item-venue">
<span>Venue:</span> <a href="/calendar/flea-market-under-the-bridge/1647514">Flea Market</a>
</div>
</div>
</div>
<div class="item">
<div class="item-text">
<div class="item-date">Jul <span>1 - </span>Oct <span>31</span></div>
<h3 class="item-title">
<a href="/calendar/historic-hills-scenic-byway-bale-trail/1643930">Historic Hills Scenic Byway Bale Trail</a>
</h3>
<div class="item-city">Fairfield</div>
<div class="item-venue">
<span>Venue:</span> <a href="/calendar/historic-hills-scenic-byway-bale-trail/1643930">Historic Hills Scenic Byway</a>
</div>
</div>
</div>
<div class="end-grid-action">
<button class="button" id="btnShowMoreEvents">Show More Events</button>
</div>
</div>
<div ID="pnlControls2" style="display:none;">
<div class="listControls" id="div1">
<div class="pager">
<ul>
<li>
<div id="pnlListingPager">
Page:
<span id="dpListing">
<span class="activePage">1</span>
<a class="previous" data-pager-page="0">previous</a>
<a class="next" data-pager-page="0">next</a>
</span>
of 1
</div>
</li>
</ul>
</div>
</div>
</div>
</div>
"""
import warnings

import pandas as pd
from bs4 import BeautifulSoup

# Show full column contents (e.g. long links) when displaying DataFrames
pd.set_option('display.max_colwidth', None)
warnings.filterwarnings('ignore')
# Base URL
base_url = "https://www.traveliowa.com"
# Parse the HTML content
soup = BeautifulSoup(html_doc, 'html.parser')
# Find all divs with class 'item'
items = soup.find_all('div', class_='item')
# Initialize a list to store the extracted data
data = []
# Extract data from each item
for item in items:
    date = item.find('div', class_='item-date').get_text(strip=True)
    title_tag = item.find('h3', class_='item-title').find('a')
    title = title_tag.get_text(strip=True)
    link = base_url + title_tag['href']
    city = item.find('div', class_='item-city').get_text(strip=True)

    # Check if the venue div and its a tag exist
    venue_div = item.find('div', class_='item-venue')
    if venue_div and venue_div.find('a'):
        venue = venue_div.find('a').get_text(strip=True)
    else:
        venue = 'Venue information not available'

    # Append the extracted data to the list
    data.append({'Date': date,
                 'Title': title,
                 'Link': link,
                 'City': city,
                 'Venue': venue
                 })
# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df
|   | Date | Title | Link | City | Venue |
|---|---|---|---|---|---|
| 0 | Jul4 - 7 | Flea Market Under the Bridge | https://www.traveliowa.com/calendar/flea-market-under-the-bridge/1647514 | Marquette | Flea Market |
| 1 | Jul1 -Oct31 | Historic Hills Scenic Byway Bale Trail | https://www.traveliowa.com/calendar/historic-hills-scenic-byway-bale-trail/1643930 | Fairfield | Historic Hills Scenic Byway |
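The Date values in this output run together (for example Jul4 - 7) because get_text(strip=True) strips each text fragment inside the item-date div and joins the pieces with no separator. If readable dates are wanted, one option is to pass a separator to get_text. The snippet below is a minimal sketch on one of the sample item-date divs; the variable names snippet, compact, and spaced are ours and are not part of the original code.

```python
from bs4 import BeautifulSoup

# One of the item-date divs from the sample HTML above
snippet = '<div class="item-date">Jul <span>1 - </span>Oct <span>31</span></div>'
date_div = BeautifulSoup(snippet, "html.parser").find("div", class_="item-date")

# No separator: stripped fragments are concatenated directly
compact = date_div.get_text(strip=True)

# Single-space separator: stripped fragments are joined with " "
spaced = date_div.get_text(" ", strip=True)

print(compact)  # Jul1 -Oct31
print(spaced)   # Jul 1 - Oct 31
```

Using the spaced form in the extraction loop would change the Date column shown above, so it is a choice to make before any downstream date parsing.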
Events_per_10k
Measure Description
Our analysis covers the total number of past and future events (2021-2025) listed for each city. We express this total per 10,000 people so that the distribution of events can be compared across cities.
Measure Calculation
After extracting the relevant data from the website, we tallied the total number of events for each city. To calculate our measure of interest (Events_per_10k), we used the following steps:
- Extracted city population data from the American Community Survey (ACS).
- Divided the total number of events in each city by the population of the corresponding city.
- Multiplied the results by 10,000 to express the number of events per 10,000 people.
Thus, the final measure Events_per_10k was obtained using this formula:
\[ \text{Events\_per\_10k} = \frac{\text{Total Number of Events}}{\text{Population}} \times 10{,}000 \]
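The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the code used in the analysis: the function name events_per_10k, the sample city names, and the population figures are all hypothetical, and real populations would come from the ACS data.

```python
from collections import Counter


def events_per_10k(event_cities, population):
    """Return {city: events per 10,000 residents}.

    event_cities: iterable with one city name per event.
    population:   mapping of city name -> population.
    """
    counts = Counter(event_cities)  # total number of events per city
    rates = {}
    for city, n_events in counts.items():
        pop = population.get(city)
        if not pop:  # skip cities with missing or zero population
            continue
        # Same as (n_events / pop) * 10,000; multiplying first limits rounding error
        rates[city] = n_events * 10_000 / pop
    return rates


# Hypothetical inputs for illustration only (not real event counts or ACS figures)
sample_cities = ["City A", "City A", "City A", "City B"]
sample_population = {"City A": 2_000, "City B": 500}

rates = events_per_10k(sample_cities, sample_population)
print(rates)
```

With a DataFrame like the one built in the extraction step, the City column could be passed as event_cities, since it holds one city name per event.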