This project explored web scraping and p5.js as tools for making a responsive web page. My web page scrapes data from Wikipedia’s lists of film genres to create a map that shows the popularity of different film genres over time. Popular genres can be telling indicators of societal trends, yet deciding which genre is ‘popular’ is not straightforward. My final graph presents two types of information and compares Horror films with Superhero films. These two genres are interesting because both are popular, but in very different ways. Horror films have been made since film production began and have been fairly cheap to produce, so many of them have been made. Conversely, superhero movies are a more recent trend characterized by fewer but far more expensive films.
The chart graphs the number of films of each genre produced over time, with the circles adding context about how expensive those movies were on average to produce.
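A circle encoding of this kind could be sketched in Python as follows. This is only an illustration under stated assumptions: the budget figures are made-up placeholders rather than the scraped values, and `circle_diameter` is a hypothetical helper, not part of the project. It scales the diameter with the square root of the value so that a circle's area, rather than its width, tracks the average cost.

```python
import math


def circle_diameter(value, max_value, max_diameter=80.0):
    """Return a diameter whose circle area is proportional to value."""
    if max_value <= 0 or value <= 0:
        return 0.0
    return max_diameter * math.sqrt(value / max_value)


# Placeholder average budgets in millions of dollars (illustrative, not scraped data).
average_budget = {"Horror": 10.0, "Superhero": 160.0}

largest = max(average_budget.values())
diameters = {
    genre: circle_diameter(budget, largest)
    for genre, budget in average_budget.items()
}
```

The resulting diameters could then be written out as JSON and read by a p5.js sketch, which draws circles by diameter.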
Much of this project was dedicated to data collection, which proved very time-consuming and may have distracted from other things, such as the interactions built into my webpage, which could have improved the impact of the project. My original concept included more than two film genres, but the quantity of information became too large and difficult to display on one page. In graphing the data I realized that each genre’s location on the y-axis was unimportant; all that mattered was the size of the shape that was created. This allowed me to space the genres out more evenly on the page. Data for this project was collected using Python and Beautiful Soup, a tool that allows for easy web scraping. All data was from Wikipedia.
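The collection step could look roughly like the sketch below. It is a minimal example, not the project's actual scraper: the inline HTML is a stand-in for a downloaded list page, the `wikitable` layout with the year in the second column is an assumption about the markup, and `films_per_year` is a hypothetical helper name.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Stand-in for a downloaded list page; the real markup may differ.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>Film A</td><td>1931</td></tr>
  <tr><td>Film B</td><td>1931</td></tr>
  <tr><td>Film C</td><td>1978</td></tr>
</table>
"""


def films_per_year(html):
    """Count table rows per release year, skipping rows without a numeric year."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for row in soup.select("table.wikitable tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # header row or malformed row
        year_text = cells[1].get_text(strip=True)
        if year_text.isdigit():
            counts[int(year_text)] += 1
    return counts


counts = films_per_year(SAMPLE_HTML)
```

For a live page, the HTML string would come from an HTTP request instead of `SAMPLE_HTML`, and the column index would need checking against each list, since the tables are not guaranteed to share one layout.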
Tools: p5.js, Python 3, HTML, CSS, Beautiful Soup
This project was successful in that I managed to create and display a rather large data set of my own creation. However, the logistics of creating this data set in a short period of time drew my attention away from the conceptual basis for the project, which could have been strengthened.
Your assignment is to scrape content from two or more different websites and combine the content in a single output.
1) Identify and use at least three different sets of content, from at least two different websites
2) Use BeautifulSoup to extract your desired text from those websites, using The Guardian example as a starting point
Combine the text from the different sources into a single presentation
3) After generating the text, your program should format it inside an HTML page and write out an HTML file that can be viewed in a browser.
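The combining and output steps above could be sketched as follows. This is one possible approach using only the standard library. The source names and passages are placeholders standing in for scraped text, and `build_page` and `output.html` are hypothetical names chosen for illustration.

```python
from html import escape
from pathlib import Path


def build_page(title, sections):
    """Return an HTML document with one heading and paragraph per source."""
    parts = [
        "<!DOCTYPE html>",
        "<html>",
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>',
        "<body>",
        f"<h1>{escape(title)}</h1>",
    ]
    for source, text in sections:
        # Escape scraped text so stray markup cannot break the page.
        parts.append(f"<h2>{escape(source)}</h2>")
        parts.append(f"<p>{escape(text)}</p>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)


# Placeholder text standing in for content scraped from different sites.
sections = [
    ("Source one", "First extracted passage."),
    ("Source two", "Second passage, with <angle brackets> escaped."),
    ("Source three", "Third extracted passage."),
]

page = build_page("Combined scrape", sections)
Path("output.html").write_text(page, encoding="utf-8")
```

Opening `output.html` in a browser should show the three passages under their source headings.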