The data was extracted from BBC Good Food: https://www.bbcgoodfood.com/recipes
The steps involved are as follows:
- Install ParseHub
- Get the URLs for each of the recipes from the different cuisines (refer to the attached urls.csv)
- Run the Python code below:
import csv
import os

import requests
from bs4 import BeautifulSoup

os.chdir('C:\\Users\\avg\\Desktop\\Latest_Data')

# Read the recipe URLs (one per row of urls.csv).
contents = []
with open('urls.csv', 'r') as csvf:
    for row in csv.reader(csvf):
        contents.append(row)

for url in contents:
    # Fetch the recipe page with a browser-like User-Agent.
    page = requests.get(url[0], headers={'User-Agent': 'Mozilla/5.0'}).text
    soup = BeautifulSoup(page, "html.parser")

    # The recipe title names the output file.
    title = soup.find("h1", class_="recipe-header__title")
    print(title.text)
    file_name = title.text + ".csv"

    with open(file_name, "w") as f:
        # First field: the recipe title (non-ASCII characters stripped).
        f.write(title.text.encode('ascii', 'ignore').decode('ascii'))

        # Each nutrient is an <li> holding a label span and a value span.
        nutrition = soup.find("ul", class_="nutrition")
        for li in nutrition.findAll("li"):
            label = li.find("span", class_="nutrition__label")
            value = li.find("span", class_="nutrition__value")
            print(label.get_text() + ": " + value.get_text())
            f.write("," + value.get_text())
- The nutrient information is saved as separate CSV files (one file per recipe)
- Merge the CSV files to get the required data (see the sketch below)
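The merge step can also be done in Python. Here is a minimal sketch, assuming the per-recipe files were written into the same Latest_Data folder used above; the output name merged_nutrition.csv is just a placeholder:

import glob
import pandas as pd

# Gather every per-recipe CSV produced by the scraper (skipping the URL list itself).
rows = []
for path in glob.glob('C:\\Users\\avg\\Desktop\\Latest_Data\\*.csv'):
    if path.lower().endswith('urls.csv'):
        continue
    with open(path) as f:
        rows.append(f.read().strip().split(','))

# One row per recipe: the title followed by its nutrient values.
merged = pd.DataFrame(rows)
merged.to_csv('merged_nutrition.csv', index=False, header=False)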