Wednesday, 28 November 2018

Data modification has started

I am modifying my data by feeding in information on how much heart patients are supposed to consume in a day, broken down by food constituent. For example, they can consume the following amounts:

Daily totals: 1,181 calories, 62 g protein, 144 g carbohydrates, 27 g fiber, 44 g fat, 12 g saturated fat, 1,208 mg sodium.
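To use these reference amounts in code, they can be kept in a dictionary and compared with a recipe's nutrients. This is a minimal sketch. The key names, the function `share_of_daily` and the example recipe numbers are hypothetical and do not come from the data set.

```python
# Reference daily totals for a heart patient, copied from the list above.
# The key names are hypothetical labels chosen for this sketch.
DAILY_TOTALS = {
    "calories": 1181,      # kcal
    "protein": 62,         # g
    "carbohydrates": 144,  # g
    "fiber": 27,           # g
    "fat": 44,             # g
    "saturated_fat": 12,   # g
    "sodium": 1208,        # mg
}


def share_of_daily(recipe):
    """Return, for each known constituent in `recipe`, the percentage of the
    daily total it uses (rounded to one decimal place). Unknown keys are ignored."""
    return {
        name: round(100 * amount / DAILY_TOTALS[name], 1)
        for name, amount in recipe.items()
        if name in DAILY_TOTALS
    }


# Hypothetical recipe values, in the same units as DAILY_TOTALS
recipe = {"calories": 472, "fat": 11, "sodium": 604}
print(share_of_daily(recipe))  # {'calories': 40.0, 'fat': 25.0, 'sodium': 50.0}
```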



I will feed in this information to categorize my data into the following three categories: NORMAL, EATABLE and AVOIDABLE food.

NORMAL: The person is allowed to eat up to 1180 calories a day
EATABLE: The person is allowed to eat 1200-1400 calories a day
AVOIDABLE: Food choices that go beyond 1400 calories fall into this group
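The three categories can be written as a small function. This is only a sketch: `categorize` is a hypothetical name, and because the definitions above leave 1180 to 1200 calories unassigned, the sketch assumes that gap belongs to EATABLE so every value gets exactly one label.

```python
def categorize(daily_calories):
    """Assign a daily calorie total to NORMAL, EATABLE or AVOIDABLE.

    Assumption: values between 1180 and 1200 kcal are not covered by the
    definitions, so this sketch places them in EATABLE.
    """
    if daily_calories <= 1180:
        return "NORMAL"
    if daily_calories <= 1400:  # "up to 1400" is still EATABLE
        return "EATABLE"
    return "AVOIDABLE"  # anything beyond 1400 kcal


for kcal in (950, 1190, 1400, 1650):
    print(kcal, categorize(kcal))
# 950 NORMAL
# 1190 EATABLE
# 1400 EATABLE
# 1650 AVOIDABLE
```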


Data set gathering completed

The data was extracted from BBC Good Food https://www.bbcgoodfood.com/recipes
The steps involved are as follows:
- Install ParseHub
- Get the URL of each recipe from the different cuisines (see the attached urls.csv)
- Run the following Python code:
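import csv
import os
import re

import requests
from bs4 import BeautifulSoup

# Folder that holds urls.csv; one CSV file per recipe is written here too
os.chdir('C:\\Users\\avg\\Desktop\\Latest_Data')

# Read the recipe URLs from the first column of urls.csv
contents = []
with open('urls.csv', 'r', newline='') as csvf:
    for row in csv.reader(csvf):
        if row:  # skip blank lines
            contents.append(row[0].strip())

for url in contents:
    # Send a browser-like User-Agent, as in the original request
    response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    title = soup.find("h1", class_="recipe-header__title")
    # Keep ASCII only, then replace characters Windows forbids in file names
    ascii_title = title.get_text().strip().encode('ascii', 'ignore').decode('ascii')
    file_name = re.sub(r'[\\/:*?"<>|]', '_', ascii_title) + ".csv"
    print(ascii_title)

    # Collect each nutrient value listed on the recipe page
    nutrition = soup.find("ul", class_="nutrition")
    values = []
    for li in nutrition.find_all("li"):
        label = li.find("span", class_="nutrition__label")
        value = li.find("span", class_="nutrition__value")
        print(label.get_text() + ": " + value.get_text())
        values.append(value.get_text().strip())

    # One row: the recipe title followed by its nutrient values
    # (csv.writer quotes titles that contain commas)
    with open(file_name, "w", newline="", encoding="utf-8") as file:
        csv.writer(file).writerow([ascii_title] + values)
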
- The nutrient information is saved as separate CSV files (one file per recipe)
- Merge the CSV files to get the required data
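The merge step can be sketched with the standard library alone. This assumes every per-recipe file sits in the same folder as urls.csv and holds a single line (the recipe title followed by its nutrient values); the function name `merge_recipe_csvs`, the output name `merged.csv` and the skip list are hypothetical choices for illustration.

```python
import csv
import glob
import os


def merge_recipe_csvs(folder, output_name="merged.csv", skip=("urls.csv",)):
    """Concatenate the per-recipe CSV files in `folder` into one file.

    Assumes each input file holds one row: recipe title, then nutrient values.
    `output_name` and `skip` are hypothetical defaults for this sketch.
    Returns the number of recipe rows written.
    """
    # Never read the URL list or a previous merged output as recipe data
    ignore = set(skip) | {output_name}
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        if os.path.basename(path) in ignore:
            continue
        with open(path, newline="", encoding="utf-8") as f:
            rows.extend(row for row in csv.reader(f) if row)
    with open(os.path.join(folder, output_name), "w", newline="", encoding="utf-8") as out:
        csv.writer(out).writerows(rows)
    return len(rows)
```

For example, `merge_recipe_csvs('C:\\Users\\avg\\Desktop\\Latest_Data')` would write merged.csv into the scraping folder. Reading with the csv module rather than splitting lines by hand keeps quoted titles that contain commas in a single column.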