Wednesday 28 November 2018

Data modification has started

I am modifying my data by adding the amounts of each food constituent that heart patients are supposed to consume in a day. For example, they can consume the following amounts:

Daily Totals: 1,181 calories, 62 g protein, 144 g carbohydrates, 27 g fiber, 44 g fat, 12 g saturated fat, 1,208 mg sodium.



I will use this information to sort my data into the following three categories: NORMAL, EATABLE and AVOIDABLE FOOD

NORMAL: The person is allowed to eat up to 1180 calories in a day
EATABLE: The person is allowed to eat 1200-1400 calories a day
AVOIDABLE: Food choices that go beyond 1400 calories fall into this group
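
The three categories above could be assigned with a small helper like the one below. This is only a sketch: the function name `categorize` is made up for illustration, and because the notes leave a gap between 1180 and 1200 calories, the sketch assumes that values in that gap count as EATABLE.

```python
def categorize(daily_calories):
    """Return the category for a daily calorie total.

    Thresholds come from the notes above. Assumption: totals between
    1180 and 1200 calories, which the notes do not cover, are EATABLE.
    """
    if daily_calories <= 1180:
        return "NORMAL"
    if daily_calories <= 1400:
        return "EATABLE"
    return "AVOIDABLE"


if __name__ == "__main__":
    for total in (1000, 1300, 1500):
        print(total, categorize(total))
```

If the gap should be handled differently, only the first comparison needs to change.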


Data set gathering completed

The data was extracted from BBC Good Food: https://www.bbcgoodfood.com/recipes
The steps involved are as follows:
- Install ParseHub
- Get the URLs for each of the recipes from different cuisines (refer to the attached urls.csv)
- Run the Python code below:
import csv
import os
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

os.chdir('C:\\Users\\avg\\Desktop\\Latest_Data')

# Read the recipe URLs (one per row, first column) from urls.csv
with open('urls.csv', 'r') as csvf:
    contents = [row for row in csv.reader(csvf) if row]

for url in contents:
    # Send a browser-like User-Agent with the request
    req = Request(url[0], headers={'User-Agent': 'Mozilla/5.0'})
    page = urlopen(req).read()
    soup = BeautifulSoup(page, "html.parser")
    title = soup.find("h1", class_="recipe-header__title")
    # Keep only the ASCII characters of the recipe title
    name = title.text.encode('ascii', 'ignore').decode('ascii')
    print(name)
    nutrition = soup.find("ul", class_="nutrition")
    # One csv file per recipe: the title followed by the nutrient values
    with open(name + ".csv", "w") as out:
        out.write(name)
        for li in nutrition.find_all("li"):
            label = li.find("span", class_="nutrition__label")
            value = li.find("span", class_="nutrition__value")
            print(label.get_text() + ": " + value.get_text())
            out.write("," + value.get_text())

- The nutrient information is obtained as different csv files (one file per recipe)
- Merge the csv files to get the required data
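
The merge step is not shown in these notes. One possible way to do it is sketched below, assuming each per-recipe file holds a single row (recipe title followed by the nutrient values) and that all files sit in one folder. The function name `merge_recipe_files` and the output name `merged.csv` are made up for illustration.

```python
import csv
import glob
import os


def merge_recipe_files(folder, merged_name="merged.csv"):
    """Combine the per-recipe csv files in `folder` into one csv file.

    `merged_name` is a hypothetical output name, not one from the notes.
    Each input file is expected to hold one row: title, then values.
    """
    merged_path = os.path.join(folder, merged_name)
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # Skip the URL list and any earlier merge output
        if os.path.basename(path) in ("urls.csv", merged_name):
            continue
        with open(path, newline="") as f:
            rows.extend(row for row in csv.reader(f) if row)
    with open(merged_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows
```

No header row is written, because the scraper above stores only the nutrient values and not their labels.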

Wednesday 24 October 2018

Progress to date:

- Obtained a data set that contains food items from various cuisines
- The data is structured and contains the nutrient information in detail
- Cleaned the data set and categorized the variables to be analysed
- Developed a sample survey form that would help to ascertain the taste preferences among people
- Carried out statistical analysis on the data set and noted down outliers, variance analysis and possible candidates for regression

Next Steps:
- Complete documentation of all findings to date
- Get the survey results and do a preliminary analysis
- Start building a kNN model using the available data and validate its performance

Thursday 4 October 2018

Reframing the Research Question

Research Question:

1) Impact of food habits on heart diseases - Using machine learning to recommend best fit food choices for heart patients

or

2) Impact analysis of food choices on people suffering from heart ailments and leveraging machine learning models to recommend most suitable food options
 

Note:  

The main objective in altering the research question is to address the ailment (heart disease) precisely, since food choices have a significant impact on a patient with heart disease. The food choices that relate to the disease will therefore play a major role in our research work, and the study will present its findings through a recommendation system that incorporates one or more machine learning algorithms.