Diabetes Analysis

Diabetes Case Study Analysis

This analysis uses Python to explore different aspects of diabetes among the Pima Indians through Exploratory Data Analysis (EDA).

CONTEXT:

Diabetes is one of the most common diseases worldwide, and the number of diabetic patients has been growing over the years. The main cause of diabetes remains unknown, yet scientists believe that both genetic factors and environmental and lifestyle factors play a major role.

A few years ago, research was carried out on a Native American tribe called the Pima (also known as the Pima Indians), and it found that women in this tribe are prone to developing diabetes at an early age. Several constraints were placed on the selection of these instances from a larger database; in particular, all patients were females at least 21 years old of Pima Indian heritage.

The dataset contains the following variables:

Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration 2 hours into an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
Insulin: 2-Hour serum insulin (mu U/ml)
BMI: Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction: A function that scores the likelihood of diabetes based on family history.
Age: Age in years
Outcome: Class variable (0 = not diabetic, 1 = diabetic)
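The BMI definition above is a simple formula, so it can be written as a small function. This is only an illustration: the function name `bmi` and the 70 kg / 1.75 m figures are hypothetical and not taken from the dataset.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2

# Hypothetical person: 70 kg, 1.75 m tall
print(round(bmi(70, 1.75), 1))  # 22.9
```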

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Jupyter only: display plots inside the notebook
%matplotlib inline

dataset = pd.read_csv("diabetes.csv")
dataset.head()

dataset.tail()

dataset.iloc[:, 0:8].sum()  # column totals for the first eight (predictor) columns
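The `iloc[:, 0:8]` slice above is position-based and its stop index is exclusive, so it keeps columns 0 to 7 and leaves out the ninth column. The sketch below checks this on a two-row toy frame; it assumes the CSV columns follow the order of the variable list above (with Outcome last), and the values are made up.

```python
import pandas as pd

# Toy frame with the same nine column names as the dataset (values are made up)
columns = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
toy = pd.DataFrame([[1, 100, 70, 20, 80, 25.0, 0.5, 30, 0],
                    [2, 150, 80, 30, 120, 32.0, 0.8, 45, 1]], columns=columns)

# 0:8 selects positions 0..7: every predictor, but not Outcome at position 8
predictors = toy.iloc[:, 0:8]
print(list(predictors.columns))
print(predictors.sum())
```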

dataset.describe().T  # summary statistics, one row per variable
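Transposing `describe()` puts each variable on its own row and the statistics (count, mean, std, min, quartiles, max) in the columns, which is easier to read when there are nine variables. A minimal illustration on a toy frame with made-up values:

```python
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({"Glucose": [85, 120, 150, 183],
                    "Age": [21, 33, 45, 50]})

summary = toy.describe().T  # rows = variables, columns = statistics
print(summary)
```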

sns.displot(data=dataset, x='BloodPressure', kind='kde')
plt.show()
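The `kind='kde'` plot above is a kernel density estimate: a Gaussian bump is placed on every observation and the bumps are averaged into a smooth curve. The sketch below shows that idea with NumPy; the function name `gaussian_kde_1d` and the readings are hypothetical. It uses Scott's rule for the bandwidth, which is seaborn's default rule, but it is not seaborn's implementation.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian kernel density estimate on `grid`.

    Bandwidth follows Scott's rule in one dimension:
    h = sample std (ddof=1) * n ** (-1/5).
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardised distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and divide by h so the curve integrates to 1
    return kernels.mean(axis=1) / h

# Hypothetical diastolic readings (mm Hg), not taken from the dataset
readings = [62, 66, 70, 72, 72, 74, 76, 80, 84, 90]
grid = np.linspace(20, 140, 1201)
density = gaussian_kde_1d(readings, grid)
print(grid[density.argmax()])  # location of the estimated peak
```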

sns.pairplot(data=dataset, vars=['Glucose', 'SkinThickness', 'DiabetesPedigreeFunction'], hue='Outcome')
plt.show()

plt.scatter(x='Glucose', y='Insulin', data=dataset)
plt.xlabel('Glucose')
plt.ylabel('Insulin')
plt.show()

plt.boxplot(dataset['Age'])
plt.title('Boxplot of Age')
plt.ylabel('Age')
plt.show()

# Age of diabetic women only (Outcome == 1); the box plot's vertical axis is Age
plt.boxplot(dataset.loc[dataset['Outcome'] == 1, 'Age'])
plt.title('Distribution of Age for Women Who Have Diabetes')
plt.ylabel('Age')
plt.show()
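The age box plots above draw the quartiles plus whiskers that reach the most extreme points within 1.5 × IQR of the box (matplotlib's default `whis=1.5`); anything beyond is drawn as an outlier. The sketch below computes the same numbers directly, which is handy for reporting them in text. The helper name `boxplot_stats` and the ages are hypothetical.

```python
import numpy as np

def boxplot_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers of a Tukey-style box plot."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        # whiskers stop at the most extreme points still inside the fences
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": values[(values < low_fence) | (values > high_fence)],
    }

# Hypothetical ages, not taken from the dataset
ages = [21, 22, 24, 25, 27, 29, 31, 33, 38, 41, 45, 81]
stats = boxplot_stats(ages)
print(stats)
```

Applied to the real data, the same helper could be called on `dataset['Age']` or on the ages of the diabetic subgroup.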

corr_matrix = dataset.corr()
corr_matrix

plt.figure(figsize=(8, 8))
sns.heatmap(corr_matrix, annot=True)
plt.show()
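By default `DataFrame.corr()` computes Pearson's correlation coefficient: the covariance of two variables divided by the product of their standard deviations, giving a value between -1 and 1. The hand-written sketch below, on made-up values, reproduces what pandas reports for one pair; the helper name `pearson` is hypothetical.

```python
import numpy as np
import pandas as pd

def pearson(x, y):
    """Pearson r = cov(x, y) / (std(x) * std(y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Made-up values for illustration only
toy = pd.DataFrame({"Pregnancies": [0, 1, 2, 4, 6, 8],
                    "Age": [21, 25, 29, 35, 41, 50]})
r_manual = pearson(toy["Pregnancies"], toy["Age"])
r_pandas = toy.corr().loc["Pregnancies", "Age"]
print(round(r_manual, 4), round(r_pandas, 4))
```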

Observations: From the heatmap above, five variables stand out: age, pregnancies, skin thickness, BMI and glucose. Age and pregnancies are correlated with each other (0.54), meaning they carry similar information; the same holds for BMI and skin thickness (0.53). The variable most strongly correlated with diabetes (Outcome) is glucose level (0.49), followed by insulin level (0.40).
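The ranking in the observations can be read straight from the `Outcome` column of the correlation matrix: drop Outcome's correlation with itself and sort the rest. Because Outcome is coded 0/1, each value is a point-biserial correlation. The sketch uses a toy frame with made-up rows; on the real data the same chain would start from `dataset.corr()`.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Glucose": [85, 90, 100, 140, 160, 180],
    "BMI":     [22.0, 30.0, 25.0, 28.0, 35.0, 31.0],
    "Age":     [25, 40, 30, 35, 50, 45],
    "Outcome": [0, 0, 0, 1, 1, 1],
})

# Correlation of each predictor with Outcome, strongest first
ranking = (toy.corr()["Outcome"]
              .drop("Outcome")
              .sort_values(ascending=False))
print(ranking)
```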

Data Mastercourse & Consulting

Diabetes Case Study Analysis

Dr. Wood
