468 lines
125 KiB
Plaintext
{
"cells": [
{
"cell_type": "markdown",
"id": "b3fea80c-f5ab-4a63-b29e-e5043fd7c96e",
"metadata": {},
"source": [
"## Dataset exploration"
]
},
{
"cell_type": "markdown",
"id": "ea1a24a8-1942-494a-a795-96d3d1f07dae",
"metadata": {},
"source": [
"### Setup and Data Loading"
]
},
{
"cell_type": "markdown",
"id": "61b65190-569d-418e-a75c-c91db5106f25",
"metadata": {},
"source": [
"Import necessary libraries:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "04e223dc-b69e-4ebf-ad58-a79e630cb75f",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"id": "ebf95787-c0dc-4668-b8ea-0f8479a7b978",
"metadata": {},
"source": [
"Load the training and test data into pandas DataFrames:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "740240e1-ad27-47e2-b395-fb08dcadbf1d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"train_data_orig = pd.read_csv('train_data.csv')\n",
"test_data_orig = pd.read_csv('test_data.csv')"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "8ce84259-79ea-429d-9e42-91ffa23edcb6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#train_data_orig"
]
},
{
"cell_type": "markdown",
"id": "237e8e44-62e8-4d0f-9da4-bb6fe7b8c8c6",
"metadata": {},
"source": [
"### Data Merging/Concatenation (if necessary)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "5a4300f9-60a4-414a-9a4c-684bbc43029e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# If concatenating vertically (append data in rows)\n",
"conc_data = pd.concat([train_data_orig, test_data_orig], ignore_index=True)\n",
"\n",
"# If merging based on a common column (e.g., 'output')\n",
"# conc_data = pd.merge(train_data_orig, test_data_orig, on='output', how='inner')\n"
]
},
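{
"cell_type": "markdown",
"id": "sketch-concat-vs-merge-note",
"metadata": {},
"source": [
"The two options in the cell above behave quite differently. As a minimal sketch on two small hypothetical frames (`toy_a` and `toy_b` are made up for illustration and are not part of the dataset), `pd.concat` stacks rows, while `pd.merge` with `how='inner'` keeps only rows whose key value appears in both frames:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-concat-vs-merge-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy frames, used only to illustrate the difference\n",
"toy_a = pd.DataFrame({'output': [1, 2], 'input1': [10, 20]})\n",
"toy_b = pd.DataFrame({'output': [2, 3], 'input1': [30, 40]})\n",
"\n",
"# Vertical concatenation: all rows of both frames, index renumbered from 0\n",
"stacked = pd.concat([toy_a, toy_b], ignore_index=True)\n",
"\n",
"# Inner merge on 'output': only the key value 2 is present in both frames\n",
"joined = pd.merge(toy_a, toy_b, on='output', how='inner', suffixes=('_a', '_b'))\n",
"\n",
"print(stacked.shape)  # (4, 2)\n",
"print(joined.shape)   # (1, 3)"
]
},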
{
"cell_type": "markdown",
"id": "331c8f02-01a7-49c2-b976-017e7627dbf4",
"metadata": {},
"source": [
"### Normalization\n",
"Data normalization rescales the values of the numeric columns in a dataset to a common scale."
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "3ebbb0aa-5149-4e25-adb4-67ab883dbf19",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Manually list the numeric columns:\n",
"#numerical_cols = ['output', 'input1', 'input2', 'input3', ..., 'input21']\n",
"\n",
"# Or select them programmatically:\n",
"numerical_cols = train_data_orig.select_dtypes(include=np.number).columns.tolist()"
]
},
{
"cell_type": "markdown",
"id": "afbe7ca8-848f-4916-8466-de9d06717a68",
"metadata": {
"tags": []
},
"source": [
"Handle any missing values in these columns:"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "1c4a0bf6-3e68-474c-b24c-444478603922",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Fill NAs in the training data with the mean value of each column\n",
"train_data_orig[numerical_cols] = train_data_orig[numerical_cols].apply(lambda x: x.fillna(x.mean()), axis=0)"
]
},
{
"cell_type": "markdown",
"id": "a1da0a70-a48c-4b1b-b150-b8477815eed9",
"metadata": {},
"source": [
"**Apply normalization to the selected columns:**\n",
"\n",
"Choose a normalization method. The two most common methods are:\n",
"\n",
"***1. Min-Max Scaling***\n",
"\n",
"This method rescales each feature to a fixed range, usually [0, 1].\n",
"\\begin{equation}\n",
"X_{\\text{norm}} = \\frac{X - X_{\\text{min}}}{X_{\\text{max}} - X_{\\text{min}}}\n",
"\\end{equation}\n",
"\n",
"Where:\n",
"- $X$ is the original value.\n",
"- $X_{\\text{min}}$ is the minimum value in the column.\n",
"- $X_{\\text{max}}$ is the maximum value in the column.\n",
"\n",
"***2. Z-score Normalization (Standardization)***\n",
"\n",
"This method centers each feature on its mean and scales it by its standard deviation.\n",
"\n",
"\\begin{equation}\n",
"X_{\\text{std}} = \\frac{X - \\mu}{\\sigma}\n",
"\\end{equation}\n",
"\n",
"Where:\n",
"- $X$ is the original value.\n",
"- $\\mu$ is the mean of the column.\n",
"- $\\sigma$ is the standard deviation of the column."
]
},
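{
"cell_type": "markdown",
"id": "sketch-zscore-note",
"metadata": {},
"source": [
"No cell in this notebook applies the second method. A minimal sketch of z-score standardization, assuming a small hypothetical frame `demo` (made-up values, not the dataset) and the population standard deviation (`ddof=0`; pandas defaults to the sample value, `ddof=1`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-zscore-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical values, used only for illustration\n",
"demo = pd.DataFrame({'input1': [1.0, 2.0, 3.0], 'input2': [10.0, 20.0, 60.0]})\n",
"\n",
"# Subtract each column's mean and divide by its standard deviation\n",
"demo_std = (demo - demo.mean()) / demo.std(ddof=0)\n",
"\n",
"# Each standardized column should now have mean ~0 and standard deviation ~1\n",
"print(demo_std.mean().round(6))\n",
"print(demo_std.std(ddof=0).round(6))"
]
},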
{
"cell_type": "code",
"execution_count": 45,
"id": "f5c85bb1-c5ba-4dcd-bb63-2b7331e1a575",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Using Min-Max normalization on the training data (the frame that is saved afterwards)\n",
"# Note: a constant column (max == min) would yield NaN from the 0/0 division\n",
"train_data_orig[numerical_cols] = train_data_orig[numerical_cols].apply(\n",
"    lambda x: (x - x.min()) / (x.max() - x.min()), axis=0\n",
")"
]
},
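{
"cell_type": "markdown",
"id": "sketch-scale-test-note",
"metadata": {},
"source": [
"When a separate test set has to be scaled as well, one common convention is to reuse the minimum and range computed on the training set, so that both sets share the same mapping. A minimal sketch under that assumption, on hypothetical toy frames (`toy_train` and `toy_test` are illustrative names, not the notebook's data):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-scale-test-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy frames, used only for illustration\n",
"toy_train = pd.DataFrame({'input1': [0.0, 5.0, 10.0]})\n",
"toy_test = pd.DataFrame({'input1': [2.5, 12.0]})\n",
"\n",
"# Statistics come from the training frame only\n",
"col_min = toy_train.min()\n",
"col_range = toy_train.max() - col_min\n",
"\n",
"# Apply the same mapping to both frames\n",
"toy_train_scaled = (toy_train - col_min) / col_range\n",
"toy_test_scaled = (toy_test - col_min) / col_range\n",
"\n",
"# 12.0 lies above the training maximum, so its scaled value exceeds 1\n",
"print(toy_test_scaled)"
]
},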
{
"cell_type": "markdown",
"id": "1593accd-45aa-4467-afe5-17349a9217f7",
"metadata": {},
"source": [
"Save the normalized data to a CSV file:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "5c916c35-a519-4d82-bb0c-2becd7b6b72c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#train_data.to_csv('path_to_save/normalized_data.csv', index=False)\n",
"train_data_orig.to_csv('normalized_train_data.csv', index=False)"
]
},
{
"cell_type": "markdown",
"id": "f1e5bde8-fc56-479b-83cf-5251b070a542",
"metadata": {},
"source": [
"### Preliminary Data Exploration"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "f851a8b4-537b-4485-ba65-9df7f0dd266a",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 10979 entries, 0 to 10978\n",
"Data columns (total 22 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 output 10979 non-null float64\n",
" 1 input1 10979 non-null float64\n",
" 2 input2 10979 non-null float64\n",
" 3 input3 10979 non-null float64\n",
" 4 input4 10979 non-null float64\n",
" 5 input5 10979 non-null float64\n",
" 6 input6 10979 non-null float64\n",
" 7 input7 10979 non-null float64\n",
" 8 input8 10979 non-null float64\n",
" 9 input9 10979 non-null float64\n",
" 10 input10 10979 non-null float64\n",
" 11 input11 10979 non-null float64\n",
" 12 input12 10979 non-null float64\n",
" 13 input13 10979 non-null float64\n",
" 14 input14 10979 non-null float64\n",
" 15 input15 10979 non-null float64\n",
" 16 input16 10979 non-null float64\n",
" 17 input17 10979 non-null float64\n",
" 18 input18 10979 non-null float64\n",
" 19 input19 10979 non-null float64\n",
" 20 input20 10979 non-null float64\n",
" 21 input21 10979 non-null float64\n",
"dtypes: float64(22)\n",
"memory usage: 1.8 MB\n"
]
},
{
"data": {
"text/plain": [
"output 0\n",
"input1 0\n",
"input2 0\n",
"input3 0\n",
"input4 0\n",
"input5 0\n",
"input6 0\n",
"input7 0\n",
"input8 0\n",
"input9 0\n",
"input10 0\n",
"input11 0\n",
"input12 0\n",
"input13 0\n",
"input14 0\n",
"input15 0\n",
"input16 0\n",
"input17 0\n",
"input18 0\n",
"input19 0\n",
"input20 0\n",
"input21 0\n",
"dtype: int64"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# View the first rows of the data\n",
"train_data_orig.head()\n",
"\n",
"# Obtain summary info about the data\n",
"train_data_orig.info()\n",
"\n",
"# Statistical summary\n",
"train_data_orig.describe()\n",
"\n",
"# Count null or missing values per column\n",
"train_data_orig.isnull().sum()\n"
]
},
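{
"cell_type": "markdown",
"id": "sketch-display-note",
"metadata": {},
"source": [
"A notebook cell renders only the value of its last expression, which is why the output above contains the printed `info()` text and the null counts but not the `head()` or `describe()` tables. A minimal sketch, assuming a small hypothetical frame `toy` (made-up values), that shows every summary explicitly with `IPython.display.display`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-display-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from IPython.display import display\n",
"\n",
"# Hypothetical toy frame with one missing value, used only for illustration\n",
"toy = pd.DataFrame({'output': [0.1, 0.4, None], 'input1': [1.0, 2.0, 3.0]})\n",
"\n",
"display(toy.head())          # first rows\n",
"toy.info()                   # info() prints directly, no display() needed\n",
"display(toy.describe())      # statistical summary\n",
"display(toy.isnull().sum())  # missing values per column"
]
},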
{
"cell_type": "markdown",
"id": "074a38d4-ac5b-4126-9817-4391f4401609",
"metadata": {},
"source": [
"### Detailed Data Exploration and Visualization"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "73ebc06f-9b59-4531-ba26-b0f07f0b7c92",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: matplotlib in /opt/conda/lib/python3.10/site-packages (3.8.0)\n",
"Requirement already satisfied: seaborn in /opt/conda/lib/python3.10/site-packages (0.13.0)\n",
"Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.10/site-packages (from matplotlib) (1.1.1)\n",
"Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.10/site-packages (from matplotlib) (0.12.0)\n",
"Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.10/site-packages (from matplotlib) (4.43.0)\n",
"Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.10/site-packages (from matplotlib) (1.4.5)\n",
"Requirement already satisfied: numpy<2,>=1.21 in /opt/conda/lib/python3.10/site-packages (from matplotlib) (1.26.0)\n",
"Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.10/site-packages (from matplotlib) (23.2)\n",
"Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.10/site-packages (from matplotlib) (10.0.1)\n",
"Requirement already satisfied: pyparsing>=2.3.1 in /opt/conda/lib/python3.10/site-packages (from matplotlib) (3.1.1)\n",
"Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.10/site-packages (from matplotlib) (2.8.2)\n",
"Requirement already satisfied: pandas>=1.2 in /opt/conda/lib/python3.10/site-packages (from seaborn) (2.1.1)\n",
"Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.10/site-packages (from pandas>=1.2->seaborn) (2023.3)\n",
"Requirement already satisfied: tzdata>=2022.1 in /opt/conda/lib/python3.10/site-packages (from pandas>=1.2->seaborn) (2023.3)\n",
"Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)\n"
]
}
],
"source": [
"!pip install matplotlib seaborn"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "f6cc0667-31ad-4523-bafc-8a588fdcf8ba",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "bfecfaf0-803b-4293-9fd3-3529c00920d2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"train_data = pd.read_csv('normalized_train_data.csv')"
]
},
{
"cell_type": "code",
"execution_count": 61,
"id": "d1cb4f55-e647-4c11-bd99-0c3fa2a4e8e4",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10, 6))\n",
"sns.scatterplot(x='input21', y='output', data=train_data)\n",
"plt.title('Scatter plot of input21 vs output')\n",
"plt.xlabel('input21')\n",
"plt.ylabel('output')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 71,
"id": "824f2d43-94ea-46bf-af52-87f261d3119a",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10, 6))\n",
"sns.barplot(x='input1', y='output', data=train_data)\n",
"#plt.title('Average of output for each value of input1')\n",
"plt.xlabel('input1')\n",
"plt.ylabel('Average of output')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db81c7e9-fb3f-42e9-853c-2c2697791f26",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}