# Annual Cost of Living Monte Carlo Models

Justin Napolitano

2022-06-01 15:24:32.169 +0000 UTC

# Cost of Living Projections

## Introduction

I do not like negotiating salary, especially without valid projections to determine a range.

I prepared this report to estimate a salary expectation that will maintain my current standard of living.

I present two Monte Carlo models of Houston and NYC annual living costs. The data is somewhat dated and, particularly in the case of Houston, consists of high-level estimates.

To produce a better report, I am currently scraping the internet for more accurate sample distributions. I will present those results soon.

With that said, the model should not deviate by more than about 5 to 10 percent from what is presented below.

## Findings

An annual salary of $90,000 would be sufficient to qualify for rent in Houston and, most likely, in the median-income neighborhoods of NYC.

I arrived at this number by quantifying a confidence interval of annual rent costs in both cities across a normal distribution. I then multiplied that figure by 3, since most landlords require income of at least three times the rent.
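That arithmetic can be sketched in a few lines. The $2,500 monthly rent is the Houston median assumed later in this report, and the three-times-rent multiplier is the assumed landlord qualification rule:

```python
# Minimal sketch of the salary arithmetic; the 3x multiplier is the assumed
# landlord requirement (income of at least three times the rent), and $2,500
# is the Houston median monthly rent used later in this report.
monthly_rent_estimate = 2500
required_monthly_income = monthly_rent_estimate * 3
required_annual_salary = required_monthly_income * 12
print(required_annual_salary)  # 90000
```

The same figure falls out whether you apply the multiplier monthly or to the annual rent total (about $30,000 x 3).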

## Limitations of the Model

### Old NYC Data

The data I am using was sourced from 2018. I will be updating it soon.

### Houston Data

The Houston estimate is based on staying in the property I currently rent. The rent is $2,400 a month, and I estimate it could rise to at most about $2,600 in the next year. If I were to move, similar housing goes for around $2,200 to $2,600 a month. I used these figures as the bounds of my estimates.

## Houston Cost of Living Expenses

I intend to stay in Houston for the next year. I would like to move to NY eventually to be nearer to a central office, but not in the near future.

```
lower_bound = 2400
upper_bound = 2600
median = 2500
standard_dev = 100
cap_range = range(lower_bound, upper_bound)
rent_distribution = np.random.normal(loc=median, scale=standard_dev, size=10000)
rent_sample = choice(rent_distribution, 12)
```

### Houston Monthly Food Costs

```
lower_bound = 300
upper_bound = 500
median = 400
standard_dev = 50
food_range = range(lower_bound, upper_bound)
food_distribution = np.random.normal(loc=median, scale=standard_dev, size=10000)
food_sample = choice(food_distribution, 12)
```

### Houston Insurance Costs

```
lower_bound = 200
upper_bound = 300
median = 250
standard_dev = 25
insurance_range = range(lower_bound, upper_bound)
insurance_distribution = np.random.normal(loc=median, scale=standard_dev, size=10000)
```

#### The Houston Cost of Living DF

```
cost_of_living_df = pd.DataFrame()
cost_of_living_df['rent']= choice(rent_distribution,12)
cost_of_living_df['food'] = choice(food_distribution, 12)
cost_of_living_df['insurance'] = choice(insurance_distribution, 12)
cost_of_living_df['monthly_cost'] = cost_of_living_df.rent + cost_of_living_df.food + cost_of_living_df.insurance
cost_of_living_df
```

| | rent | food | insurance | monthly_cost |
|---|---|---|---|---|
| 0 | 2472.688851 | 334.419350 | 231.162225 | 3038.270426 |
| 1 | 2399.284893 | 444.677340 | 248.645107 | 3092.607340 |
| 2 | 2684.456976 | 430.277801 | 252.578613 | 3367.313390 |
| 3 | 2478.390464 | 360.661703 | 291.989836 | 3131.042002 |
| 4 | 2513.324309 | 429.771020 | 252.866861 | 3195.962190 |
| 5 | 2501.390892 | 413.121444 | 243.717854 | 3158.230190 |
| 6 | 2554.433859 | 363.994333 | 226.672435 | 3145.100627 |
| 7 | 2530.369935 | 299.997467 | 239.663510 | 3070.030911 |
| 8 | 2635.681318 | 394.667441 | 241.502045 | 3271.850803 |
| 9 | 2596.457738 | 513.944623 | 229.362551 | 3339.764912 |
| 10 | 2455.017883 | 371.266360 | 283.637179 | 3109.921421 |
| 11 | 2427.449703 | 485.960065 | 276.488430 | 3189.898198 |

## Houston Costs Per Annum Algorithm

The algorithm below calculates the annual cost of rent, food, and insurance to determine the total cost per year. Rent, food, and insurance are drawn at random from the distributions defined above.

I run the simulation 10,000 times, which corresponds to 10,000 random samples of annual costs. The point of doing this is to build an approximately normal distribution from which to define confidence intervals for my total annual costs.

```
years = 10000
cycle_price_samples = np.zeros(shape=years)
cycle_rent_samples = np.zeros(shape=years)
cycle_food_samples = np.zeros(shape=years)
cycle_insurance_samples = np.zeros(shape=years)

for year in range(years):
    # Build a fresh DataFrame each iteration; it falls out of scope afterwards
    cost_of_living_df = pd.DataFrame()
    # Random draws of 12 monthly values per category
    cost_of_living_df['rent'] = choice(rent_distribution, 12)
    cost_of_living_df['food'] = choice(food_distribution, 12)
    cost_of_living_df['insurance'] = choice(insurance_distribution, 12)
    cost_of_living_df['monthly_cost'] = (cost_of_living_df.rent
                                         + cost_of_living_df.food
                                         + cost_of_living_df.insurance)
    # Sum the 12 months into annual totals for this simulated year
    cycle_price_samples[year] = cost_of_living_df['monthly_cost'].sum()
    cycle_rent_samples[year] = cost_of_living_df.rent.sum()
    cycle_food_samples[year] = cost_of_living_df.food.sum()
    cycle_insurance_samples[year] = cost_of_living_df.insurance.sum()
```
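As a side note, the per-year loop can be collapsed into a single vectorized draw: sampling a `(years, 12)` matrix per category and summing across the month axis yields the same annual totals without building a DataFrame each iteration. This is a sketch rather than the report's original code; the distribution parameters match the Houston assumptions above.

```python
import numpy as np

rng = np.random.default_rng(0)
years = 10000
# One (years, 12) draw per category, summed over the 12 months of each year
annual_rent = rng.normal(2500, 100, size=(years, 12)).sum(axis=1)
annual_food = rng.normal(400, 50, size=(years, 12)).sum(axis=1)
annual_insurance = rng.normal(250, 25, size=(years, 12)).sum(axis=1)
annual_total = annual_rent + annual_food + annual_insurance
```

The expected annual total is 12 x (2500 + 400 + 250) = $37,800, which matches the loop-based mean reported below.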

### Houston Prediction Df

```
prediction_df = pd.DataFrame()
prediction_df['rent'] = cycle_rent_samples
prediction_df['food'] = cycle_food_samples
prediction_df['insurance'] = cycle_insurance_samples
prediction_df['total'] = cycle_price_samples
```

```
prediction_df.describe()
```

| | rent | food | insurance | total |
|---|---|---|---|---|
| count | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 |
| mean | 30003.016272 | 4800.864106 | 2997.910667 | 37801.791045 |
| std | 344.473477 | 171.736899 | 86.991071 | 394.976839 |
| min | 28586.298471 | 4159.970425 | 2699.038887 | 36163.596078 |
| 25% | 29771.562236 | 4683.226307 | 2940.117598 | 37537.005225 |
| 50% | 30003.442289 | 4800.664909 | 2997.584664 | 37797.598919 |
| 75% | 30234.927776 | 4915.307716 | 3056.853675 | 38072.961560 |
| max | 31370.239418 | 5495.020896 | 3314.016695 | 39469.935965 |

### Houston Annual Cost Histogram

```
prediction_df.total.plot.hist(grid=True, bins=20, rwidth=0.9,
color='#607c8e')
plt.xlabel('Annual Total Cost (USD)')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.75)
```

### Houston: Calculating the Confidence Interval For Total Costs

The data is nearly normal; a greater sample size would produce a nearly perfect normal curve.

```
st.norm.interval(0.90, loc=np.mean(prediction_df.total), scale=st.sem(prediction_df.total))
```

```
(37795.2942543157, 37808.287836034055)
```
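For reference, the interval above is just mean ± z · SEM. A hand-rolled version (with stand-in data, since the real `prediction_df` is built earlier in the notebook) should agree with `st.norm.interval` to floating-point precision:

```python
import numpy as np
import scipy.stats as st

# Stand-in for prediction_df.total: any roughly normal sample works here
data = np.random.default_rng(1).normal(37800, 400, size=10000)

z = st.norm.ppf(0.95)                        # two-sided 90% interval
sem = data.std(ddof=1) / np.sqrt(len(data))  # standard error of the mean
lower, upper = data.mean() - z * sem, data.mean() + z * sem
```

Note that this is an interval on the *mean* annual cost, not on the spread of individual years; that is why it is so narrow relative to the distribution's standard deviation.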

### Houston Annual Rent Histogram

```
# Annual rent histogram
prediction_df.rent.plot.hist(grid=True, bins=20, rwidth=0.9,
color='#607c8e')
plt.title('Annual Rent Cost Distribution')
plt.xlabel('Annual Rent Cost (USD)')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.75)
```

### Houston: Calculating the Confidence Interval For Annual Rent

The data is nearly normal; a greater sample size would produce a nearly perfect normal curve.

```
st.norm.interval(0.95, loc=np.mean(prediction_df.rent), scale=st.sem(prediction_df.rent))
```

```
(29996.264715447538, 30009.767827637417)
```

## New York Cost of Living Expenses

For the sake of comparison, the New York expense distributions are calculated below. I assume that everything but rent will be equivalent to Houston. A more accurate model would account for differences in insurance, food, and incidentals.

I am assuming the rent of a two-bedroom apartment.

The data I am using was scraped from Craigslist in 2018. I will redo it later with 2022 data to get a better model.

```
nyc_df = pd.read_csv("/Users/jnapolitano/Projects/cost-of-living-projections/nyc-housing.csv", encoding="unicode-escape")
```

```
# assuming a two-bedroom
nyc_df = nyc_df[nyc_df['Bedrooms'] == '2br']
```

```
nyc_df.describe()
```

| | Zipcode | Price |
|---|---|---|
| count | 2626.000000 | 2625.000000 |
| mean | 10845.203351 | 2755.018286 |
| std | 556.758722 | 7465.827048 |
| min | 10001.000000 | 16.000000 |
| 25% | 10065.000000 | 1950.000000 |
| 50% | 11210.000000 | 2330.000000 |
| 75% | 11231.000000 | 2922.000000 |
| max | 11697.000000 | 378888.000000 |

The mean price is about $2,800 with a standard deviation of $7,465, which is absurd. To do a better analysis, I need to clean the data.

```
idx = (nyc_df.Price > 500) & (nyc_df.Price < 4500)
nyc_df = nyc_df[idx]
```

```
nyc_df.describe()
```

| | Zipcode | Price |
|---|---|---|
| count | 2441.000000 | 2441.00000 |
| mean | 10881.331422 | 2435.25891 |
| std | 541.102216 | 728.96291 |
| min | 10001.000000 | 600.00000 |
| 25% | 10302.000000 | 1950.00000 |
| 50% | 11211.000000 | 2300.00000 |
| 75% | 11233.000000 | 2750.00000 |
| max | 11697.000000 | 4495.00000 |

When accounting for outliers, the data is far more manageable. I'm surprised by the mean price. Again, this data is old, and it also does not account for neighborhoods. I will redo the analysis at a later date, filtered by neighborhood.
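Until neighborhood labels are scraped, one cheap approximation is to group the listing table by zip code. The toy frame below stands in for the real Craigslist data, but the groupby pattern is the same:

```python
import pandas as pd

# Toy stand-in for the scraped listings; the real frame has the same columns
nyc_df = pd.DataFrame({
    "Zipcode": [10001, 10001, 11211, 11211, 11231],
    "Price":   [3200, 3400, 2400, 2500, 2800],
})

# Median price and listing count per zip code, a rough neighborhood proxy
by_zip = nyc_df.groupby("Zipcode")["Price"].agg(["count", "median"])
print(by_zip)
```

Each zip-code median could then seed its own rent distribution, giving per-neighborhood simulations instead of one city-wide model.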

### Creating the NYC Distributions

```
lower_bound = 600
upper_bound = 4500
median = 2435
standard_dev = 729
cap_range = range(lower_bound, upper_bound)
rent_distribution = np.random.normal(loc=median, scale=standard_dev, size=10000)
rent_sample = choice(rent_distribution, 12)
```

### NYC Monthly Food Costs

```
lower_bound = 300
upper_bound = 500
median = 400
standard_dev = 50
food_range = range(lower_bound, upper_bound)
food_distribution = np.random.normal(loc=median, scale=standard_dev, size=10000)
food_sample = choice(food_distribution, 12)
```

### NYC Insurance Costs

```
lower_bound = 200
upper_bound = 300
median = 250
standard_dev = 25
insurance_range = range(lower_bound, upper_bound)
insurance_distribution = np.random.normal(loc=median, scale=standard_dev, size=10000)
```

#### NYC Cost of Living Distribution

```
cost_of_living_df = pd.DataFrame()
cost_of_living_df['rent']= choice(rent_distribution,12)
cost_of_living_df['food'] = choice(food_distribution, 12)
cost_of_living_df['insurance'] = choice(insurance_distribution, 12)
cost_of_living_df['monthly_cost'] = cost_of_living_df.rent + cost_of_living_df.food + cost_of_living_df.insurance
cost_of_living_df
```

| | rent | food | insurance | monthly_cost |
|---|---|---|---|---|
| 0 | 2440.594149 | 404.104193 | 263.802114 | 3108.500457 |
| 1 | 3509.157666 | 399.234822 | 206.641152 | 4115.033640 |
| 2 | 3351.649621 | 297.314475 | 284.177204 | 3933.141300 |
| 3 | 1977.607960 | 359.872656 | 255.831381 | 2593.311996 |
| 4 | 2169.224724 | 386.271512 | 244.469415 | 2799.965652 |
| 5 | 2661.843885 | 356.660878 | 218.425732 | 3236.930495 |
| 6 | 3595.833071 | 385.012912 | 273.882653 | 4254.728637 |
| 7 | 1765.419028 | 404.770447 | 236.665360 | 2406.854835 |
| 8 | 1708.955308 | 348.178355 | 231.690103 | 2288.823766 |
| 9 | 3227.258413 | 392.787025 | 252.315570 | 3872.361007 |
| 10 | 1941.492537 | 404.384587 | 247.628257 | 2593.505381 |
| 11 | 2081.218740 | 416.678465 | 213.204362 | 2711.101567 |

## NYC Costs Per Annum Algorithm

The algorithm below calculates the annual cost of rent, food, and insurance to determine the total cost per year. Rent, food, and insurance are drawn at random from the distributions defined above.

I run the simulation 10,000 times, which corresponds to 10,000 random samples of annual costs. The point of doing this is to build an approximately normal distribution from which to define confidence intervals for my total annual costs.

```
years = 10000
cycle_price_samples = np.zeros(shape=years)
cycle_rent_samples = np.zeros(shape=years)
cycle_food_samples = np.zeros(shape=years)
cycle_insurance_samples = np.zeros(shape=years)

for year in range(years):
    # Build a fresh DataFrame each iteration; it falls out of scope afterwards
    cost_of_living_df = pd.DataFrame()
    # Random draws of 12 monthly values per category
    cost_of_living_df['rent'] = choice(rent_distribution, 12)
    cost_of_living_df['food'] = choice(food_distribution, 12)
    cost_of_living_df['insurance'] = choice(insurance_distribution, 12)
    cost_of_living_df['monthly_cost'] = (cost_of_living_df.rent
                                         + cost_of_living_df.food
                                         + cost_of_living_df.insurance)
    # Sum the 12 months into annual totals for this simulated year
    cycle_price_samples[year] = cost_of_living_df['monthly_cost'].sum()
    cycle_rent_samples[year] = cost_of_living_df.rent.sum()
    cycle_food_samples[year] = cost_of_living_df.food.sum()
    cycle_insurance_samples[year] = cost_of_living_df.insurance.sum()
```

### NYC Prediction Df

```
prediction_df = pd.DataFrame()
prediction_df['rent'] = cycle_rent_samples
prediction_df['food'] = cycle_food_samples
prediction_df['insurance'] = cycle_insurance_samples
prediction_df['total'] = cycle_price_samples
```

```
prediction_df.describe()
```

| | rent | food | insurance | total |
|---|---|---|---|---|
| count | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 |
| mean | 29219.509691 | 4797.809482 | 3004.224643 | 37021.543816 |
| std | 2532.300418 | 172.673041 | 87.221734 | 2542.267617 |
| min | 18744.517281 | 4116.639699 | 2574.323735 | 26447.949901 |
| 25% | 27545.387716 | 4678.877662 | 2945.270499 | 35351.052672 |
| 50% | 29244.878069 | 4797.251203 | 3005.337764 | 37034.425389 |
| 75% | 30915.545611 | 4915.266687 | 3062.210984 | 38722.269645 |
| max | 38516.336096 | 5429.519670 | 3327.233629 | 46383.324453 |

### NYC Annual Cost Histogram

```
prediction_df.total.plot.hist(grid=True, bins=20, rwidth=0.9,
color='#607c8e')
plt.xlabel('Annual Total Cost (USD)')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.75)
```

### NYC: Calculating the Confidence Interval For Total Costs

The data is nearly normal; a greater sample size would produce a nearly perfect normal curve.

```
st.norm.interval(0.90, loc=np.mean(prediction_df.total), scale=st.sem(prediction_df.total))
```

```
(36979.727235126586, 37063.36039733022)
```

### NYC Annual Rent Histogram

```
# Annual rent histogram
prediction_df.rent.plot.hist(grid=True, bins=20, rwidth=0.9,
color='#607c8e')
plt.title('Annual Rent Cost Distribution')
plt.xlabel('Annual Rent Cost (USD)')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.75)
```

### NYC: Calculating the Confidence Interval For Annual Rent

The data is nearly normal; a greater sample size would produce a nearly perfect normal curve.

```
st.norm.interval(0.95, loc=np.mean(prediction_df.rent), scale=st.sem(prediction_df.rent))
```

```
(29169.877514702926, 29269.14186706609)
```

## NYC Closing Remarks

The NYC rent distribution based on 2018 data is nearly comparable to my Houston estimate. An annual salary of $90,000 would permit me to live at about the median level in the city. I will redo this report soon, since the data is old; I am currently scraping Houston and NYC data to produce a better analysis.

## Imports

```
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as st
from numpy.random import choice
import warnings
warnings.filterwarnings('ignore')
```