
Rice Paddy Methane Emissions Estimation: Part 2


Justin Napolitano

2022-05-23 19:30:32 +0000 UTC




Series

This is a post in the Rice Paddy Methane Emissions series.
Other posts in this series:

  • Rice Paddy Methane Emissions Estimation: Part 1

  • Methane Emissions Estimation Data Part 2: A Comparison between FAOSTAT and University of Malaysia Estimates

    This post documents the data exploration phase of a project that determines whether global methane emissions produced by rice paddies are undercounted.

    It is fairly heavy on Python and pandas code.

    The code and data exploration follow the summary below.

    Hypothesis Testing the University of Malaysia Paper

    Claims

    • That the distributions do not differ between 2020 and 2019
    • That the means do not differ between 2020 and 2019
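
Both claims compare the 2019 and 2020 columns. As a quick descriptive look before any formal test, the sketch below prints the mean and median of each year. It hard-codes the 2019 and 2020 values from the table printed further down (tonnes of CH4, Total row excluded) so that it runs without the spreadsheet; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Illustrative copies of the tCH4_2019 and tCH4_2020 columns (tonnes CH4),
# one value per country in the table's row order (BGD ... VNM).
tch4_2019 = np.array([
    2.070985e6, 3.294713e5, 5.603352e6, 1.148324e4, 1.266668e6, 7.501556e6,
    9.500199e4, 4.566914e4, 2.332056e5, 5.947277e5, 1.327782e5, 1.461058e4,
    8.476088e4, 1.256888e6, 1.056287e5, 1.164235e5, 6.528548e5, 3.584550e5,
    9.655062e4, 1.305046e6, 8.990870e4, 1.691351e5, 1.269751e6,
])
tch4_2020 = np.array([
    2.106781e6, 4.902874e5, 6.402353e6, 1.305461e4, 1.188195e6, 7.599764e6,
    9.600254e4, 5.101547e4, 2.835167e5, 6.412802e5, 1.165467e5, 2.136270e4,
    9.248238e4, 1.221904e6, 1.127141e5, 7.168401e4, 6.401201e5, 4.462836e5,
    8.581038e4, 1.520788e6, 8.333327e4, 1.941455e5, 1.374450e6,
])

for year, values in (("2019", tch4_2019), ("2020", tch4_2020)):
    # A mean far above the median signals a right-skewed distribution,
    # an early hint that a normality assumption may not hold.
    print(f"{year}: mean={values.mean():.4e} t, median={np.median(values):.4e} t")
```

A handful of very large emitters (China and India) pull both means well above the medians, which is consistent with the Shapiro-Wilk results later in the post.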

    What Will Be Tested

    • Shapiro-Wilk Test
    • Mann-Whitney U Test
    • Kruskal-Wallis Test
    • Friedman Test

    Analysis

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import scipy.stats as stats
    
    filepath = "/Users/jnapolitano/Projects/wattime-takehome/wattime-takehome/data/ch4_2015-2021.xlsx"
    
    hypothesis_testing_df = pd.read_excel(filepath)
    

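    Drop the "Total" row so that only country-level rows remain.

    # .copy() makes the filtered result an explicit, independent DataFrame, so later column
    # assignments do not raise SettingWithCopyWarning. Reassigning the same name lets the
    # unfiltered frame be garbage-collected.
    hypothesis_testing_df = hypothesis_testing_df.loc[hypothesis_testing_df['country_name'] != "Total"].copy()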
    
    hypothesis_testing_df
    

    iso3_country country_name tCH4_2015 tCH4_2016 tCH4_2017 tCH4_2018 tCH4_2019 tCH4_2020 tCH4_2021
    0 BGD Bangladesh 2.344420e+06 2.278158e+06 2.098958e+06 2.141231e+06 2.070985e+06 2.106781e+06 1.983974e+06
    1 BRA Brazil 3.410233e+05 3.104189e+05 3.725173e+05 3.717030e+05 3.294713e+05 4.902874e+05 4.544874e+05
    2 CHN China 6.133647e+06 5.859531e+06 6.355071e+06 5.413962e+06 5.603352e+06 6.402353e+06 6.068210e+06
    3 ESP Spain 1.141464e+04 1.334803e+04 1.217299e+04 1.405410e+04 1.148324e+04 1.305461e+04 8.531579e+03
    4 IDN Indonesia 1.283649e+06 1.023129e+06 9.615327e+05 1.176982e+06 1.266668e+06 1.188195e+06 1.009936e+06
    5 IND India 6.219887e+06 5.309413e+06 6.228451e+06 6.589798e+06 7.501556e+06 7.599764e+06 6.567960e+06
    6 IRN Iran (Islamic Republic of) 8.774407e+04 9.180121e+04 9.620217e+04 8.875744e+04 9.500199e+04 9.600254e+04 9.053525e+04
    7 ITA Italy 4.995968e+04 4.937785e+04 5.443679e+04 4.469902e+04 4.566914e+04 5.101547e+04 5.089759e+04
    8 JPN Japan 2.305465e+05 2.284133e+05 2.708935e+05 1.548252e+05 2.332056e+05 2.835167e+05 1.574007e+05
    9 KHM Cambodia 4.954698e+05 5.731698e+05 4.517045e+05 5.592610e+05 5.947277e+05 6.412802e+05 5.644891e+05
    10 KOR Korea (the Republic of) 1.451878e+05 1.274597e+05 1.463222e+05 1.293543e+05 1.327782e+05 1.165467e+05 1.013006e+05
    11 LAO Lao People's Democratic Republic (the) 1.661169e+04 1.696441e+04 1.168063e+04 1.009675e+04 1.461058e+04 2.136270e+04 1.475014e+04
    12 LKA Sri Lanka 8.305626e+04 1.011743e+05 5.911841e+04 9.018914e+04 8.476088e+04 9.248238e+04 8.466966e+04
    13 MMR Myanmar 1.132082e+06 1.290806e+06 1.205169e+06 1.372447e+06 1.256888e+06 1.221904e+06 1.289837e+06
    14 MYS Malaysia 1.057399e+05 1.110049e+05 1.111291e+05 1.066525e+05 1.056287e+05 1.127141e+05 1.069696e+05
    15 NPL Nepal 1.007479e+05 6.667161e+04 8.081300e+04 9.200752e+04 1.164235e+05 7.168401e+04 4.811408e+04
    16 PAK Pakistan 4.852431e+05 5.945922e+05 5.372641e+05 4.532297e+05 6.528548e+05 6.401201e+05 4.849205e+05
    17 PHL Philippines (the) 3.432021e+05 4.073554e+05 3.836830e+05 4.175210e+05 3.584550e+05 4.462836e+05 4.383270e+05
    18 PRK Korea (the Democratic People's Republic of) 1.143217e+05 9.177653e+04 1.085457e+05 8.662578e+04 9.655062e+04 8.581038e+04 7.735988e+04
    19 THA Thailand 1.393798e+06 1.780993e+06 1.164699e+06 9.166575e+05 1.305046e+06 1.520788e+06 8.528673e+05
    20 TWN Taiwan (Province of China) 7.866956e+04 8.089149e+04 8.705634e+04 8.138151e+04 8.990870e+04 8.333327e+04 6.619861e+04
    21 USA United States of America (the) 1.611324e+05 1.618576e+05 1.684799e+05 1.657254e+05 1.691351e+05 1.941455e+05 1.634842e+05
    22 VNM Viet Nam 1.346013e+06 1.483777e+06 1.406437e+06 1.317455e+06 1.269751e+06 1.374450e+06 1.502787e+06
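
The filtering step can be checked on a toy frame. The sketch below is a hypothetical stand-in for the spreadsheet: three country rows copied from the table above plus a made-up "Total" row computed as their sum (the real file's Total row may be laid out differently). It shows that the boolean mask removes the aggregate row, and why leaving it in would distort any statistic computed on the column.

```python
import pandas as pd

# Hypothetical miniature of the spreadsheet: three rows copied from the table
# plus an illustrative "Total" row equal to their sum.
raw = pd.DataFrame({
    "country_name": ["Bangladesh", "Brazil", "China"],
    "tCH4_2019": [2.070985e6, 3.294713e5, 5.603352e6],
})
total_row = pd.DataFrame({"country_name": ["Total"],
                          "tCH4_2019": [raw["tCH4_2019"].sum()]})
raw = pd.concat([raw, total_row], ignore_index=True)

# Keep every row whose country_name is not "Total"; .copy() gives an
# independent frame that can be modified without pandas warnings.
countries = raw.loc[raw["country_name"] != "Total"].copy()

print(len(raw), "rows before,", len(countries), "rows after filtering")
# With the Total row still present, the column sum is counted twice.
print("sum ratio (with Total / without):",
      raw["tCH4_2019"].sum() / countries["tCH4_2019"].sum())
```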

    Test for Normality: Shapiro-Wilk

    2019

    # Select the 2019 column of the University of Malaysia estimates
    data_2019 = hypothesis_testing_df['tCH4_2019']
    data_2019
    
    0     2.070985e+06
    1     3.294713e+05
    2     5.603352e+06
    3     1.148324e+04
    4     1.266668e+06
    5     7.501556e+06
    6     9.500199e+04
    7     4.566914e+04
    8     2.332056e+05
    9     5.947277e+05
    10    1.327782e+05
    11    1.461058e+04
    12    8.476088e+04
    13    1.256888e+06
    14    1.056287e+05
    15    1.164235e+05
    16    6.528548e+05
    17    3.584550e+05
    18    9.655062e+04
    19    1.305046e+06
    20    8.990870e+04
    21    1.691351e+05
    22    1.269751e+06
    Name: tCH4_2019, dtype: float64
    
    results = stats.shapiro(data_2019)
    print('stat=%.3f, p=%.3f' % (results.statistic, results.pvalue))
    if results.pvalue > 0.05:
    	print('Probably Gaussian')
    else:
    	print('Probably not Gaussian')
    
    stat=0.567, p=0.000
    Probably not Gaussian
    
    Results

    The 2019 distribution is not Gaussian (the printed p=0.000 means p < 0.0005, far below 0.05), so a non-parametric test is required. Testing the 2020 data is not strictly necessary, but I will do so anyway for practice.

    2020

    # Select the 2020 column of the University of Malaysia estimates
    data_2020 = hypothesis_testing_df['tCH4_2020']
    
    results = stats.shapiro(data_2020)
    print('stat=%.3f, p=%.3f' % (results.statistic, results.pvalue))
    if results.pvalue > 0.05:
    	print('Probably Gaussian')
    else:
    	print('Probably not Gaussian')
    
    stat=0.565, p=0.000
    Probably not Gaussian
    
    Results

    The 2020 data is not Gaussian either, which confirms that a non-parametric test is needed.
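
The Shapiro-Wilk check above is run twice with copy-pasted code. A minimal sketch of a reusable helper follows; `shapiro_report` and the hard-coded arrays (copied from the table) are illustrative, not part of the original notebook. It also prints the p-value in scientific notation, since the `%.3f` format rounds it to 0.000.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # significance level used throughout the post

# Illustrative copies of the two year columns (tonnes CH4), table row order.
columns = {
    "tCH4_2019": np.array([
        2.070985e6, 3.294713e5, 5.603352e6, 1.148324e4, 1.266668e6, 7.501556e6,
        9.500199e4, 4.566914e4, 2.332056e5, 5.947277e5, 1.327782e5, 1.461058e4,
        8.476088e4, 1.256888e6, 1.056287e5, 1.164235e5, 6.528548e5, 3.584550e5,
        9.655062e4, 1.305046e6, 8.990870e4, 1.691351e5, 1.269751e6,
    ]),
    "tCH4_2020": np.array([
        2.106781e6, 4.902874e5, 6.402353e6, 1.305461e4, 1.188195e6, 7.599764e6,
        9.600254e4, 5.101547e4, 2.835167e5, 6.412802e5, 1.165467e5, 2.136270e4,
        9.248238e4, 1.221904e6, 1.127141e5, 7.168401e4, 6.401201e5, 4.462836e5,
        8.581038e4, 1.520788e6, 8.333327e4, 1.941455e5, 1.374450e6,
    ]),
}


def shapiro_report(label, values, alpha=ALPHA):
    """Run Shapiro-Wilk on one column and report whether normality is rejected."""
    result = stats.shapiro(values)
    looks_gaussian = result.pvalue > alpha  # failing to reject H0: data are normal
    verdict = "Probably Gaussian" if looks_gaussian else "Probably not Gaussian"
    print(f"{label}: W={result.statistic:.3f}, p={result.pvalue:.2e} -> {verdict}")
    return result.statistic, result.pvalue, looks_gaussian


reports = {label: shapiro_report(label, values) for label, values in columns.items()}
```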

    Independence of Samples

    The Mann-Whitney U and Kruskal-Wallis tests assume independent samples, so we have to accept that assumption here even though we know both years depend on the number of hectares planted.
    The correlation between years is high because the hectares planted per country are similar from year to year, so the amount of CH4 is similar as well.
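
The year-to-year correlation is mentioned but not printed. Because both columns list the same 23 countries in the same order, they can be lined up country by country and correlated. A minimal sketch, assuming the 2019 and 2020 values copied from the table; Spearman's rank correlation is shown alongside Pearson's because it is less sensitive to the few very large emitters.

```python
import numpy as np
from scipy import stats

# Illustrative copies of tCH4_2019 and tCH4_2020 (tonnes CH4), table row order.
tch4_2019 = np.array([
    2.070985e6, 3.294713e5, 5.603352e6, 1.148324e4, 1.266668e6, 7.501556e6,
    9.500199e4, 4.566914e4, 2.332056e5, 5.947277e5, 1.327782e5, 1.461058e4,
    8.476088e4, 1.256888e6, 1.056287e5, 1.164235e5, 6.528548e5, 3.584550e5,
    9.655062e4, 1.305046e6, 8.990870e4, 1.691351e5, 1.269751e6,
])
tch4_2020 = np.array([
    2.106781e6, 4.902874e5, 6.402353e6, 1.305461e4, 1.188195e6, 7.599764e6,
    9.600254e4, 5.101547e4, 2.835167e5, 6.412802e5, 1.165467e5, 2.136270e4,
    9.248238e4, 1.221904e6, 1.127141e5, 7.168401e4, 6.401201e5, 4.462836e5,
    8.581038e4, 1.520788e6, 8.333327e4, 1.941455e5, 1.374450e6,
])

# Pearson's r on raw tonnes: strongly influenced by the largest emitters.
r, p_r = stats.pearsonr(tch4_2019, tch4_2020)
# Spearman's rho on ranks: do countries keep their ordering between years?
rho, p_rho = stats.spearmanr(tch4_2019, tch4_2020)

print(f"Pearson r = {r:.3f}, p = {p_r:.1e}")
print(f"Spearman rho = {rho:.3f}, p = {p_rho:.1e}")
```

A high value on both measures is consistent with the dependence on hectares described above.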

    Distribution Similarity

    Mann-Whitney U Test

    # Example of the Mann-Whitney U Test
    
    stat, p = stats.mannwhitneyu(data_2019, data_2020)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
    	print('Probably the same distribution')
    else:
    	print('Probably different distributions')
    
    stat=266.000, p=0.982
    Probably the same distribution
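
To make the Mann-Whitney result less of a black box, the sketch below recomputes U from pooled ranks and reproduces SciPy's normal-approximation p-value, including the continuity correction SciPy applies by default. It assumes SciPy 1.7 or later, where `mannwhitneyu` accepts `method=` and reports U for the first sample; the arrays are the 2019 and 2020 columns copied from the table.

```python
import numpy as np
from scipy import stats

# Illustrative copies of tCH4_2019 (x) and tCH4_2020 (y), table row order.
x = np.array([
    2.070985e6, 3.294713e5, 5.603352e6, 1.148324e4, 1.266668e6, 7.501556e6,
    9.500199e4, 4.566914e4, 2.332056e5, 5.947277e5, 1.327782e5, 1.461058e4,
    8.476088e4, 1.256888e6, 1.056287e5, 1.164235e5, 6.528548e5, 3.584550e5,
    9.655062e4, 1.305046e6, 8.990870e4, 1.691351e5, 1.269751e6,
])
y = np.array([
    2.106781e6, 4.902874e5, 6.402353e6, 1.305461e4, 1.188195e6, 7.599764e6,
    9.600254e4, 5.101547e4, 2.835167e5, 6.412802e5, 1.165467e5, 2.136270e4,
    9.248238e4, 1.221904e6, 1.127141e5, 7.168401e4, 6.401201e5, 4.462836e5,
    8.581038e4, 1.520788e6, 8.333327e4, 1.941455e5, 1.374450e6,
])
n1, n2 = len(x), len(y)

# Rank all 46 values together (ties would get average ranks; there are none here).
ranks = stats.rankdata(np.concatenate([x, y]))
r1 = ranks[:n1].sum()          # rank sum of the 2019 sample
u1 = r1 - n1 * (n1 + 1) / 2    # U for 2019: number of pairs where the 2019 value is larger

# Normal approximation: under H0, U has this mean and (tie-free) standard deviation.
mu = n1 * n2 / 2
sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (abs(u1 - mu) - 0.5) / sigma   # 0.5 is the continuity correction
p_manual = min(1.0, 2 * stats.norm.sf(z))

res = stats.mannwhitneyu(x, y, alternative="two-sided", method="asymptotic")
print(f"manual: U={u1:.1f}, p={p_manual:.3f}")
print(f"scipy:  U={res.statistic:.1f}, p={res.pvalue:.3f}")
```

U is 266 against an expected 264.5 under the null hypothesis, which is why the p-value is so close to 1.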
    

    Kruskal-Wallis Test

    
    stat, p = stats.kruskal(data_2019, data_2020)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
    	print('Probably the same distribution')
    else:
    	print('Probably different distributions')
    
    stat=0.001, p=0.974
    Probably the same distribution
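
With only two groups, Kruskal-Wallis is essentially the same rank test as Mann-Whitney: its H statistic equals the square of the Mann-Whitney z-score computed without the continuity correction. The sketch below, using the same copied 2019 and 2020 arrays, computes H from rank sums, compares it with `scipy.stats.kruskal`, and checks that identity. This also explains the small gap between the two p-values (0.982 vs 0.974): `mannwhitneyu` applies a continuity correction by default and `kruskal` does not.

```python
import numpy as np
from scipy import stats

# Illustrative copies of tCH4_2019 (x) and tCH4_2020 (y), table row order.
x = np.array([
    2.070985e6, 3.294713e5, 5.603352e6, 1.148324e4, 1.266668e6, 7.501556e6,
    9.500199e4, 4.566914e4, 2.332056e5, 5.947277e5, 1.327782e5, 1.461058e4,
    8.476088e4, 1.256888e6, 1.056287e5, 1.164235e5, 6.528548e5, 3.584550e5,
    9.655062e4, 1.305046e6, 8.990870e4, 1.691351e5, 1.269751e6,
])
y = np.array([
    2.106781e6, 4.902874e5, 6.402353e6, 1.305461e4, 1.188195e6, 7.599764e6,
    9.600254e4, 5.101547e4, 2.835167e5, 6.412802e5, 1.165467e5, 2.136270e4,
    9.248238e4, 1.221904e6, 1.127141e5, 7.168401e4, 6.401201e5, 4.462836e5,
    8.581038e4, 1.520788e6, 8.333327e4, 1.941455e5, 1.374450e6,
])
n1, n2 = len(x), len(y)
N = n1 + n2

ranks = stats.rankdata(np.concatenate([x, y]))
r1, r2 = ranks[:n1].sum(), ranks[n1:].sum()  # rank sum per year

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1); no tie correction needed here.
h = 12 / (N * (N + 1)) * (r1**2 / n1 + r2**2 / n2) - 3 * (N + 1)
p_manual = stats.chi2.sf(h, df=1)  # k - 1 = 1 degree of freedom for two groups

# Mann-Whitney z without continuity correction; for two groups H equals z**2.
u1 = r1 - n1 * (n1 + 1) / 2
z = (u1 - n1 * n2 / 2) / np.sqrt(n1 * n2 * (N + 1) / 12)

res = stats.kruskal(x, y)
print(f"manual H={h:.4f}, p={p_manual:.3f}; scipy H={res.statistic:.4f}, p={res.pvalue:.3f}")
print(f"z**2={z**2:.4f}")
```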
    

    Friedman Test

    Just for the sake of it, I will compare the distributions of every year from 2015 through 2020.

    # Example of the Friedman Test (the dataset starts in 2015, so there is no 2014 column)
    data_2015 = hypothesis_testing_df['tCH4_2015']
    data_2016 = hypothesis_testing_df['tCH4_2016']
    data_2017 = hypothesis_testing_df['tCH4_2017']
    data_2018 = hypothesis_testing_df['tCH4_2018']
    
    stat, p = stats.friedmanchisquare(data_2015, data_2016, data_2017, data_2018, data_2019, data_2020)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
    	print('Probably the same distribution')
    else:
    	print('Probably different distributions')
    
    stat=11.472, p=0.043
    Probably different distributions
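
Unlike the two tests above, the Friedman test treats each country as a block: it ranks the years within each country and then asks whether some years rank consistently higher or lower across countries. The sketch below recomputes the statistic from those within-country ranks, using the 2015 through 2020 columns copied from the table (row order BGD ... VNM), and compares it with `scipy.stats.friedmanchisquare`. The formula shown assumes no tied values within a country, which holds for this data.

```python
import numpy as np
from scipy import stats

years = [2015, 2016, 2017, 2018, 2019, 2020]
# Illustrative copy of tCH4_2015 ... tCH4_2020 (tonnes CH4); one row per country.
emissions = np.array([
    [2.344420e6, 2.278158e6, 2.098958e6, 2.141231e6, 2.070985e6, 2.106781e6],  # BGD
    [3.410233e5, 3.104189e5, 3.725173e5, 3.717030e5, 3.294713e5, 4.902874e5],  # BRA
    [6.133647e6, 5.859531e6, 6.355071e6, 5.413962e6, 5.603352e6, 6.402353e6],  # CHN
    [1.141464e4, 1.334803e4, 1.217299e4, 1.405410e4, 1.148324e4, 1.305461e4],  # ESP
    [1.283649e6, 1.023129e6, 9.615327e5, 1.176982e6, 1.266668e6, 1.188195e6],  # IDN
    [6.219887e6, 5.309413e6, 6.228451e6, 6.589798e6, 7.501556e6, 7.599764e6],  # IND
    [8.774407e4, 9.180121e4, 9.620217e4, 8.875744e4, 9.500199e4, 9.600254e4],  # IRN
    [4.995968e4, 4.937785e4, 5.443679e4, 4.469902e4, 4.566914e4, 5.101547e4],  # ITA
    [2.305465e5, 2.284133e5, 2.708935e5, 1.548252e5, 2.332056e5, 2.835167e5],  # JPN
    [4.954698e5, 5.731698e5, 4.517045e5, 5.592610e5, 5.947277e5, 6.412802e5],  # KHM
    [1.451878e5, 1.274597e5, 1.463222e5, 1.293543e5, 1.327782e5, 1.165467e5],  # KOR
    [1.661169e4, 1.696441e4, 1.168063e4, 1.009675e4, 1.461058e4, 2.136270e4],  # LAO
    [8.305626e4, 1.011743e5, 5.911841e4, 9.018914e4, 8.476088e4, 9.248238e4],  # LKA
    [1.132082e6, 1.290806e6, 1.205169e6, 1.372447e6, 1.256888e6, 1.221904e6],  # MMR
    [1.057399e5, 1.110049e5, 1.111291e5, 1.066525e5, 1.056287e5, 1.127141e5],  # MYS
    [1.007479e5, 6.667161e4, 8.081300e4, 9.200752e4, 1.164235e5, 7.168401e4],  # NPL
    [4.852431e5, 5.945922e5, 5.372641e5, 4.532297e5, 6.528548e5, 6.401201e5],  # PAK
    [3.432021e5, 4.073554e5, 3.836830e5, 4.175210e5, 3.584550e5, 4.462836e5],  # PHL
    [1.143217e5, 9.177653e4, 1.085457e5, 8.662578e4, 9.655062e4, 8.581038e4],  # PRK
    [1.393798e6, 1.780993e6, 1.164699e6, 9.166575e5, 1.305046e6, 1.520788e6],  # THA
    [7.866956e4, 8.089149e4, 8.705634e4, 8.138151e4, 8.990870e4, 8.333327e4],  # TWN
    [1.611324e5, 1.618576e5, 1.684799e5, 1.657254e5, 1.691351e5, 1.941455e5],  # USA
    [1.346013e6, 1.483777e6, 1.406437e6, 1.317455e6, 1.269751e6, 1.374450e6],  # VNM
])

n, k = emissions.shape  # n countries (blocks), k years (treatments)

# Rank the k years within each country: 1 = that country's lowest year.
ranks = stats.rankdata(emissions, axis=1)
rank_sums = ranks.sum(axis=0)  # one rank sum per year

# Friedman statistic without ties: 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1)
chi2_f = 12 / (n * k * (k + 1)) * np.sum(rank_sums**2) - 3 * n * (k + 1)
p_manual = stats.chi2.sf(chi2_f, df=k - 1)

res = stats.friedmanchisquare(*emissions.T)  # one argument per year column
for year, r in zip(years, rank_sums):
    print(f"{year}: rank sum = {r:.0f}")
print(f"manual: stat={chi2_f:.3f}, p={p_manual:.3f}")
print(f"scipy:  stat={res.statistic:.3f}, p={res.pvalue:.3f}")
```

The per-year rank sums show which years tend to rank high or low within countries, but they are not a substitute for a proper post-hoc comparison.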
    

    Results

    At least one year's distribution differs from the others (p = 0.043 < 0.05). Which years differ has not been determined; for the sake of this analysis I will not attempt to identify them.

    The claim that the distributions of the 2019 and 2020 data do not differ cannot be rejected: both the Mann-Whitney U and Kruskal-Wallis tests return p > 0.05. That said, we also cannot claim that the means are statistically equivalent, because the data are not normally distributed and these rank-based tests compare distributions, not means.

