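How to Test Whether a Sample Follows a Normal Distribution in Python
Before running a t-test or an F-test, we usually require the sample to be approximately normally distributed. This article introduces two ways to check whether a sample follows a normal distribution: visualization and statistical tests.
Visualization
We can plot the sample and check whether its estimated probability density resembles a normal curve. This gives a first, informal indication of whether the sample is normally distributed.
The code is as follows: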
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate a simulated data set with pandas and numpy
s = pd.DataFrame(np.random.randn(500), columns=['value'])
print(s.shape)  # (500, 1)

# Create a custom figure
fig = plt.figure(figsize=(10, 6))

# Subplot 1: scatter plot of the values against their index
ax1 = fig.add_subplot(2, 1, 1)
ax1.scatter(s.index, s.values)
plt.grid()  # add grid lines

# Subplot 2: histogram with a density curve on a secondary y-axis
ax2 = fig.add_subplot(2, 1, 2)
s.hist(bins=30, alpha=0.5, ax=ax2)
s.plot(kind='kde', secondary_y=True, ax=ax2)  # the KDE plot requires SciPy
plt.grid()  # add grid lines

# Show the figure
plt.show()
The resulting plots are shown below:
The figure gives a first indication that the generated data are approximately normally distributed.
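A normal Q-Q (probability) plot is another common visual check; it is not part of the original walkthrough. The sketch below, which assumes SciPy is installed, uses `scipy.stats.probplot` to pair the ordered sample with theoretical normal quantiles without drawing anything. A correlation coefficient `r` close to 1 suggests the points lie near a straight line. The seed and sample size are illustrative choices.

```python
import numpy as np
from scipy import stats

# Illustrative data (assumption): 500 draws from a standard normal, fixed seed.
rng = np.random.default_rng(0)
sample = rng.standard_normal(500)

# probplot returns the theoretical quantiles (osm), the ordered sample (osr),
# and a least-squares line fitted through the Q-Q points.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")

print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.4f}")
```

To draw the plot instead of only computing it, pass `plot=plt` (or a matplotlib Axes) to `probplot`.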
For a more convincing result we can use statistical tests. Here we use functions from scipy.stats.
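Statistical tests
1) kstest
scipy.stats.kstest can test whether a sample follows a normal, exponential, gamma, or other continuous distribution. Its signature and docstring are quoted below (newer SciPy releases may differ):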
def kstest(rvs, cdf, args=(), N=20, alternative='two-sided', mode='approx'):
    """
    Perform the Kolmogorov-Smirnov test for goodness of fit.

    This performs a test of the distribution F(x) of an observed
    random variable against a given distribution G(x). Under the null
    hypothesis the two distributions are identical, F(x)=G(x). The
    alternative hypothesis can be either 'two-sided' (default), 'less'
    or 'greater'. The KS test is only valid for continuous distributions.

    Parameters
    ----------
    rvs : str, array or callable
        If a string, it should be the name of a distribution in `scipy.stats`.
        If an array, it should be a 1-D array of observations of random
        variables.
        If a callable, it should be a function to generate random variables;
        it is required to have a keyword argument `size`.
    cdf : str or callable
        If a string, it should be the name of a distribution in `scipy.stats`.
        If `rvs` is a string then `cdf` can be False or the same as `rvs`.
        If a callable, that callable is used to calculate the cdf.
    args : tuple, sequence, optional
        Distribution parameters, used if `rvs` or `cdf` are strings.
    N : int, optional
        Sample size if `rvs` is string or callable. Default is 20.
    alternative : {'two-sided', 'less','greater'}, optional
        Defines the alternative hypothesis (see explanation above).
        Default is 'two-sided'.
    mode : 'approx' (default) or 'asymp', optional
        Defines the distribution used for calculating the p-value.

          - 'approx' : use approximation to exact distribution of test statistic
          - 'asymp' : use asymptotic distribution of test statistic

    Returns
    -------
    statistic : float
        KS test statistic, either D, D+ or D-.
    pvalue : float
        One-tailed or two-tailed p-value.
    """
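To make the parameters above concrete, here is a small sketch that calls `kstest` with an array for `rvs`, a distribution name for `cdf`, and `args` for that distribution's parameters. The seeded samples and the variable names are illustrative assumptions, not part of the original.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=5.0, scale=2.0, size=500)
expon_sample = rng.exponential(scale=1.0, size=500)

# Test against N(5, 2): args carries (loc, scale) for the named distribution.
res_normal = stats.kstest(normal_sample, "norm", args=(5.0, 2.0))

# Test a non-negative, skewed sample against the standard normal.
res_expon = stats.kstest(expon_sample, "norm")

print(res_normal.statistic, res_normal.pvalue)
print(res_expon.statistic, res_expon.pvalue)
```

Every exponential draw is non-negative, while the standard normal puts half of its mass below zero, so the second statistic is at least 0.5 and its p-value is effectively zero.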
2) normaltest
scipy.stats.normaltest is dedicated to testing whether a sample follows a normal distribution. Its signature and docstring are:
def normaltest(a, axis=0, nan_policy='propagate'):
    """
    Test whether a sample differs from a normal distribution.

    This function tests the null hypothesis that a sample comes
    from a normal distribution. It is based on D'Agostino and
    Pearson's [1]_, [2]_ test that combines skew and kurtosis to
    produce an omnibus test of normality.

    Parameters
    ----------
    a : array_like
        The array containing the sample to be tested.
    axis : int or None, optional
        Axis along which to compute test. Default is 0. If None,
        compute over the whole array `a`.
    nan_policy : {'propagate', 'raise', 'omit'}, optional
        Defines how to handle when input contains nan. 'propagate' returns nan,
        'raise' throws an error, 'omit' performs the calculations ignoring nan
        values. Default is 'propagate'.

    Returns
    -------
    statistic : float or array
        ``s^2 + k^2``, where ``s`` is the z-score returned by `skewtest` and
        ``k`` is the z-score returned by `kurtosistest`.
    pvalue : float or array
        A 2-sided chi squared probability for the hypothesis test.
    """
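The docstring's description of the statistic can be checked directly. The sketch below recomputes `s^2 + k^2` from `skewtest` and `kurtosistest` and compares it with what `normaltest` returns. The seed, sample, and variable names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.standard_normal(500)

stat, p = stats.normaltest(sample)

s = stats.skewtest(sample).statistic      # z-score for skewness
k = stats.kurtosistest(sample).statistic  # z-score for kurtosis
manual_stat = s**2 + k**2

# Under the null hypothesis the statistic is chi-squared with 2 degrees of freedom.
manual_p = stats.chi2.sf(manual_stat, df=2)

print(stat, manual_stat)
print(p, manual_p)
```

Because the test only looks at skewness and kurtosis, it can miss departures from normality that leave both close to their normal values.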
3) shapiro
scipy.stats.shapiro is also dedicated to normality testing. Its signature and docstring are:
def shapiro(x):
    """
    Perform the Shapiro-Wilk test for normality.

    The Shapiro-Wilk test tests the null hypothesis that the
    data was drawn from a normal distribution.

    Parameters
    ----------
    x : array_like
        Array of sample data.

    Returns
    -------
    W : float
        The test statistic.
    p-value : float
        The p-value for the hypothesis test.
    """
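As a quick illustration of the two return values, the sketch below runs `shapiro` on a normal sample and on a clearly skewed one. The seed, sample sizes, and variable names are assumptions made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
normal_sample = rng.standard_normal(200)
skewed_sample = rng.exponential(scale=1.0, size=200)

# Tuple unpacking works whether shapiro returns a plain tuple or a named result.
w_normal, p_normal = stats.shapiro(normal_sample)
w_skewed, p_skewed = stats.shapiro(skewed_sample)

print(f"normal: W={w_normal:.4f}, p={p_normal:.4f}")
print(f"skewed: W={w_skewed:.4f}, p={p_skewed:.3g}")
```

W close to 1 indicates good agreement with a normal distribution; the skewed sample should give a noticeably smaller W and a very small p-value.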
Next we generate simulated data in the same way as in the first part and apply all three tests to it, treating p > 0.05 as no evidence against normality. The code is as follows:
import numpy as np
import pandas as pd
from scipy import stats  # missing in the original listing

# Generate a simulated data set with pandas and numpy
s = pd.DataFrame(np.random.randn(500), columns=['value'])
print(s.shape)  # (500, 1)

# Sample mean
u = s['value'].mean()
# Sample standard deviation
std = s['value'].std()

print('scipy.stats.kstest result:----------------------------------------------------')
print(stats.kstest(s['value'], 'norm', (u, std)))
print('scipy.stats.normaltest result:----------------------------------------------------')
print(stats.normaltest(s['value']))
print('scipy.stats.shapiro result:----------------------------------------------------')
print(stats.shapiro(s['value']))
The test results are as follows:
scipy.stats.kstest result:----------------------------------------------------
KstestResult(statistic=0.01596290473494305, pvalue=0.9995623150120069)
scipy.stats.normaltest result:----------------------------------------------------
NormaltestResult(statistic=0.5561685865675511, pvalue=0.7572329891688141)
scipy.stats.shapiro result:----------------------------------------------------
(0.9985257983207703, 0.9540967345237732)
All three tests give p-values above 0.05, so we cannot reject the null hypothesis: the simulated data are consistent with a normal distribution.
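The decision rule used above can be wrapped in a small helper. `normality_report` is a hypothetical name introduced here for illustration, and the 0.05 threshold follows the text. One caveat: the mean and standard deviation passed to `kstest` are estimated from the same sample, which tends to make its p-value larger than the nominal one, so treat that entry as a rough guide.

```python
import numpy as np
from scipy import stats


def normality_report(values, alpha=0.05):
    """Hypothetical helper: run the three tests and flag p-values above alpha."""
    x = np.asarray(values, dtype=float)
    # As in the article, the K-S test uses the sample mean and standard deviation.
    ks = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    p_values = {
        "kstest": ks[1],
        "normaltest": stats.normaltest(x)[1],
        "shapiro": stats.shapiro(x)[1],
    }
    # True means "no evidence against normality at level alpha", not proof of it.
    return {name: (float(p), bool(p > alpha)) for name, p in p_values.items()}


rng = np.random.default_rng(3)
report_skewed = normality_report(rng.exponential(size=500))
report_normal = normality_report(rng.standard_normal(500))
print(report_skewed)
print(report_normal)
```

Each entry maps a test name to its p-value and a flag. For the exponential sample, all three flags should be False.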
Summary
The above reflects my personal experience. I hope it serves as a useful reference, and I hope you will continue to support 腳本之家.