2013-07-11 21:34:44 Translator of Reading Statistics and Research (解讀統計與研究)

Correlation, Underestimated

Photo: GOLF SCORES & PRIZE MONEY
The accompanying photo shows the final 4-day scores and money earned (in US dollars) by 69 pro golfers who made the cut in a recent “major” tournament. What’s the correlation between scores & money earned? It depends on the “tool” used to assess the correlation. Whereas Pearson’s r = –0.62, Spearman’s rho (or Kendall’s tau) = –1.00. These results are negative, as they should be, because low scores earn high payoffs. But there’s a big difference between –0.62 & –1.00. Here, Pearson’s r is too low because it measures LINEAR relationships. Clearly, the golf scores & payoff amounts are curvilinear, not linear.

 source:https://www.facebook.com/readingstatistics

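The photo above shows the correlation between golf scores (stroke counts) and prize money. Pearson's correlation is -0.62. A negative coefficient is to be expected, since more strokes should mean less prize money. But look closely: the relationship between the two is not linear. Because of that, Pearson's correlation coefficient "underestimates" the "curvilinear" relationship between golf scores and prize money.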
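To make the gap between the two coefficients concrete, here is a minimal sketch in plain Python. The data are made up for illustration (20 hypothetical golfers, with prize money shrinking by 20% for each extra stroke); they are not the tournament figures from the photo, so the Pearson value printed will not be -0.62. The helper names `pearson`, `average_ranks`, and `spearman` are my own.

```python
import math

def pearson(x, y):
    """Pearson's r: strength of the LINEAR relationship between x and y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data (an assumption, not the real tournament results):
# scores 270..289, prize money decaying geometrically with each stroke.
scores = list(range(270, 290))
prizes = [1_000_000 * 0.8 ** k for k in range(len(scores))]

r = pearson(scores, prizes)
rho = spearman(scores, prizes)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

Because every extra stroke lowers the prize, the ranks line up perfectly in reverse and Spearman's rho comes out at -1, while Pearson's r stays noticeably closer to zero because the points follow a curve rather than a straight line.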

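Perhaps in statistics class readers were led to believe that correlation is the simplest of concepts, and so overlooked one of its most important assumptions: linearity.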

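How do you judge linearity? Arguably the most direct method is visual inspection. Looking again at the photo above, readers will find that a faint curve describes the relationship better than a straight line does.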
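When no plot is at hand, a rough numerical stand-in for eyeballing the scatter is to fit a straight line and look at the signs of the residuals. This is only a sketch on made-up data (the same hypothetical geometric prize schedule, not the tournament figures), and `fit_line` is a helper name of my own. A run of same-signed residuals at both ends with the opposite sign in the middle suggests a curve that a straight line cannot follow.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data (an assumption): prize money falls by 20% per extra stroke.
scores = list(range(270, 290))
prizes = [1_000_000 * 0.8 ** k for k in range(len(scores))]

slope, intercept = fit_line(scores, prizes)

# Residual = actual prize minus the prize predicted by the straight line.
residuals = [p - (intercept + slope * s) for s, p in zip(scores, prizes)]

# One character per golfer, ordered from lowest to highest score.
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)
```

If the relationship were truly linear, the plus and minus signs would be scattered with no pattern; here they cluster in blocks, which is the numerical trace of the curve one would see in a scatterplot.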

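The more deeply you come to understand the various correlation coefficients, the more you will find that doing a correlational study well is not as easy as you might imagine.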