- How to get odds-ratios and other related features with scikit-learn in Python?
- Method 1: Using the StatsModels Library
- Step 1: Import Required Libraries
- Step 2: Load Data
- Step 3: Prepare Data
- Step 4: Fit Logistic Regression Model
- Step 5: Get Odds-Ratios and Other Related Features
- Step 6: Print Results
- Method 2: Using the ResearchPy Library
- Step 1: Install ResearchPy Library
- Step 2: Import Required Libraries
- Step 3: Load Data
- Step 4: Create Crosstab and Perform Chi-Square Test
- Step 5: Obtain Odds Ratio
- Step 6: Obtain Other Related Features
- Method 3: Using the Pandas Crosstab Function
- scipy.stats.contingency.odds_ratio
- Interpretation of Odds Ratio and Fisher’s Exact Test
- Calculation of the Odds Ratio and Applying Fisher’s Test on Python
- Example Case
How to get odds-ratios and other related features with scikit-learn in Python?
An odds ratio is a statistical measure, widely used in epidemiology and medical statistics, that describes the relationship between the presence or absence of an attribute and the odds of a particular outcome. In Python, you can combine scikit-learn with libraries such as StatsModels, ResearchPy, and SciPy to calculate odds ratios and related quantities, such as contingency tables, confidence intervals, and p-values. Here are some methods to get odds ratios and related features:
Method 1: Using the StatsModels Library
To get odds-ratios and other related features with Scikit-learn using the StatsModels library, you can follow these steps:
Step 1: Import Required Libraries
```python
import pandas as pd
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression  # optional; not used below
```
Step 2: Load Data
```python
data = pd.read_csv('data.csv')  # replace 'data.csv' with your data file
```
Step 3: Prepare Data
```python
X = data[['feature1', 'feature2', 'feature3']]
y = data['target']
```
Step 4: Fit Logistic Regression Model
```python
X = sm.add_constant(X)  # sm.Logit does not add an intercept automatically
logit_model = sm.Logit(y, X)
result = logit_model.fit()
```
Step 5: Get Odds-Ratios and Other Related Features
```python
coefficients = result.params
odds_ratio = np.exp(coefficients)
p_values = result.pvalues
```
Step 6: Print Results
```python
print("Coefficients:\n", coefficients)
print("\nOdds Ratios:\n", odds_ratio)
print("\nP-Values:\n", p_values)
```
The above code will fit a logistic regression model using the StatsModels library and get the odds-ratios and other related features. The coefficients variable will contain the coefficients of the model, odds_ratio will contain the odds-ratios, and p_values will contain the p-values of the model.
Note: make sure to replace data.csv with the name of your data file.
Method 2: Using the ResearchPy Library
ResearchPy is a Python library that simplifies the process of doing statistics in Python. It provides a simple interface to perform statistical tests, such as t-tests, ANOVA, and correlation analysis. In this tutorial, we will learn how to use ResearchPy library to obtain odds ratios and other related features.
Step 1: Install ResearchPy Library
Before we can start using the ResearchPy library, we need to install it with pip:
```shell
pip install researchpy
```
Step 2: Import Required Libraries
Next, we need to import the required libraries. We will be using pandas, numpy, and researchpy libraries for this tutorial.
```python
import pandas as pd
import numpy as np
import researchpy as rp
```
Step 3: Load Data
For this tutorial, we will be using the famous Titanic dataset. We will load the data using pandas library.
```python
url = 'https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv'
df = pd.read_csv(url)
```
Step 4: Create Crosstab and Perform Chi-Square Test
We will create a crosstab between two categorical variables, ‘Sex’ and ‘Survived’, and perform a chi-square test to check if there is any significant association between the two variables.
```python
cross_tab, test_results = rp.crosstab(df['Sex'], df['Survived'], test='chi-square')
test_results
```
The output of the above code will show the chi-square test results.
Step 5: Obtain Odds Ratio
For a 2×2 table, ResearchPy reports the odds ratio when Fisher's exact test is requested through rp.crosstab (the exact row labels in the results table vary between ResearchPy versions, so inspect the output):
```python
cross_tab, fisher_results = rp.crosstab(df['Sex'], df['Survived'], test='fisher')
fisher_results
```
The output of the above code will show the odds ratio.
Step 6: Obtain Other Related Features
Other related values, such as the test statistic, p-value, and degrees of freedom, appear in the same results table returned by rp.crosstab. Rather than calling the function repeatedly, run it once and read the values you need (row labels vary slightly between ResearchPy versions):
```python
cross_tab, results = rp.crosstab(df['Sex'], df['Survived'], test='chi-square')
print(results)
```
The output will show the test statistic with its degrees of freedom, the p-value, and an effect-size measure.
In conclusion, we can use ResearchPy library to obtain odds ratios and other related features with ease. By following the above steps, we can perform statistical tests and obtain useful information from our data.
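As a cross-check that does not depend on ResearchPy's output format, the sample odds ratio can always be computed directly from a 2×2 crosstab with the formula (a*d)/(b*c). The counts below are made up for illustration, not taken from the Titanic data:

```python
import pandas as pd

# Illustrative 2x2 table of counts (rows: group, columns: outcome)
cross_tab = pd.DataFrame(
    [[10, 30],   # group A: 10 "yes", 30 "no"
     [25, 15]],  # group B: 25 "yes", 15 "no"
    index=['group_A', 'group_B'],
    columns=['yes', 'no'],
)

a, b = cross_tab.iloc[0]
c, d = cross_tab.iloc[1]
odds_ratio = (a * d) / (b * c)  # sample odds ratio
print(odds_ratio)               # (10*15)/(30*25) = 0.2
```

Comparing this hand-computed value against the library's reported odds ratio is a quick way to confirm you are reading the right row of the results table.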
Method 3: Using the Pandas Crosstab Function
To get an odds ratio with the Pandas crosstab function, build a 2×2 table of counts between two binary variables and apply the sample odds-ratio formula (a*d)/(b*c). Note that scikit-learn's chi2 feature-selection function operates on raw samples rather than contingency tables, so SciPy's chi2_contingency is the appropriate tool for testing the table for independence. The column names below are placeholders for your own data:
```python
import pandas as pd
from scipy.stats import chi2_contingency

data = pd.read_csv('your_data.csv')

# 2x2 table of counts; both variables must be binary
cross_tab = pd.crosstab(data['feature1'], data['feature2'])

# Sample odds ratio from the table cells
a, b = cross_tab.iloc[0]
c, d = cross_tab.iloc[1]
odds_ratio = (a * d) / (b * c)

# Chi-squared test of independence on the same table
chi2_stat, p_value, dof, expected = chi2_contingency(cross_tab)

print('Odds ratio:', odds_ratio)
print('Chi-squared statistic:', chi2_stat)
print('P-value:', p_value)
```
This outputs the sample odds ratio for the 2×2 table together with the chi-squared statistic and p-value for the association between the two variables.
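The same kind of 2×2 table can also be passed to scipy.stats.contingency.odds_ratio (documented below), which returns a conditional odds-ratio estimate along with a confidence interval. The counts here are made up for illustration:

```python
import numpy as np
from scipy.stats.contingency import odds_ratio

# Illustrative 2x2 table of counts
table = np.array([[10, 30],
                  [25, 15]])

res = odds_ratio(table)  # conditional odds ratio (the default kind)
ci = res.confidence_interval(confidence_level=0.95)
print(res.statistic)
print(ci.low, ci.high)
```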
scipy.stats.contingency.odds_ratio
Parameters : table array_like of two dimensions
A 2×2 contingency table. Elements must be non-negative integers.
kind str, optional
Which kind of odds ratio to compute, either the sample odds ratio (kind='sample') or the conditional odds ratio (kind='conditional'). Default is 'conditional'.
Returns : result OddsRatioResult instance
The returned object has a computed attribute, statistic:
- If kind is 'sample', this is the sample (or unconditional) estimate, given by table[0, 0]*table[1, 1]/(table[0, 1]*table[1, 0]).
- If kind is ‘conditional’ , this is the conditional maximum likelihood estimate for the odds ratio. It is the noncentrality parameter of Fisher’s noncentral hypergeometric distribution with the same hypergeometric parameters as table and whose mean is table[0, 0] .
The object has the method confidence_interval that computes the confidence interval of the odds ratio.
The conditional odds ratio was discussed by Fisher (see “Example 1” of [1]). Texts that cover the odds ratio include [2] and [3].
[1] R. A. Fisher (1935), "The logic of inductive inference", Journal of the Royal Statistical Society, Vol. 98, No. 1, pp. 39-82.
[2] Breslow NE, Day NE (1980). Statistical Methods in Cancer Research. Volume I: The Analysis of Case-Control Studies. IARC Sci Publ. (32):5-338. PMID: 7216345. (See section 4.2.)
[3] H. Sahai and A. Khurshid (1996), Statistics in Epidemiology: Methods, Techniques, and Applications, CRC Press LLC, Boca Raton, Florida.
[4] Berger, Jeffrey S. et al. "Aspirin for the Primary Prevention of Cardiovascular Events in Women and Men: A Sex-Specific Meta-analysis of Randomized Controlled Trials." JAMA, 295(3):306-313, DOI:10.1001/jama.295.3.306, 2006.
In epidemiology, individuals are classified as “exposed” or “unexposed” to some factor or treatment. If the occurrence of some illness is under study, those who have the illness are often classified as “cases”, and those without it are “noncases”. The counts of the occurrences of these classes give a contingency table:
|          | exposed | unexposed |
|----------|---------|-----------|
| cases    | a       | b         |
| noncases | c       | d         |
The sample odds ratio may be written (a/c) / (b/d) . a/c can be interpreted as the odds of a case occurring in the exposed group, and b/d as the odds of a case occurring in the unexposed group. The sample odds ratio is the ratio of these odds. If the odds ratio is greater than 1, it suggests that there is a positive association between being exposed and being a case.
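As a quick numeric check of this formula, the following snippet uses hypothetical counts and confirms that the ratio of odds (a/c)/(b/d) equals the cross-product form (a*d)/(b*c):

```python
# Hypothetical counts: exposed/unexposed vs cases/noncases
a, b = 20, 5    # cases:    exposed, unexposed
c, d = 80, 95   # noncases: exposed, unexposed

odds_exposed = a / c              # odds of being a case in the exposed group
odds_unexposed = b / d            # odds of being a case in the unexposed group
sample_or = odds_exposed / odds_unexposed
print(sample_or)                  # equals (a*d)/(b*c) = 4.75
```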
Interchanging the rows or columns of the contingency table inverts the odds ratio, so it is important to understand the meaning of the labels given to the rows and columns of the table when interpreting the odds ratio.
In [4], the use of aspirin to prevent cardiovascular events in women and men was investigated. The study notably concluded:
…aspirin therapy reduced the risk of a composite of cardiovascular events due to its effect on reducing the risk of ischemic stroke in women […]
The article lists studies of various cardiovascular events. Let’s focus on the ischemic stroke in women.
The following table summarizes the results of the experiment in which participants took aspirin or a placebo on a regular basis for several years. Cases of ischemic stroke were recorded:
|                 | Aspirin | Control/Placebo |
|-----------------|---------|-----------------|
| Ischemic stroke | 176     | 230             |
| No stroke       | 21035   | 21018           |
The question we ask is “Is there evidence that the aspirin reduces the risk of ischemic stroke?”
```python
>>> from scipy.stats.contingency import odds_ratio
>>> res = odds_ratio([[176, 230], [21035, 21018]])
>>> res.statistic
0.7646037659999126
```
For this sample, the odds of getting an ischemic stroke for those who have been taking aspirin are 0.76 times that of those who have received the placebo.
To make statistical inferences about the population under study, we can compute the 95% confidence interval for the odds ratio:
```python
>>> res.confidence_interval(confidence_level=0.95)
ConfidenceInterval(low=0.6241234078749812, high=0.9354102892100372)
```
The 95% confidence interval for the conditional odds ratio is approximately (0.62, 0.94).
The fact that the entire 95% confidence interval falls below 1 supports the authors’ conclusion that the aspirin was associated with a statistically significant reduction in ischemic stroke.
Interpretation of Odds Ratio and Fisher’s Exact Test
Calculation of the Odds Ratio and Applying Fisher’s Test on Python
When we work with nominal data, we mostly rely on frequency tables. Unlike numeric data, nominal data supports few statistical methods for drawing conclusions: correlations, confidence intervals, means, medians, and so on apply to numeric data types. Frequency tables are therefore used to interpret nominal data, by examining the frequency values in the table.
Once a frequency table is created, we have numeric values to which statistical methods can be applied. The Chi-Square Goodness of Fit test and the Chi-Square Test of Independence are the best-known methods for checking whether observed frequencies match expected ones; these tests describe the distribution of nominal data. There is also one more metric that provides an overall score for the association between two nominal variables: the odds ratio. The odds ratio applies to nominal variables that have exactly two levels. Fisher’s Exact Test for 2×2 tables tests whether the odds ratio is equal to 1 or not; it can also test whether the odds ratio is greater or less than 1.
In this article, I will explain what the odds ratio is, how to calculate it, and how to test whether it equals 1 in the population. We will see the following sections:
- Example Case
- Odds Ratio and Calculation
- Testing Odds Ratio with Fisher’s Exact Test in Python
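Before working through the example, here is a minimal sketch of Fisher's exact test with scipy.stats.fisher_exact. The alternative='two-sided' option tests whether the odds ratio differs from 1, while 'less' and 'greater' give the one-sided tests; the table values below are placeholders, not data from the example case:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table of counts
table = [[8, 2],
         [1, 5]]

# Two-sided: is the odds ratio different from 1?
odds_ratio, p_two_sided = fisher_exact(table, alternative='two-sided')

# One-sided: is the odds ratio greater than 1?
_, p_greater = fisher_exact(table, alternative='greater')

print(odds_ratio)                 # sample OR = (8*5)/(2*1) = 20.0
print(p_two_sided, p_greater)
```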
Example Case
Suppose that you’re working on clinical data with two basic variables: one shows whether patients are above the average weight, and the other shows whether patients have health problems. Your purpose is to find whether there is any association between being above the average weight and having health problems. So let’s assume that we ran our experiment on 20 different patients and obtained the results shown in the table below.