Python pandas select by index

Contents

Pandas Select Rows by Index (Position/Label)
1. Quick Examples of Select Rows by Index Position & Labels
2. Select Rows by Index using Pandas iloc[]
2.1. Select Row by Integer Index
2.2. Get Multiple Rows by Index List
2.3. Get DataFrame Rows by Index Range
3. Select Rows by Index Labels using Pandas loc[]
3.1. Get Row by Label
3.2. Get Multiple Rows by Label List
3.3. Get Rows Between Two Labels
4. Complete Example
5. Conclusion
How to Select Rows by Index in a Pandas DataFrame
Example 1: Select Rows Based on Integer Indexing
Example 2: Select Rows Based on Label Indexing
The Difference Between .iloc and .loc

Pandas Select Rows by Index (Position/Label)

Use pandas DataFrame.iloc[] and DataFrame.loc[] to select rows by integer position and by index label, respectively. iloc[] accepts a single position, a list of positions, or a slice of positions, among other inputs. loc[] works with labels and accepts a single index label, a list of labels, or a slice between two labels, among other inputs. Passing a position or label that doesn't exist raises an error: iloc[] raises IndexError for an out-of-bounds position, and loc[] raises KeyError for a missing label.
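To make the error behavior concrete, here is a minimal sketch. The three-row DataFrame and its values are placeholders invented for this illustration. It also shows one case worth knowing about: a positional slice that runs past the end does not raise and is simply truncated.

```python
import pandas as pd

# Placeholder data, invented for this illustration.
df = pd.DataFrame({"Fee": [20000, 25000, 26000]}, index=["r1", "r2", "r3"])

# A single out-of-bounds position raises IndexError.
try:
    df.iloc[10]
    iloc_error = None
except IndexError as exc:
    iloc_error = type(exc).__name__

# A label that is not in the index raises KeyError.
try:
    df.loc["r9"]
    loc_error = None
except KeyError as exc:
    loc_error = type(exc).__name__

# A positional slice that runs past the last row is truncated, not an error.
truncated = df.iloc[1:10]

print(iloc_error, loc_error, list(truncated.index))
```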

In this article, I will explain how to select rows from a pandas DataFrame by integer position and by label (single and multiple rows), by range, and how to select the first and last n rows, with several examples. loc[] and iloc[] can also select columns; for that, refer to the related article on how to get a cell value from a pandas DataFrame.

1. Quick Examples of Select Rows by Index Position & Labels

If you are in a hurry, below are some quick examples of how to select a row of a pandas DataFrame by index.

Let's create a DataFrame with a few rows and columns and run some examples to learn how to use an index. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.

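    import pandas as pd

    # `technologies` holds the column data (Courses, Fee, Duration, Discount);
    # its definition is not shown here.
    index_labels = ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7']
    df = pd.DataFrame(technologies, index=index_labels)
    print(df)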
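For readers who want something they can run directly, the following is a self-contained sketch of such a DataFrame. The column names come from the text; every value in `technologies` is a placeholder made up for this example and should not be read as the article's actual data.

```python
import pandas as pd

# Hypothetical values: only the column names are taken from the article.
technologies = {
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle", "Java"],
    "Fee": [20000, 25000, 26000, 22000, 24000, 21000, 22000],
    "Duration": ["30days", "40days", "35days", "40days", "60days", "50days", "55days"],
    "Discount": [1000, 2300, 1500, 1200, 2500, 2100, 2000],
}
index_labels = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]

# Custom row labels r1..r7 replace the default 0..6 index.
df = pd.DataFrame(technologies, index=index_labels)
print(df)
```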

2. Select Rows by Index using Pandas iloc[]

The pandas iloc[] operator selects DataFrame rows by integer position. Remember that positions start at 0. You can use pandas.DataFrame.iloc[] with the syntax [start:stop:step], where start is the position of the first row to take, stop is the position at which to stop (the row at stop itself is excluded), and step is the number of positions to advance after each extraction. Alternatively, use the syntax [[indices]], with indices as a list of row positions to take.
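As a small illustration of the start:stop:step form, the sketch below takes every second row between two positions. The DataFrame values are placeholders chosen for this example.

```python
import pandas as pd

# Placeholder data, invented for this illustration.
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000, 21000, 22000]},
    index=["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
)

# start=1, stop=6, step=2 takes positions 1, 3 and 5.
# Position 6 is the stop and is not included.
every_other = df.iloc[1:6:2]
print(every_other)
```

Because the stop position is excluded, the slice behaves like Python's built-in list slicing.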

2.1. Select Row by Integer Index

You can select a single row from a pandas DataFrame by integer position using df.iloc[n]. Replace n with the position you want to select.
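One detail that is easy to miss is the type of the result. In the sketch below (placeholder data), a bare integer returns the row as a Series, while wrapping the same integer in a list returns a one-row DataFrame.

```python
import pandas as pd

# Placeholder data, invented for this illustration.
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

# A bare integer returns a Series; its name is the row's label.
row = df.iloc[2]

# A one-element list returns a DataFrame with a single row.
one_row_frame = df.iloc[[2]]

print(type(row).__name__, row.name)
print(type(one_row_frame).__name__, one_row_frame.shape)
```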

2.2. Get Multiple Rows by Index List

Sometimes you may need to get multiple rows from a DataFrame by specifying their positions as a list. For example, df.iloc[[2,3,6]] selects the 3rd, 4th, and 7th rows, because positions start from zero.
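A short sketch of list-based selection follows, again on placeholder data. It also shows that the rows come back in the order given in the list, not in their original order.

```python
import pandas as pd

# Placeholder data, invented for this illustration.
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000, 21000, 22000]},
    index=["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
)

# Positions 2, 3 and 6 are the 3rd, 4th and 7th rows.
picked = df.iloc[[2, 3, 6]]

# The result follows the order of the list.
reordered = df.iloc[[6, 2]]

print(picked)
print(reordered)
```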

2.3. Get DataFrame Rows by Index Range

When you want to select DataFrame rows by a range of positions, provide the start and stop positions.

If you omit the start position, iloc[] selects from the first row.
If you omit the stop position, iloc[] selects all rows from the start position to the end.
If you provide both, iloc[] selects the rows from start up to, but not including, stop.
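The three range forms can be sketched as follows. The DataFrame is a placeholder built for this example.

```python
import pandas as pd

# Placeholder data, invented for this illustration.
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000, 21000, 22000]},
    index=["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
)

no_start = df.iloc[:3]   # first row through position 2
no_stop = df.iloc[4:]    # position 4 through the last row
both = df.iloc[1:5]      # positions 1, 2, 3 and 4 (5 is excluded)

print(list(no_start.index))
print(list(no_stop.index))
print(list(both.index))
```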

3. Select Rows by Index Labels using Pandas loc[]

By using pandas.DataFrame.loc[] you can get rows by index names or labels. To select rows, the syntax is df.loc[start:stop:step], where start is the label of the first row to take, stop is the label of the last row to take (unlike iloc[], the stop label is included), and step is the number of rows to advance after each extraction; for example, you can use it to select alternate rows. Alternatively, use the syntax [[labels]], with labels as a list of row labels to take.
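The difference in how the two operators treat the end of a slice is a common source of off-by-one mistakes, so here is a small side-by-side sketch on placeholder data.

```python
import pandas as pd

# Placeholder data, invented for this illustration.
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000, 21000, 22000]},
    index=["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
)

# Label slice: both 'r2' and 'r5' are included.
by_label = df.loc["r2":"r5"]

# Position slice: position 4 (label 'r5') is excluded.
by_position = df.iloc[1:4]

# A step of 2 takes alternate rows inside the label range.
alternate = df.loc["r1":"r5":2]

print(list(by_label.index))
print(list(by_position.index))
print(list(alternate.index))
```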

3.1. Get Row by Label

If your DataFrame has custom index labels, you can use those labels to select a row. For example, df.loc['r2'] returns the row with label 'r2'.
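As a brief sketch on placeholder data, the row returned by a single label is a Series indexed by column name, so individual fields can be read from it. Passing a column name as a second argument to loc[] reads one cell directly.

```python
import pandas as pd

# Placeholder data, invented for this illustration.
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

# The row comes back as a Series whose index is the column names.
row = df.loc["r2"]
course = row["Courses"]

# Row label plus column label returns a single cell value.
fee = df.loc["r2", "Fee"]

print(course, fee)
```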

3.2. Get Multiple Rows by Label List

If you have a list of row labels, you can use this to select multiple rows from pandas DataFrame.
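A minimal sketch of label-list selection follows, using placeholder data. In recent pandas versions (1.0 and later, as far as I know) loc[] raises KeyError if any label in the list is missing, so the example also shows one possible way to keep only the labels that exist, using a boolean mask from Index.isin().

```python
import pandas as pd

# Placeholder data, invented for this illustration.
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000, 21000, 22000]},
    index=["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
)

# All labels exist, so this returns three rows.
picked = df.loc[["r2", "r3", "r6"]]

# 'r9' does not exist, so a plain label list fails.
wanted = ["r2", "r9", "r6"]
try:
    df.loc[wanted]
    raised = False
except KeyError:
    raised = True

# Tolerant alternative: keep only rows whose label is in the list.
existing_only = df[df.index.isin(wanted)]

print(list(picked.index), raised, list(existing_only.index))
```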

3.3. Get Rows Between Two Labels

You can also select rows between two index labels; both end labels are included in the result.

You can get the first two rows using df.loc[:'r2'], but this approach is rarely used because you need to know the row labels. To select the first n rows, it is better to use the positional form df.iloc[:n], replacing n with the number of rows you want. The same applies to getting the last n rows, for example with df.iloc[-n:].
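The first-n and last-n patterns can be sketched like this on placeholder data. The comparison with DataFrame.head() and DataFrame.tail() is my addition rather than something the article covers; for a positive n they return the same rows.

```python
import pandas as pd

# Placeholder data, invented for this illustration.
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000, 21000, 22000]},
    index=["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
)

n = 3
first_n = df.iloc[:n]    # first three rows, no labels needed
last_n = df.iloc[-n:]    # last three rows

# head() and tail() give the same rows for a positive n.
same_as_head = first_n.equals(df.head(n))
same_as_tail = last_n.equals(df.tail(n))

print(list(first_n.index), list(last_n.index))
```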

4. Complete Example

    import pandas as pd

    # `technologies` holds the column data (Courses, Fee, Duration, Discount);
    # its definition is not shown here.
    index_labels = ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7']
    df = pd.DataFrame(technologies, index=index_labels)
    print(df)

    # Select Row by Index
    print(df.iloc[2])

    # Select Rows by Index List
    print(df.iloc[[2,3,6]])

    # Select Rows by Integer Index Range
    print(df.iloc[1:5])

    # Select First Row
    print(df.iloc[:1])

    # Select First 3 Rows
    print(df.iloc[:3])

    # Select Last Row
    print(df.iloc[-1:])

    # Select Last 3 Rows
    print(df.iloc[-3:])

    # Select Alternate Rows
    print(df.iloc[::2])

    # Select Row by Index Label
    print(df.loc['r2'])

    # Select Rows by Index Label List
    print(df.loc[['r2','r3','r6']])

    # Select Rows by Label Index Range
    print(df.loc['r1':'r5'])

    # Select Alternate Rows within Index Labels
    print(df.loc['r1':'r5':2])

5. Conclusion

In this article, you have learned how to select a single row or multiple rows from a pandas DataFrame by integer position and by label, using iloc[] and loc[] respectively. With these you can also select rows by range and select the first and last n rows.

How to Select Rows by Index in a Pandas DataFrame

Often you may need to select rows of a pandas DataFrame based on their index value.

To select rows by integer position, use the .iloc indexer.

To select rows by index label, use the .loc indexer.

This tutorial gives an example of each in practice.

Example 1: Select Rows Based on Integer Indexing

The following code shows how to create a pandas DataFrame and use .iloc to select the row at integer position 4:

    import pandas as pd
    import numpy as np

    #make this example reproducible
    np.random.seed(0)

    #create DataFrame
    df = pd.DataFrame(np.random.rand(6,2), index=range(0,18,3), columns=['A', 'B'])

    #view DataFrame
    df

               A         B
    0   0.548814  0.715189
    3   0.602763  0.544883
    6   0.423655  0.645894
    9   0.437587  0.891773
    12  0.963663  0.383442
    15  0.791725  0.528895

    #select the 5th row of the DataFrame
    df.iloc[[4]]

               A         B
    12  0.963663  0.383442

We can use similar syntax to select multiple rows:

    #select the 3rd, 4th, and 5th rows of the DataFrame
    df.iloc[[2, 3, 4]]

               A         B
    6   0.423655  0.645894
    9   0.437587  0.891773
    12  0.963663  0.383442

Or we could select all rows in a range:

    #select the 3rd, 4th, and 5th rows of the DataFrame
    df.iloc[2:5]

               A         B
    6   0.423655  0.645894
    9   0.437587  0.891773
    12  0.963663  0.383442

Example 2: Select Rows Based on Label Indexing

The following code shows how to create a pandas DataFrame and use .loc to select the row with an index label of 3:

    import pandas as pd
    import numpy as np

    #make this example reproducible
    np.random.seed(0)

    #create DataFrame
    df = pd.DataFrame(np.random.rand(6,2), index=range(0,18,3), columns=['A', 'B'])

    #view DataFrame
    df

               A         B
    0   0.548814  0.715189
    3   0.602763  0.544883
    6   0.423655  0.645894
    9   0.437587  0.891773
    12  0.963663  0.383442
    15  0.791725  0.528895

    #select the row with index label 3
    df.loc[[3]]

              A         B
    3  0.602763  0.544883

We can use similar syntax to select multiple rows with different index labels:

    #select the rows with index labels 3, 6, and 9
    df.loc[[3, 6, 9]]

              A         B
    3  0.602763  0.544883
    6  0.423655  0.645894
    9  0.437587  0.891773

The Difference Between .iloc and .loc

The examples above illustrate the subtle difference between .iloc and .loc:

.iloc selects rows based on integer position. So, if you want to select the 5th row in a DataFrame, you should use df.iloc[[4]], since the first row is at position 0, the second row is at position 1, and so on.
.loc selects rows based on the index label. So, if you want to select the row whose index label is 5, you should use df.loc[[5]] directly. The label must exist in the index: in the example DataFrame above, whose labels are 0, 3, 6, and so on, no row is labeled 5, and recent pandas versions raise a KeyError for a missing label.
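To see the two indexers disagree on the same number, the sketch below rebuilds the example DataFrame and passes 3 to each. The slice comparison at the end is an extra illustration that is not part of the original examples.

```python
import numpy as np
import pandas as pd

# Same construction as the examples above: labels 0, 3, 6, 9, 12, 15.
np.random.seed(0)
df = pd.DataFrame(np.random.rand(6, 2), index=range(0, 18, 3), columns=["A", "B"])

# Position 3 is the 4th row, which carries the label 9.
by_position = df.iloc[[3]]

# Label 3 belongs to the 2nd row.
by_label = df.loc[[3]]

# Slices differ too: .loc includes the end label, .iloc excludes the end position.
label_slice = df.loc[3:9]        # labels 3, 6, 9
position_slice = df.iloc[3:9]    # positions 3, 4, 5 (the frame has only 6 rows)

print(list(by_position.index), list(by_label.index))
print(list(label_slice.index), list(position_slice.index))
```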
