Python dataframe get column by index

Contents

Call column in dataframe by column index instead of column name — pandas
2 Answers
Access rows and columns by integer position(s)
Access rows and columns by label(s)
How to Select Columns by Index in a Pandas DataFrame
Example 1: Select Columns Based on Integer Indexing
Example 2: Select Columns Based on Label Indexing
Pandas: Selecting column from data frame
4 Answers
Selecting pandas column by location
7 Answers

Call column in dataframe by column index instead of column name — pandas

How can I call a column in my code using its index in the dataframe instead of its name? For example, I have a dataframe df with columns a, b, c. Instead of calling df['a'], can I call it using its column index, like df[1]?

2 Answers

    >>> df
       a  b  c
    0  1  4  7
    1  2  5  8
    2  3  6  9
    >>> df['a']
    0    1
    1    2
    2    3
    Name: a, dtype: int64
    >>> df.iloc[:, 0]
    0    1
    1    2
    2    3
    Name: a, dtype: int64

The indexing and selecting data documentation mentions that the indexing operator [] is provided more for convenience. The iloc and loc methods provide more explicit indexing operations on the dataframe.

Note: Index has its own connotation in pandas. So when referring to the numeric index (like an array index), it is better to say integer position (or just position).
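To make the distinction between a name and a position concrete, here is a minimal sketch. It assumes a small frame like the one in the question (the values are illustrative) and shows that a bare integer inside [] is treated as a column label, not a position:

```python
import pandas as pd

# Illustrative frame with string column labels, as in the question.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

by_name = df["a"]            # label-based lookup
by_position = df.iloc[:, 0]  # position-based lookup (positions start at 0)

# df[1] looks for a column *labelled* 1, which does not exist here.
try:
    df[1]
    plain_int_works = True
except KeyError:
    plain_int_works = False

print(by_name.equals(by_position))
print(plain_int_works)
```

So for the frame in the question, the positional equivalent of df['a'] is df.iloc[:, 0], not df[1].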

    >>> df
       a  b
    0  1  4
    1  2  5
    2  3  6
    >>> df['a']
    0    1
    1    2
    2    3
    Name: a, dtype: int64

Access rows and columns by integer position(s)

df.iloc[ row_start_position : row_end_position , col_start_position : col_end_position ]

    >>> df.iloc[0:3, 0:1]
       a
    0  1
    1  2
    2  3
    >>> df.iloc[:, 0]  # use of implicit start and end
    0    1
    1    2
    2    3
    Name: a, dtype: int64

Access rows and columns by label(s)

df.loc[ row_start_label : row_end_label , col_start_label : col_end_label ]

Note: In this example, it just so happens that the row label(s) and the row position(s) are the same, namely the integers 0, 1, 2.

    >>> df.loc[0:2, 'a':'a']
       a
    0  1
    1  2
    2  3
    >>> df.loc[:, 'a']  # use of implicit start and end
    0    1
    1    2
    2    3
    Name: a, dtype: int64
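The difference between the two slicing styles is easier to see when row labels and row positions do not coincide. The following is only a sketch; the index labels 10, 20, 30 are an assumption chosen for illustration. Note that iloc slices exclude the end position, while loc slices include the end label:

```python
import pandas as pd

# Row labels (10, 20, 30) deliberately differ from row positions (0, 1, 2).
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[10, 20, 30])

# Positions 0 and 1 for rows, position 0 for columns (end is exclusive).
by_position = df.iloc[0:2, 0:1]

# Labels 10 through 20 for rows, label 'a' for columns (end is inclusive).
by_label = df.loc[10:20, "a":"a"]

print(by_position.equals(by_label))
```

Both selections pick the same two rows and the single column a, even though the slice bounds look different.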

See how to Query / Select / Slice Data for more details.


How to Select Columns by Index in a Pandas DataFrame

Often you may want to select columns of a pandas DataFrame based on their index value.

If you want to select columns based on integer position, you can use the .iloc indexer.

If you want to select columns based on their labels, you can use the .loc indexer.

This tutorial provides an example of how to use each of them in practice.

Example 1: Select Columns Based on Integer Indexing

The following code shows how to create a pandas DataFrame and use .iloc to select the column at integer index position 3:

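    import pandas as pd

    # create DataFrame (values reconstructed from the printed output)
    df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B'],
                       'points': [11, 7, 8, 10, 13, 13],
                       'assists': [5, 7, 7, 9, 12, 9],
                       'rebounds': [11, 8, 10, 6, 6, 5]})

    # view DataFrame
    df

      team  points  assists  rebounds
    0    A      11        5        11
    1    A       7        7         8
    2    A       8        7        10
    3    B      10        9         6
    4    B      13       12         6
    5    B      13        9         5

    # select column with index position 3
    df.iloc[:, 3]

    0    11
    1     8
    2    10
    3     6
    4     6
    5     5
    Name: rebounds, dtype: int64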

We can use similar syntax to select multiple columns:

    # select columns with index positions 1 and 3
    df.iloc[:, [1, 3]]

       points  rebounds
    0      11        11
    1       7         8
    2       8        10
    3      10         6
    4      13         6
    5      13         5

Or we could select all columns in a range:

    # select columns with index positions 0 through 2 (the end of the range, 3, is excluded)
    df.iloc[:, 0:3]

      team  points  assists
    0    A      11        5
    1    A       7        7
    2    A       8        7
    3    B      10        9
    4    B      13       12
    5    B      13        9
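A few related position lookups can be sketched with the same data. This is an illustrative addition rather than part of the original tutorial: it uses a negative position to reach the last column, Index.get_loc to translate a label into its position, and a slice of df.columns to see which names a positional range covers.

```python
import pandas as pd

# Same data as in the example above.
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'points': [11, 7, 8, 10, 13, 13],
                   'assists': [5, 7, 7, 9, 12, 9],
                   'rebounds': [11, 8, 10, 6, 6, 5]})

# Negative positions count from the end, as with Python lists.
last_column = df.iloc[:, -1]

# Translate a column label into its integer position.
rebounds_position = df.columns.get_loc('rebounds')

# Column names covered by the positional range 0:3 (end excluded).
names_in_range = list(df.columns[0:3])

print(last_column.name)
print(rebounds_position)
print(names_in_range)
```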

Example 2: Select Columns Based on Label Indexing

The following code shows how to create a pandas DataFrame and use .loc to select the column with the index label 'rebounds':

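    import pandas as pd

    # create DataFrame (values reconstructed from the printed output)
    df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B'],
                       'points': [11, 7, 8, 10, 13, 13],
                       'assists': [5, 7, 7, 9, 12, 9],
                       'rebounds': [11, 8, 10, 6, 6, 5]})

    # view DataFrame
    df

      team  points  assists  rebounds
    0    A      11        5        11
    1    A       7        7         8
    2    A       8        7        10
    3    B      10        9         6
    4    B      13       12         6
    5    B      13        9         5

    # select column with index label 'rebounds'
    df.loc[:, 'rebounds']

    0    11
    1     8
    2    10
    3     6
    4     6
    5     5
    Name: rebounds, dtype: int64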

We can use similar syntax to select multiple columns with different index labels:

    # select the columns with index labels 'points' and 'rebounds'
    df.loc[:, ['points', 'rebounds']]

       points  rebounds
    0      11        11
    1       7         8
    2       8        10
    3      10         6
    4      13         6
    5      13         5

Or we could select all columns in a range:

    # select columns with index labels from 'team' through 'assists' (both ends included)
    df.loc[:, 'team':'assists']

      team  points  assists
    0    A      11        5
    1    A       7        7
    2    A       8        7
    3    B      10        9
    4    B      13       12
    5    B      13        9


Pandas: Selecting column from data frame

Pandas beginner here. I'm looking to return a full column's data and I've seen a couple of different methods for this. What is the difference between the two entries below, if any? It looks like they return the same thing.

    loansData['int_rate']
    loansData.int_rate

4 Answers

The latter is basically syntactic sugar for the former. There are (at least) a couple of gotchas:

If the name of the column is not a valid Python identifier (e.g., if the column name is my column name?!), you must use the former.
Somewhat surprisingly, you can only use the former form to completely correctly add a new column (see, e.g., here).

Example for the latter statement:

    import pandas as pd

    df = pd.DataFrame({'a': range(4)})
    df.b = range(4)

    >> df.columns
    Index([u'a'], dtype='object')

For some reason, though, df.b returns the correct results.
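A likely explanation, sketched below under the assumption of a reasonably recent pandas version, is that df.b = ... sets an ordinary Python attribute on the DataFrame object rather than adding a column, so reading df.b simply returns that attribute. Recent versions emit a UserWarning for this pattern, which the sketch silences:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": range(4)})

# Attribute-style assignment: stores a plain attribute, not a column.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about this pattern
    df.b = range(4)
columns_after_attribute = list(df.columns)

# Item-style assignment: actually creates a new column.
df["c"] = range(4)
columns_after_item = list(df.columns)

print(columns_after_attribute)
print(columns_after_item)
print(type(df.b))
```

Only the item-style assignment changes df.columns, which is why the bracket form is the safer choice when adding columns.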

@EdChum I remembered there was a problem with it — see above. I’ve had loads of trouble with this in the past.

They do return the same thing. The column names in pandas are akin to dictionary keys that refer to a series. The column names themselves are named attributes that are part of the dataframe object.

The first method is preferred, as it allows for spaces and other characters that are not valid in Python identifiers.

For a more complete explanation, I recommend you take a look at this article: http://byumcl.bitbucket.org/bootcamp2013/labs/pd_types.html#pandas-types

Search ‘Access using dict notation’ to find the examples where they show that these two methods return identical values.

They’re the same but for me the first method handles spaces in column names and illegal characters so is preferred, example:

    In [115]:
    df = pd.DataFrame(columns=['a', ' a', '1a'])
    df

    Out[115]:
    Empty DataFrame
    Columns: [a,  a, 1a]
    Index: []

    In [116]:
    print(df.a)      # works
    print(df[' a'])  # works
    print(df.1a)     # error

      File "", line 3
        print(df.1a)
                  ^
    SyntaxError: invalid syntax

Really, when you use the dot, pandas tries to look the key up as an attribute. If you have used a column name that matches an existing attribute of the DataFrame, the dot will not do what you expect.

    In [121]:
    df = pd.DataFrame(columns=['index'], data = np.random.randn(3))
    df

    Out[121]:
          index
    0  0.062698
    1 -1.066654
    2 -1.560549

    In [122]:
    df.index

    Out[122]: Int64Index([0, 1, 2], dtype='int64')

The above shows the DataFrame's index rather than the column 'index'.
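The same clash can be checked programmatically. The sketch below is illustrative only: the column names index and count are chosen because they collide with an existing DataFrame attribute and method, and the values are made up.

```python
import pandas as pd

# Column names that collide with an existing attribute and method.
df = pd.DataFrame({"index": [0.1, 0.2, 0.3], "count": [1, 2, 3]})

row_index = df.index        # the row index, not the column
index_column = df["index"]  # the column, via bracket notation

count_attr = df.count       # the DataFrame.count method, not the column
count_column = df["count"]  # the column, via bracket notation

print(type(row_index).__name__)
print(type(index_column).__name__)
print(callable(count_attr))
```

Bracket notation always reaches the column, whatever it is called.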


Selecting pandas column by location

I’m simply trying to access named pandas columns by an integer. You can select a row by location using df.ix[3] . But how to select a column by integer? My dataframe:

In this example, the ordering of the columns may not be defined ('a' may be the first or the second column).

7 Answers

Two approaches that come to mind:

    >>> df
              A         B         C         D
    0  0.424634  1.716633  0.282734  2.086944
    1 -1.325816  2.056277  2.583704 -0.776403
    2  1.457809 -0.407279 -1.560583 -1.316246
    3 -0.757134 -1.321025  1.325853 -2.513373
    4  1.366180 -1.265185 -2.184617  0.881514
    >>> df.iloc[:, 2]
    0    0.282734
    1    2.583704
    2   -1.560583
    3    1.325853
    4   -2.184617
    Name: C
    >>> df[df.columns[2]]
    0    0.282734
    1    2.583704
    2   -1.560583
    3    1.325853
    4   -2.184617
    Name: C

Edit: The original answer suggested the use of df.ix[:, 2], but the .ix indexer is now deprecated (and was removed in pandas 1.0). Users should switch to df.iloc[:, 2].

Note that if you have two columns with the same name, df.iloc[:, 2] still works and returns just one column, but df[df.columns[2]] returns both columns with that name.

As BobbyG correctly states directly above, in the case of duplicate column names df[df.columns[2]] returns all columns of that name, and the result is a DataFrame, not a Series.
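The duplicate-name behaviour described in these comments can be reproduced with a small sketch. The frame below is an assumption made for illustration, with the label x deliberately used twice:

```python
import pandas as pd

# The label 'x' appears at positions 0 and 2.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["x", "y", "x"])

by_position = df.iloc[:, 2]  # exactly one column, as a Series
by_name = df[df.columns[2]]  # every column labelled 'x', as a DataFrame

print(type(by_position).__name__, list(by_position))
print(type(by_name).__name__, by_name.shape)
```

If column labels might repeat, the positional form is the only one of the two that is guaranteed to return a single column.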

You can also use df.icol(n) to access a column by integer.

Update: icol is deprecated and the same functionality can be achieved by:

df.iloc[:, n] # to access the column at the nth position

Note that for the upcoming version 0.11.0, these methods are deprecated and may be removed in future versions. See pandas.pydata.org/pandas-docs/dev/… on how to select by position using iloc/iat.

The above link is deprecated because the indexing docs have since been restructured: pandas.pydata.org/pandas-docs/stable/…. As of today, in which the most recent version is 0.21.0, iloc remains the documented approach to accessing a column by position.
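For reference, a minimal sketch of the two positional accessors mentioned in these comments, iloc for a whole column and iat for a single cell. The frame and the value of n are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]})
n = 2  # position of the column we want (0-based)

column = df.iloc[:, n]  # the whole column at position n, as a Series
cell = df.iat[0, n]     # a single scalar: row position 0, column position n

print(column.name)
print(cell)
```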

You can use label-based selection with .loc or position-based selection with .iloc to slice columns, including column ranges:

    In [50]: import pandas as pd

    In [51]: import numpy as np

    In [52]: df = pd.DataFrame(np.random.rand(4,4), columns = list('abcd'))

    In [53]: df
    Out[53]:
              a         b         c         d
    0  0.806811  0.187630  0.978159  0.317261
    1  0.738792  0.862661  0.580592  0.010177
    2  0.224633  0.342579  0.214512  0.375147
    3  0.875262  0.151867  0.071244  0.893735

    In [54]: df.loc[:, ["a", "b", "d"]]  ### Selective columns based slicing
    Out[54]:
              a         b         d
    0  0.806811  0.187630  0.317261
    1  0.738792  0.862661  0.010177
    2  0.224633  0.342579  0.375147
    3  0.875262  0.151867  0.893735

    In [55]: df.loc[:, "a":"c"]  ### Selective label based column ranges slicing
    Out[55]:
              a         b         c
    0  0.806811  0.187630  0.978159
    1  0.738792  0.862661  0.580592
    2  0.224633  0.342579  0.214512
    3  0.875262  0.151867  0.071244

    In [56]: df.iloc[:, 0:3]  ### Selective index based column ranges slicing
    Out[56]:
              a         b         c
    0  0.806811  0.187630  0.978159
    1  0.738792  0.862661  0.580592
    2  0.224633  0.342579  0.214512
    3  0.875262  0.151867  0.071244
