- KeyError: (‘var1’, ‘occurred at index 16’)
- frame _apply_standard error when operating on 0 or NaN values #6246
- KeyError: (‘Year’, ‘occurred at index Year’)
- Pandas — ‘Series’ object has no attribute
- Pandas .shift function yields ("'float' object has no attribute 'shift'", 'occurred at index 0')
KeyError: (‘var1’, ‘occurred at index 16’)
I'm getting this odd error message, KeyError: ('var1', 'occurred at index 16'), on the following line of code:
df['var1'] = df.apply(lambda row: (row['var1']*row['var2']), axis = 1)
I'm multiplying two columns of the df DataFrame. The expression 16 in df.index returns True, and I can access the row labeled 16 normally. If I drop that row, the error persists. Any thoughts on this?
EDIT: As requested, a sample of the data:
              X     var1                           Y  \
0  US4642867729  22.3052  Korea; Republic (S. Korea)
1  US4642867729   5.9139  Korea; Republic (S. Korea)
2  US4642867729   3.0799  Korea; Republic (S. Korea)
3  US4642867729   2.9647  Korea; Republic (S. Korea)
4  US4642867729   2.5798  Korea; Republic (S. Korea)
5  US4642867729   2.5281  Korea; Republic (S. Korea)
6  US4642867729   2.3359  Korea; Republic (S. Korea)
7  US4642867729   2.2434  Korea; Republic (S. Korea)
8  US4642867729   1.8624  Korea; Republic (S. Korea)

                        W             Z  \
0  Information Technology  US4642867729
1  Information Technology  US4642867729
2               Materials  US4642867729
3             Health Care  US4642867729
4  Information Technology  US4642867729
5              Financials  US4642867729
6  Consumer Discretionary  US4642867729
7              Financials  US4642867729
8               Materials  US4642867729

                                              var2
0  0.16258420849834973043179786600376246497035026.
1  0.16258420849834973043179786600376246497035026.
2  0.16258420849834973043179786600376246497035026.
3  0.16258420849834973043179786600376246497035026.
4  0.16258420849834973043179786600376246497035026.
5  0.16258420849834973043179786600376246497035026.
6  0.16258420849834973043179786600376246497035026.
7  0.16258420849834973043179786600376246497035026.
8  0.16258420849834973043179786600376246497035026.
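No answer to this question is quoted here, and the sample alone does not show what causes the KeyError. As a side note, multiplying two columns does not need a row-wise apply at all. The following is a minimal sketch, assuming var2 may be stored as text and can be converted to numbers; the three-row frame and its shortened values are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the sample above; var2 is stored as text on purpose.
df = pd.DataFrame({
    "var1": [22.3052, 5.9139, 3.0799],
    "var2": ["0.1625", "0.1625", "0.1625"],
})

# A KeyError for 'var1' means a label lookup failed, so it is worth checking
# the exact column labels first (stray spaces are easy to miss).
print(df.columns.tolist())

# Convert var2 to numbers (unparseable values become NaN), then multiply
# the two columns element-wise without apply.
var2_numeric = pd.to_numeric(df["var2"], errors="coerce")
df["var1"] = df["var1"] * var2_numeric
```

This avoids the per-row label lookup entirely, but it is an alternative formulation, not a diagnosis of the original error.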
frame _apply_standard error when operating on 0 or NaN values #6246
Here are the steps I followed. I am using pandas 0.12.0.
In [1]: import pandas as pd

In [4]: dataFrame = pd.read_csv('./test.csv')

In [7]: dataFrame
Out[7]:
    r1   r2  r3  r4  r5
0  NaN  3.5 NaN NaN   5
1  4.5  NaN   4 NaN NaN
2  1.5  NaN NaN NaN NaN
3  NaN  NaN NaN NaN NaN
4  NaN  NaN NaN NaN NaN
5  4.5  NaN   4 NaN NaN
6  NaN  NaN NaN NaN NaN

In [8]: dataFrame['mean'] = dataFrame.mean(axis=1)

In [9]: dataFrame
Out[9]:
    r1   r2  r3  r4  r5  mean
0  NaN  3.5 NaN NaN   5  4.25
1  4.5  NaN   4 NaN NaN  4.25
2  1.5  NaN NaN NaN NaN  1.50
3  NaN  NaN NaN NaN NaN   NaN
4  NaN  NaN NaN NaN NaN   NaN
5  4.5  NaN   4 NaN NaN  4.25
6  NaN  NaN NaN NaN NaN   NaN

In [10]: dataFrame.dtypes
Out[10]:
r1      float64
r2      float64
r3      float64
r4      float64
r5      float64
mean    float64
dtype: object

In [11]: meanCenteredDataFrame = dataFrame.apply(lambda x: x -x['mean'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
 in ()
----> 1 meanCenteredDataFrame = dataFrame.apply(lambda x: x -x['mean'])

//anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, args, **kwds)
   4414                     return self._apply_raw(f, axis)
   4415                 else:
-> 4416                     return self._apply_standard(f, axis)
   4417             else:
   4418                 return self._apply_broadcast(f, axis)

//anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures)
   4489                 # no k defined yet
   4490                 pass
-> 4491             raise e
   4492
   4493

KeyError: ('mean', u'occurred at index r1')

In [12]: dataFrame.fillna(0,inplace=True)

In [13]: meanCenteredDataFrame = dataFrame.apply(lambda x: x -x['mean'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
 in ()
----> 1 meanCenteredDataFrame = dataFrame.apply(lambda x: x -x['mean'])

//anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, args, **kwds)
   4414                     return self._apply_raw(f, axis)
   4415                 else:
-> 4416                     return self._apply_standard(f, axis)
   4417             else:
   4418                 return self._apply_broadcast(f, axis)

//anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures)
   4489                 # no k defined yet
   4490                 pass
-> 4491             raise e
   4492
   4493

KeyError: ('mean', u'occurred at index r1')
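The session above calls apply without an axis argument. With the default axis=0, apply passes each column to the function, and a column Series is indexed by the row labels (0, 1, 2, ...), so x['mean'] fails on the first column, r1, whether or not the values are NaN or 0. The sketch below shows two ways to mean-center by row; the three-column frame and the snake_case names are made up for illustration and are not the reporter's test.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reporter's data.
data_frame = pd.DataFrame({
    "r1": [np.nan, 4.5, 1.5],
    "r2": [3.5, np.nan, np.nan],
    "r3": [np.nan, 4.0, np.nan],
})
data_frame["mean"] = data_frame.mean(axis=1)

# Option 1: row-wise apply. With axis=1 each x is a row Series indexed by
# the column names, so x["mean"] is a valid lookup.
centered_apply = data_frame.apply(lambda x: x - x["mean"], axis=1)

# Option 2: vectorized. Subtract the 'mean' column from every column,
# aligning on the row index (axis=0).
centered_sub = data_frame.sub(data_frame["mean"], axis=0)

print(centered_sub)
```

Both options give the same frame; the vectorized form avoids calling a Python function once per row.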
KeyError: (‘Year’, ‘occurred at index Year’)
I want to combine the Year, Month, and day columns into a newly defined column, Date. I used this link to achieve my goal. My data frame, named z, looks like this:
   Year  Month  day  Hour  Minute  Second   Latitude  Longirude  Exact
0  1992     12   31    23      59      59  29.456137  85.506958      0
1  2017     10    1     4      35      38  27.694225  85.291702      0
2  2017     10    1     6      13      18  28.962729  80.912323      0
3  2017     10    2     5      18      31  27.699097  85.299431      0
4  2017     10    3     4      23      20  27.700438  85.329933      0
z['Date'] = z.apply(lambda row: datetime(int(row['Year']), int(row['Month']), int(row['day']), axis=1))
Traceback (most recent call last):
  File "", line 1, in
    z['Date'] = z.apply(lambda row: datetime(int(row['Year']), int(row['Month']), int(row['day']), axis=1))
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 3972, in apply
    return self._apply_standard(f, axis, reduce=reduce)
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 4064, in _apply_standard
    results[i] = func(v)
  File "", line 1, in
    z['Date'] = z.apply(lambda row: datetime(int(row['Year']), int(row['Month']), int(row['day']), axis=1))
  File "/usr/lib/python3/dist-packages/pandas/core/series.py", line 557, in __getitem__
    result = self.index.get_value(self, key)
  File "/usr/lib/python3/dist-packages/pandas/core/index.py", line 1790, in get_value
    return self._engine.get_value(s, k)
  File "pandas/index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas/index.c:3204)
  File "pandas/index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas/index.c:2903)
  File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:3908)
KeyError: ('Year', 'occurred at index Year')
I also looked up what kind of error this is, but I didn't find any missing column or stray whitespace.
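In the line that raises the error, axis=1 sits inside the parentheses of datetime(...) rather than being passed to apply. apply therefore runs with its default axis=0 and hands the lambda whole columns, and the first column (labeled Year) has no row labeled 'Year'. A minimal sketch of the corrected call follows, using a two-row subset of the columns above; the Date2 column and the parts variable are hypothetical names added for illustration.

```python
from datetime import datetime

import pandas as pd

# Two rows from the question's frame, limited to the relevant columns.
z = pd.DataFrame({
    "Year": [1992, 2017],
    "Month": [12, 10],
    "day": [31, 1],
})

# axis=1 is an argument of apply, not of datetime(...).
z["Date"] = z.apply(
    lambda row: datetime(int(row["Year"]), int(row["Month"]), int(row["day"])),
    axis=1,
)

# Vectorized alternative: pd.to_datetime can assemble dates from
# year/month/day columns. Column names are lower-cased to be safe.
parts = z[["Year", "Month", "day"]].rename(columns=str.lower)
z["Date2"] = pd.to_datetime(parts)

print(z)
```

The to_datetime form avoids the per-row Python call and may be preferable on large frames.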
Pandas — ‘Series’ object has no attribute
I really need to use the second case (accessing the columns through the colNames list), which gives an error. Any clues on how to do this?
2 Answers
When you use df.apply() , each row of your DataFrame will be passed to your lambda function as a pandas Series. The frame’s columns will then be the index of the series and you can access values using series[label] .
df['D'] = (df.apply(lambda x: myfunc(x[colNames[0]], x[colNames[1]]), axis=1))
In general, this error occurs if you try to access an attribute that doesn't exist on an object. For a pandas Series (or DataFrame), it typically occurs because you tried to index it using attribute access (.).
In the OP's case, they used x.colNames[0] to access the value at colNames[0] in row x, but the row Series x has no attribute named colNames, so the error occurred. 1
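The difference between attribute access and bracket access can be sketched as follows. The frame, the column names A and B, and the body of myfunc are hypothetical stand-ins, since the question's own data and function are not shown.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [10, 20]})
colNames = ["A", "B"]


def myfunc(a, b):
    # Hypothetical stand-in for the asker's function.
    return a + b


# Each row arrives in apply(..., axis=1) as a Series whose index holds
# the column names of df.
first_row = df.iloc[0]
print(first_row.index.tolist())

# Attribute access looks for an attribute literally named 'colNames' on
# the row, which does not exist.
try:
    first_row.colNames[0]
except AttributeError as exc:
    print(exc)

# Bracket access evaluates colNames[0] first and then looks up that label.
df["D"] = df.apply(lambda x: myfunc(x[colNames[0]], x[colNames[1]]), axis=1)
print(df)
```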
Another case where this error may occur is when an index label contains whitespace that you didn't know about. For example, the following reproduces the error.
s = pd.Series([1, 2], index=[' a', 'b'])
s.a
In this case, make sure to remove the white space:
s.index = [x.strip() for x in s.index]
# or
s.index = [x.replace(' ', '') for x in s.index]
Finally, it’s always safe to use [] to index a Series (or a DataFrame).
1: Series have attributes such as axes, dtypes, empty, index, ndim, size, shape, T, and values. DataFrames have all of these plus columns. When you use df.apply(..., axis=1), it iterates over the rows, where each row is a Series whose index consists of the column names of df.
Pandas .shift function yields ("'float' object has no attribute 'shift'", 'occurred at index 0')
I am trying to create a new column in a pandas dataframe using a very complex if statement (I have simplified it for the sake of clarity below). I keep getting the error: ("'float' object has no attribute 'shift'", 'occurred at index 0').
I have looked around Stack Overflow and the internet and have not come up with a great answer. Some answers involve taking the .shift out of the function; however, I need to have it within a function due to the complex nature of the if statement I am writing.
I have attached an image below detailing what I ultimately want the function to do. I believe it explains it better than I could describe it with words. Any help or guidance would be greatly appreciated. Please let me know if you have any questions or if I can clarify anything!
Code example:
df = pd.read_csv(file)

def ubk(df):
    x = df['k_calc'].shift(1)
    if x < 90:
        return 1
    elif x > 90:
        return 2

df['test'] = df.apply(ubk, axis=1)
2 Answers
You may pass additional parameters to apply if you want. In this case you can pass the main df, and your ubk can process it however you need. I don't know the exact purpose of your ubk, so I have only modified it to do what you describe for the test column. Your logic does not look efficient, but you may have your own reasons for it, so that is up to you.
In [301]: df
Out[301]:
   lowest_low   k_calc    d_cal
0        9.07  75.0000      NaN
1        9.07  79.7297      NaN
2        9.07  92.5675      NaN
3        9.07  66.2116  78.3772
Define the function and call apply to create the test column with this condition: if the previous cell of k_calc is < 90, return 1; if it is > 90, return 2:
def ubk(s, m_df):
    # s is the current row; s.name is its index label in the main frame.
    x = m_df['k_calc'].shift(1)[s.name]
    if x < 90:
        return 1
    elif x > 90:
        return 2

df['test'] = df.apply(ubk, axis=1, args=(df,))

Out[304]:
   lowest_low   k_calc    d_cal  test
0        9.07  75.0000      NaN   NaN
1        9.07  79.7297      NaN   1.0
2        9.07  92.5675      NaN   1.0
3        9.07  66.2116  78.3772   2.0
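Inside apply(..., axis=1) the function receives a single row, so the k_calc value it sees is one float, which has no .shift method. For the simplified condition described above, the shift can instead be done once on the whole column. The sketch below rebuilds the four-row frame from this answer and uses np.select; the variable name prev_k is introduced for illustration, and a more complex if statement would need its own list of conditions.

```python
import numpy as np
import pandas as pd

# The four rows shown in the answer above.
df = pd.DataFrame({
    "lowest_low": [9.07, 9.07, 9.07, 9.07],
    "k_calc": [75.0, 79.7297, 92.5675, 66.2116],
    "d_cal": [np.nan, np.nan, np.nan, 78.3772],
})

# Shift the whole column once so each row sees the previous row's k_calc.
prev_k = df["k_calc"].shift(1)

# Conditions are checked in order; rows matching none of them (including
# the first row, whose previous value is NaN) get the default.
df["test"] = np.select([prev_k < 90, prev_k > 90], [1.0, 2.0], default=np.nan)

print(df)
```

This scans the column once instead of re-shifting it for every row, which is the inefficiency the answer alludes to.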