
Indexing With Pandas

I am new to pandas, so I am having trouble with indexing when writing this loop for my assignment:

quality = wine_data_all['quality']
for i in range(1,len(quality.index)):
    if quality[i] == 6 | quality[i] ==5:
        quality[i] = 1;
wine_data_all.replace['quality',quality]

My intention is to replace every value of 6 or 5 in the quality column of wine_data_all with 1, and then put the modified column back into wine_data_all in place of the original quality column. If I can do this without creating a separate quality variable, by editing wine_data_all directly, that would also work, but I ran into even more problems when I tried to index directly into the data frame.

The error I am getting is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [150], in <cell line: 2>()
      1 quality = wine_data_all['quality']
      2 for i in range(1,len(quality.index)):
----> 5     if quality[i] == 6 | quality[i] ==5:
      7         quality[i] = 0;
     11 print(quality)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\generic.py:1527, in NDFrame.__nonzero__(self)
   1525 @final
   1526 def __nonzero__(self):
-> 1527     raise ValueError(
   1528         f"The truth value of a {type(self).__name__} is ambiguous. "
   1529         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1530     )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any help is appreciated.
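One possible cause of this particular error, sketched on made-up data: if the index contains duplicate labels (for example, hypothetically, if wine_data_all was built with pd.concat from separate red and white frames without ignore_index=True), then quality[i] is a label lookup that returns a Series instead of a single number, and `if` cannot reduce a boolean Series to one True/False. The frame names red, white, wine and fixed below are illustrative only.

```python
import pandas as pd

# Hypothetical data: two frames concatenated WITHOUT ignore_index=True,
# so the labels 0 and 1 each appear twice in the combined index.
red = pd.DataFrame({"quality": [5, 6]})
white = pd.DataFrame({"quality": [6, 7]})
wine = pd.concat([red, white])  # index labels: [0, 1, 0, 1]

quality = wine["quality"]
print(quality[1])  # label lookup returns BOTH rows labelled 1, i.e. a Series

message = None
try:
    if quality[1] == 6:  # comparison yields a boolean Series; `if` needs one bool
        pass
except ValueError as err:
    message = str(err)  # keep the text; `err` is unbound after the except block

print(message)

# With ignore_index=True the labels are unique again (0..3),
# so quality[1] is a single value.
fixed = pd.concat([red, white], ignore_index=True)
print(fixed["quality"][1])
```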

No need to iterate over the values. Pandas has methods that can do this type of work for you.

Since this is a simple assignment, just select the matching values and assign 1:

wine_data_all.loc[wine_data_all['quality'].isin((5, 6)), 'quality'] = 1
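A self-contained sketch of this select-and-assign approach on made-up data (the values are hypothetical). It uses a single .loc call with a boolean mask and a column label; assigning through chained indexing such as df['col'][mask] = ... can trigger a SettingWithCopyWarning and, under pandas' Copy-on-Write behaviour, does not update the original frame.

```python
import pandas as pd

# Hypothetical stand-in for wine_data_all; only the 'quality' column matters here.
wine_data_all = pd.DataFrame({"quality": [3, 5, 6, 7, 5, 8]})

# Boolean mask: True where quality is 5 or 6.
mask = wine_data_all["quality"].isin([5, 6])

# One .loc call selects rows by mask and the column by label, then assigns 1
# directly into wine_data_all (no chained indexing, no loop).
wine_data_all.loc[mask, "quality"] = 1

print(wine_data_all["quality"].tolist())  # [3, 1, 1, 7, 1, 8]
```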

Here's an alternative that is better suited to more complicated transformations:

wine_data_all['quality'] = wine_data_all['quality'].apply(lambda x: 1 if x in (5, 6) else x)
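The same rule written as a named function instead of a lambda, which tends to be easier to read and extend once the transformation grows more branches. The function name relabel and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for wine_data_all.
wine_data_all = pd.DataFrame({"quality": [3, 5, 6, 7, 5, 8]})

def relabel(q):
    """Hypothetical rule: map 5 and 6 to 1, keep every other value."""
    if q in (5, 6):
        return 1
    return q

# Series.apply calls relabel once per value and returns a new Series,
# which is assigned back over the original column.
wine_data_all["quality"] = wine_data_all["quality"].apply(relabel)
print(wine_data_all["quality"].tolist())  # [3, 1, 1, 7, 1, 8]
```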

Since you have not provided any data, here is my test data:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, np.nan], 'B': [9, 8, 7, 6, 5]})

Let's replace the values where A == 3 or A == 4. Since you are already using pandas, it is better not to use loops at all. The following line selects every row where the condition is met, restricted to column A, and sets those values to 0:

df.loc[(df['A']==3) | (df['A']==4), 'A']=0

Gives:

    A       B
0   1.0     9
1   2.0     8
2   0.0     7
3   0.0     6
4   NaN     5
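A runnable version of this test with the imports it needs. It also checks that isin([3, 4]) selects exactly the same rows as the parenthesized OR, and shows that the NaN row is left alone, because NaN == 3 and NaN == 4 both evaluate to False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, np.nan], 'B': [9, 8, 7, 6, 5]})

# Parenthesize each comparison: in Python, `|` binds more tightly than `==`.
cond = (df['A'] == 3) | (df['A'] == 4)

# isin expresses the same "A is 3 or 4" test without combining comparisons.
assert cond.equals(df['A'].isin([3, 4]))

# Rows where cond is True, column 'A' only, are set to 0; the NaN row is not selected.
df.loc[cond, 'A'] = 0
print(df)
```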

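I think you could also eliminate your error by wrapping each condition in parentheses. In Python, | binds more tightly than ==, so quality[i] == 6 | quality[i] == 5 is parsed as the chained comparison quality[i] == (6 | quality[i]) == 5, which is not what you meant.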
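A quick check of the precedence point using plain Python integers (the function names are hypothetical): the unparenthesized condition is False even for 5 and 6, while the parenthesized and `or` versions give the intended result. Parentheses fix the precedence, but they would not by themselves avoid the ambiguity error if quality[i] were a Series rather than a single number.

```python
# `|` has higher precedence than `==`, so  q == 6 | q == 5
# is parsed as the chained comparison      q == (6 | q) == 5
def original(q):
    return q == 6 | q == 5

def parenthesized(q):
    # Each comparison is evaluated first, then combined with `|`.
    return (q == 6) | (q == 5)

def with_or(q):
    # For plain Python numbers, `or` is the usual way to combine conditions.
    return q == 6 or q == 5

for q in (4, 5, 6):
    print(q, original(q), parenthesized(q), with_or(q))
```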

To iterate over a pandas DataFrame row by row, you can use its .iterrows() method.

So you could use:

for i, element in wine_data_all.iterrows():
    # Your process here; access the quality column as:
    # element["quality"]
    pass
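A sketch of one way to fill in that loop body for this task, on hypothetical data. The pandas documentation notes that the row yielded by iterrows() may be a copy, so writing to element has no effect on the frame; this version collects the new values and assigns the column once after the loop. Note also that iterrows() starts at the first row, whereas range(1, len(...)) in the question skips position 0. For large frames, an iterrows() loop is generally much slower than the vectorized approaches.

```python
import pandas as pd

# Hypothetical stand-in for wine_data_all; the first row is a 5 on purpose.
wine_data_all = pd.DataFrame({"quality": [5, 3, 6, 7]})

new_values = []
for i, element in wine_data_all.iterrows():
    # `element` is the row as a Series; read the column by name.
    q = element["quality"]
    new_values.append(1 if q in (5, 6) else q)

# Assign once after the loop instead of writing into the frame mid-iteration.
wine_data_all["quality"] = new_values
print(wine_data_all["quality"].tolist())  # [1, 3, 1, 7]
```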

If you want to iterate over a single column, i.e. a pandas Series, you iterate over its items instead:

quality = wine_data_all["quality"]
for i, item in quality.items():
    # Your process here; `i` is the index label and
    # 'item' is already a numeric value as defined in the DataFrame
    pass
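A sketch of the same task using items(), on hypothetical data with a non-default index to make clear that items() yields index labels, not 0-based positions. It edits a copy of the column rather than the Series being iterated, then assigns the copy back; that assignment aligns on the index, so each value returns to its own row.

```python
import pandas as pd

# Hypothetical data with a non-default index.
wine_data_all = pd.DataFrame({"quality": [3, 5, 6, 7]}, index=[10, 11, 12, 13])

original = wine_data_all["quality"]
updated = original.copy()  # edit a copy, not the Series being iterated

for label, item in original.items():
    # `label` is the index label (10, 11, ...); `item` is the value under it.
    if item in (5, 6):
        updated.at[label] = 1

# Index-aligned assignment puts each value back on its own row.
wine_data_all["quality"] = updated
print(wine_data_all["quality"].tolist())  # [3, 1, 1, 7]
```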

However, since your process is element-wise and each row is independent of the others, I would suggest taking a look at .apply(). Then your code fits in a single line that is more concise, more Pythonic, and usually faster than an iterrows() loop:

wine_data_all["quality"] = wine_data_all["quality"].apply(lambda x: 0 if (x == 5 or x == 6) else x)
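Since the question already reached for replace: it is a method, so it is called with parentheses rather than square brackets, and Series.replace accepts a dict mapping old values to new ones. A sketch on hypothetical data, mapping 5 and 6 to 1 as the question describes; values not in the dict are left unchanged.

```python
import pandas as pd

# Hypothetical stand-in for wine_data_all.
wine_data_all = pd.DataFrame({"quality": [3, 5, 6, 7, 5, 8]})

# {old_value: new_value}; replace returns a new Series, assigned back to the column.
wine_data_all["quality"] = wine_data_all["quality"].replace({5: 1, 6: 1})
print(wine_data_all["quality"].tolist())  # [3, 1, 1, 7, 1, 8]
```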

P.S. For further reading on indexing in pandas, take a look at the .loc and .iloc indexers.
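A small sketch of the difference, using a hypothetical index whose labels do not match the row positions: .loc looks rows up by label and .iloc by integer position. In a loop over range(...) positions, quality.iloc[i] is the positional lookup, while plain quality[i] on an integer index is a label lookup.

```python
import pandas as pd

# Hypothetical frame whose index labels (2, 0, 1) differ from positions (0, 1, 2).
df = pd.DataFrame({"quality": [5, 6, 7]}, index=[2, 0, 1])
quality = df["quality"]

by_label = quality.loc[0]      # the row whose LABEL is 0 -> 6
by_position = quality.iloc[0]  # the FIRST row, whatever its label -> 5

# .loc also accepts a boolean mask plus a column label,
# which is how select-and-assign works without a loop.
df.loc[df["quality"] == 7, "quality"] = 1

print(by_label, by_position, df["quality"].tolist())
```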

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.