
What is the best way to iterate through a data frame in Python?

I am trying to build a data frame based on another one. To build the second one, I need to loop over the first data frame, make some changes to the data, and insert the result into the second one. I am using a namedtuple in my for loop.

This loop takes a long time to process 2 million rows of data. Is there a faster way to do this?

A pandas DataFrame is organized by column, so iterating over rows is not its natural access pattern. Still, this is how I process each row of a pandas DataFrame:

rows = zip(*(table.loc[:, each] for each in table))
for rowNum, record in enumerate(rows):
    # To process each record, put your processing code here;
    # otherwise this simply prints each row.
    print("Row", rowNum, "records: ", record)

That said, I would still suggest looking for built-in pandas methods that can process your first DataFrame; they are usually faster and more efficient than a loop you write yourself. Hope this helps.
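As a rough sketch of what such pandas methods can look like: if the per-row change can be written as arithmetic or a comparison on whole columns, the second DataFrame can be built without any Python-level loop. The column names (`price`, `qty`, `total`, `is_bulk`) are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical first DataFrame; the column names are assumptions for this sketch.
first = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Build the second DataFrame from whole-column expressions instead of a row loop.
second = pd.DataFrame({
    "total": first["price"] * first["qty"],  # element-wise multiplication
    "is_bulk": first["qty"] >= 2,            # element-wise comparison
})
print(second)
```

Whether this applies depends on what the "changes to the data" are; logic that truly needs one row at a time may still require a loop.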

I'd recommend using the iterrows function that is built into pandas.

import pandas as pd

data = {'Name': ['John', 'Paul', 'George'], 'Age': [20, 21, 19]}
db = pd.DataFrame(data)
print(f"Dataframe:\n{db}\n")
# iterrows() yields (index, Series) pairs, one pair per row
for row, col in db.iterrows():
    print(f"Row Index:{row}")
    print(f"Column:\n{col}\n")

The output of the above:

Dataframe:
     Name  Age
0    John   20
1    Paul   21
2  George   19

Row Index:0
Column:
Name    John
Age       20
Name: 0, dtype: object

Row Index:1
Column:
Name    Paul
Age       21
Name: 1, dtype: object

Row Index:2
Column:
Name    George
Age         19
Name: 2, dtype: object
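Note the `dtype: object` in the output: iterrows builds a Series for every row, which is costly on millions of rows. The pandas documentation describes itertuples as generally faster and as preserving dtypes. Below is a minimal sketch of the same loop with itertuples, collecting the changed rows in a list and building the second DataFrame once at the end. The `AgeNextYear` field and the upper-casing are invented changes for illustration only.

```python
import pandas as pd

db = pd.DataFrame({'Name': ['John', 'Paul', 'George'], 'Age': [20, 21, 19]})

# Collect the transformed rows in a plain list, then build the second frame once.
records = []
for row in db.itertuples(index=False):
    # Each row is a namedtuple; fields are accessed by column name.
    # 'AgeNextYear' is a made-up derived field for this sketch.
    records.append({'Name': row.Name.upper(), 'AgeNextYear': row.Age + 1})

result = pd.DataFrame(records)
print(result)
```

Appending to a list and constructing the DataFrame once avoids growing a DataFrame row by row inside the loop, which tends to be slow.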


 