I am trying to build a data frame based on another one. To build the second one, I need to loop over the first data frame, make some changes to the data, and insert the results into the second one. I am using a namedtuple in my for loop.
This loop takes a long time to process 2M rows of data. Is there any faster way to do this?
Since pandas DataFrames are stored column-wise, iterating over rows is not what they are optimized for. That said, this is one way to process each row of a pandas DataFrame:
rows = zip(*(table.loc[:, each] for each in table))
for rowNum, record in enumerate(rows):
    # Process each record here if needed;
    # otherwise just print each row:
    print("Row", rowNum, "records:", record)
Btw, I still suggest looking for built-in pandas methods that can process your first DataFrame; they are usually quicker and more efficient than a hand-written loop. Hope this helps.
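To illustrate the point about built-in methods, here is a minimal sketch (with a made-up DataFrame, since the question does not show its columns) of building a second DataFrame from a first one with vectorized column operations instead of a Python-level loop; the arithmetic runs in C inside pandas/NumPy, which is typically orders of magnitude faster over 2M rows:

```python
import pandas as pd

# Hypothetical source DataFrame standing in for the question's first one.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Build the derived DataFrame in one shot: each expression operates on
# whole columns at once, with no per-row Python iteration.
result = pd.DataFrame({
    "total": df["price"] * df["qty"],
    "discounted": df["price"] * df["qty"] * 0.9,
})
print(result)
```

The column names and the discount transformation here are assumptions for illustration; the pattern (whole-column expressions instead of a row loop) is what carries over to the real data.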
I'd recommend using the iterrows method that is built into pandas.
import pandas as pd

data = {'Name': ['John', 'Paul', 'George'], 'Age': [20, 21, 19]}
db = pd.DataFrame(data)
print(f"Dataframe:\n{db}\n")

# iterrows yields (index, row-as-Series) pairs
for row, col in db.iterrows():
    print(f"Row Index:{row}")
    print(f"Column:\n{col}\n")
The output of the above:
Dataframe:
Name Age
0 John 20
1 Paul 21
2 George 19
Row Index:0
Column:
Name John
Age 20
Name: 0, dtype: object
Row Index:1
Column:
Name Paul
Age 21
Name: 1, dtype: object
Row Index:2
Column:
Name George
Age 19
Name: 2, dtype: object
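Since the question already uses namedtuples, it's worth noting that itertuples is usually much faster than iterrows, because iterrows constructs a full Series object for every row while itertuples yields lightweight namedtuples. A minimal sketch on the same example data:

```python
import pandas as pd

db = pd.DataFrame({"Name": ["John", "Paul", "George"], "Age": [20, 21, 19]})

# Each record is a namedtuple; fields are accessed by column name,
# and the row label is available as record.Index.
rows = []
for record in db.itertuples(index=True):
    rows.append((record.Index, record.Name, record.Age))
print(rows)
```

This keeps the row-by-row structure of the original loop while cutting most of the per-row overhead; vectorized column operations, where applicable, are faster still.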