I have seen a few questions like these:
Vectorized alternative to iterrows, Faster alternative to iterrows, Pandas: Alternative to iterrow loops, for loop using iterrows in pandas, python: using .iterrows() to create columns, Iterrows performance. But each of them seems to address a specific case rather than a general approach.
My question is also about .iterrows().
I am trying to pass the first and second column values of each row to a function and build a list from the results.
What I have:
I have a pandas DataFrame with two columns that looks like this:
   I.D  Score
1   11     26
3   12     26
5   13     26
6   14     25
What I did:
my_points = [Points(int(row[0]),row[1]) for index, row in score.iterrows()]
where Points is a function I defined earlier.
What I am trying to do:
A faster, vectorized form of the above.
The question is not really about how to iterate through a DataFrame and return a list, but about how to apply a function to the column values of a DataFrame, row by row.
You can use pandas.DataFrame.apply with axis set to 1, which calls the function once per row and passes that row as a Series:
df.apply(func, axis=1)
To get a list, it depends on what your function returns, but if the function accepts a whole row as its single argument you could do:
df.apply(Points, axis=1).tolist()
If you want to apply it to only some columns:
df[['Score', 'I.D']].apply(Points, axis=1)
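As a minimal sketch of the apply approach on the asker's data: the DataFrame below is rebuilt from the table in the question, Points is a stand-in that just returns a tuple (the real function is not shown), and points_from_row is a hypothetical helper that unpacks the row before calling it.

```python
import pandas as pd

# Data rebuilt from the table in the question.
df = pd.DataFrame(
    {"I.D": [11, 12, 13, 14], "Score": [26, 26, 26, 25]},
    index=[1, 3, 5, 6],
)


def Points(a, b):
    # Stand-in for the asker's function (assumption: it takes two values).
    return (a, b)


def points_from_row(row):
    # With axis=1, apply passes each row as a Series, so unpack it by label.
    return Points(int(row["I.D"]), int(row["Score"]))


# apply returns a Series with one result per row; tolist() turns it into a list.
my_points = df[["I.D", "Score"]].apply(points_from_row, axis=1).tolist()
print(my_points)
```

The wrapper is needed because a two-argument function cannot be handed to apply directly: apply supplies a single row object per call.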
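If your function takes multiple arguments, you can use numpy.vectorize, which feeds it the column values element by element. Note that the NumPy documentation describes numpy.vectorize as a convenience wrapper that is essentially a for loop, so it is not truly vectorized, although it typically avoids the per-row overhead of apply:
np.vectorize(Points)(df['I.D'], df['Score'])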
Or use a lambda:
df.apply(lambda x: Points(x['I.D'], x['Score']), axis=1).tolist()
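The numpy.vectorize route can be sketched as follows. The Point dataclass and make_point function are hypothetical stand-ins for whatever Points really builds, and otypes=[object] is passed so that each result is stored as a single Python object in the output array.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass(frozen=True)
class Point:
    # Hypothetical result type; the asker's real Points is not shown.
    id: int
    score: int


def make_point(id_value, score_value):
    # Two-argument function, called once per pair of elements.
    return Point(int(id_value), int(score_value))


df = pd.DataFrame(
    {"I.D": [11, 12, 13, 14], "Score": [26, 26, 26, 25]},
    index=[1, 3, 5, 6],
)

# np.vectorize still loops in Python; it only handles the element pairing.
vectorized = np.vectorize(make_point, otypes=[object])
points_array = vectorized(df["I.D"].to_numpy(), df["Score"].to_numpy())

my_points = points_array.tolist()
print(my_points)
```

If the per-element work can be expressed with NumPy or pandas column operations instead of a Python function, that would usually be the faster option.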
Try a list comprehension over the zipped columns:
score = pd.concat([score] * 1000, ignore_index=True)
def Points(a,b):
    return (a,b)
In [147]: %timeit [Points(int(a),b) for a, b in zip(score['I.D'],score['Score'])]
1.3 ms ± 132 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [148]: %timeit [Points(int(row[0]),row[1]) for index, row in score.iterrows()]
259 ms ± 5.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
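In [149]: %timeit [Points(int(row[0]),row[1]) for row in score.itertuples()]
3.64 ms ± 80.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Note that itertuples() yields the index as the first field by default, so in In [149] row[0] is the index and row[1] is I.D; pass index=False to get I.D and Score instead.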
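To repeat this comparison outside IPython, a rough sketch with the standard timeit module could look like the one below. The three helper functions are hypothetical names, the data is rebuilt from the question, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

score = pd.DataFrame(
    {"I.D": [11, 12, 13, 14], "Score": [26, 26, 26, 25]},
    index=[1, 3, 5, 6],
)
score = pd.concat([score] * 1000, ignore_index=True)


def Points(a, b):
    return (a, b)


def with_zip():
    return [Points(int(a), b) for a, b in zip(score["I.D"], score["Score"])]


def with_iterrows():
    # Label-based access avoids relying on positional row[0] lookups.
    return [Points(int(row["I.D"]), row["Score"]) for _, row in score.iterrows()]


def with_itertuples():
    # index=False and name=None yield plain (I.D, Score) tuples.
    return [
        Points(int(row[0]), row[1])
        for row in score.itertuples(index=False, name=None)
    ]


# Check that all three approaches build the same list before timing them.
same_result = with_zip() == with_iterrows() == with_itertuples()

for fn in (with_zip, with_iterrows, with_itertuples):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds * 1000:.2f} ms per call")
```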
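Have you tried the .itertuples() method?
my_points = [Points(int(row[0]),row[1]) for row in score.itertuples(index=False)]
It is a faster way to iterate over a pandas DataFrame than .iterrows(). Passing index=False keeps the index out of each tuple, so row[0] and row[1] are the I.D and Score values.
I hope it helps.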
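A small sketch of what itertuples() yields may help here, since the index is included as the first field unless index=False is passed. The data is rebuilt from the question and Points is a stand-in that returns a tuple.

```python
import pandas as pd

score = pd.DataFrame(
    {"I.D": [11, 12, 13, 14], "Score": [26, 26, 26, 25]},
    index=[1, 3, 5, 6],
)


def Points(a, b):
    # Stand-in for the asker's function.
    return (a, b)


# Default: position 0 is the index label, position 1 is I.D.
first_default = next(score.itertuples())

# index=False: position 0 is I.D, position 1 is Score.
first_no_index = next(score.itertuples(index=False))

my_points = [Points(int(row[0]), row[1]) for row in score.itertuples(index=False)]
print(my_points)
```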