I need some help in converting the following code to a more efficient one without using iterrows().
for index, row in df.iterrows():
    alist = row['index_vec'].strip("[] ").split(",")
    blist = [int(i) for i in alist]
    for col in blist:
        df.loc[index, str(col)] = df.loc[index, str(col)] + 1
The above code reads the string in the 'index_vec' column, parses it into integers, and then increments the column named after each integer by one. An example of the output is shown below:
Take the 0th row as an example. Its string value is "[370, 370, -1]". So the above code increments column "370" by 2 and column "-1" by 1. The output display is truncated so that only "-10" to "17" columns are shown.
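To make the parsing step concrete, here is a minimal sketch of what happens to the row 0 string from the example. The helper name parse_index_vec is only illustrative and is not part of the original code.

```python
from collections import Counter

def parse_index_vec(text):
    """Turn a string such as "[370, 370, -1]" into a list of ints."""
    # int() tolerates the leading space left over after splitting on ","
    return [int(part) for part in text.strip("[] ").split(",")]

values = parse_index_vec("[370, 370, -1]")
increments = Counter(values)  # how much each column should be incremented

print(values)            # [370, 370, -1]
print(dict(increments))  # {370: 2, -1: 1}
```

Each key of the counter corresponds to a column name (as a string), and each count is the amount added to that column for the row.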
Using iterrows() is very slow on a large dataframe. I'd like some help speeding it up. Thank you.
Let us do
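# one row per parsed value, with stray whitespace removed
a = df['index_vec'].str.strip("[] ").str.split(",").explode().str.strip()
cols = df.columns.drop('index_vec')
# per-row counts of each value, aligned to the counter columns
s = pd.crosstab(a.index, a).reindex(index=df.index, columns=cols, fill_value=0)
df[cols] = df[cols].add(s)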
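For reference, here is a self-contained sketch of the explode and crosstab idea on a small made-up frame. The sample data and the name counter_cols are assumptions for illustration. The sketch strips whitespace from each token so the labels match the column names, and it adds the crosstab result only to the counter columns so that the string column is left untouched.

```python
import pandas as pd

# Sample data (assumed for illustration); counter columns are named by the
# string form of each integer, as in the question.
df = pd.DataFrame({
    'index_vec': ["[370, -1, -1]", "[1201, 1201]"],
    '1201': [0, 0],
    '370': [0, 1],
    '-1': [1, 1],
})

# One row per parsed value; the original row label is repeated in the index.
a = df['index_vec'].str.strip("[] ").str.split(",").explode().str.strip()

# Count how often each value occurs per row, aligned to the counter columns.
counter_cols = df.columns.drop('index_vec')
s = pd.crosstab(a.index, a).reindex(
    index=df.index, columns=counter_cols, fill_value=0
)

# Add the counts to the existing counters in one vectorized step.
df[counter_cols] = df[counter_cols] + s
print(df)
```

Note that the reindex step silently drops any parsed value that has no matching column, whereas the original loop would raise a KeyError in that case.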
You can also use apply with axis=1 to go row-wise, passing in a custom function:
Example starting df:
       index_vec  1201  370  -1
0  [370, -1, -1]     0    0   1
1   [1201, 1201]     0    1   1
import pandas as pd

df = pd.DataFrame({'index_vec': ["[370, -1, -1]", "[1201, 1201]"], '1201': [0, 0], '370': [0, 1], '-1': [1, 1]})

def add_counts(x):
    # count how many times each column label appears in this row's string
    counts = pd.Series(x['index_vec'].strip("[]").split(", ")).value_counts()
    x[counts.index] = x[counts.index] + counts
    return x

# apply returns a new DataFrame, so assign the result back
df = df.apply(add_counts, axis=1)
print(df)
Outputs:
       index_vec  1201  370  -1
0  [370, -1, -1]     0    1   3
1   [1201, 1201]     2    1   1
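One way to gain confidence in a replacement is to compare it against the original iterrows() loop on a small frame. The following is only a sketch of such a check, using the same sample data; the names expected, result, and counter_cols are introduced here for illustration.

```python
import pandas as pd

def add_counts(x):
    # Row-wise function from the apply-based answer.
    counts = pd.Series(x['index_vec'].strip("[]").split(", ")).value_counts()
    x[counts.index] = x[counts.index] + counts
    return x

df = pd.DataFrame({
    'index_vec': ["[370, -1, -1]", "[1201, 1201]"],
    '1201': [0, 0],
    '370': [0, 1],
    '-1': [1, 1],
})

# Reference result: the original loop from the question, run on a copy.
expected = df.copy()
for index, row in expected.iterrows():
    for col in [int(i) for i in row['index_vec'].strip("[] ").split(",")]:
        expected.loc[index, str(col)] = expected.loc[index, str(col)] + 1

# Candidate result: the apply-based version.
result = df.apply(add_counts, axis=1)

# Row-wise apply on a mixed frame yields object columns, so cast before comparing.
counter_cols = ['1201', '370', '-1']
same = bool((result[counter_cols].astype(int) == expected[counter_cols]).all().all())
print(same)
```

The same comparison can be repeated on a random sample of the real data before switching over.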