I have two excel files which I read by pandas. I am comparing the index in file 1 with the index in file 2 (not the same length (ex: 10,100) and if they match, the row[index] in the second file will be zeros and else will not change. I am using for and if loops for this, but the more data I want to process(1e3,5e3), the run time becomes longer. So, is there a better way to perform such a comparison?. Here's an example of what I am using.
df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
index=[4, 5, 6], columns=['A', 'B', 'C'])
df1 = pd.DataFrame([['w'], ['y' ], ['z']],
index=[4, 5, 1])
for j in df1.index:
for i in df.index:
if i == j:
df.loc[i, :] = 0
else:
df.loc[i, :] = df.loc[i, :]
print(df)
Here loops are not necessary, you can set values to 0
per rows by DataFrame.mask
with Series.isin
(necessary convert index
to Series
for avoid ValueError: Array conditional must be same shape as self
):
df = df.mask(df.index.to_series().isin(df1.index), 0)
Or with Index.isin
and numpy.where
if want improve performance:
arr = np.where(df.index.isin(df1.index)[:, None], 0, df)
df = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(df)
A B C
4 0 0 0
5 0 0 0
6 10 20 30
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.