
How to do an operation on a subpart of a pandas dataframe?

I have a short table such as:

In [36]: df = pd.DataFrame({k: np.random.random(4) for k in "ms"}, index=["A", "AH", "B", "BH"])  # np is numpy; the pd.np alias was removed in pandas 2.0

In [37]: df
Out[37]: 
           m         s
A   0.266581  0.386802
AH  0.626642  0.657029
B   0.643837  0.629465
BH  0.297297  0.766521

In column m, and only m, I want to subtract row A from the first two rows and row B from the last two. Something like: A - A, AH - A, B - B, BH - B

For example, I can do:

In [38]: df.loc[["A", "AH"]]["m"] - df.loc["A"]["m"]
Out[38]: 
A     0.000000
AH    0.360061
Name: m, dtype: float64

But when I try to write the result back into the table, it does not work:

In [39]: df2 = df.copy()

In [44]: df2.loc[["A", "AH"]]["m"] = df.loc[["A", "AH"]]["m"] - df.loc["A"]["m"]

In [45]: df2
Out[45]: 
           m         s
A   0.266581  0.386802
AH  0.626642  0.657029
B   0.643837  0.629465
BH  0.297297  0.766521

I do not understand why nothing changed.

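I think you need to drop the back-to-back brackets and pass both the row labels and the column label to a single DataFrame.loc call, instead of chaining a second [] lookup onto its result (the numbers below differ from the question's because the data is random):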

print (df.loc[["A", "AH"], "m"] - df.loc["A", "m"])
A     0.000000
AH   -0.696391
Name: m, dtype: float64

df.loc[["A", "AH"], "m"] = df.loc[["A", "AH"], "m"] - df.loc["A", "m"]
df.loc[["B", "BH"], "m"] = df.loc[["B", "BH"], "m"] - df.loc["B", "m"]
print (df)
           m         s
A   0.000000  0.992226
AH -0.696391  0.465135
B   0.000000  0.611135
BH  0.448778  0.569463
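
The two assignments above follow the same pattern, so they could be written as a loop. This is only a minimal sketch: the groups mapping is a hypothetical helper that is not part of the original answer, and the values are the ones printed in the question rather than fresh random data.

```python
import pandas as pd

# Values copied from the question's Out[37] so the result is reproducible.
df = pd.DataFrame(
    {"m": [0.266581, 0.626642, 0.643837, 0.297297],
     "s": [0.386802, 0.657029, 0.629465, 0.766521]},
    index=["A", "AH", "B", "BH"],
)
original_s = df["s"].copy()

# Hypothetical mapping: reference row label -> rows shifted by that reference.
groups = {"A": ["A", "AH"], "B": ["B", "BH"]}

for ref, rows in groups.items():
    ref_value = df.loc[ref, "m"]  # read the scalar before overwriting the rows
    df.loc[rows, "m"] = df.loc[rows, "m"] - ref_value  # single .loc call, no chaining

print(df)
```

Column s is left untouched because every assignment names the column explicitly in the same .loc call.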

Why your code does not work:

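The reason is called chained indexing: df2.loc[["A", "AH"]] returns a new object (a copy), so the following ["m"] = ... assignment modifies that temporary copy and df2 itself is left unchanged.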
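
The difference can be checked side by side. The following is a small sketch, assuming the values printed in the question; the names chained and single are only illustrative. The warning filter is there because pandas normally emits a warning for the chained form (which warning depends on the pandas version).

```python
import warnings

import pandas as pd

# Values copied from the question's Out[37].
df = pd.DataFrame(
    {"m": [0.266581, 0.626642, 0.643837, 0.297297],
     "s": [0.386802, 0.657029, 0.629465, 0.766521]},
    index=["A", "AH", "B", "BH"],
)

# Chained indexing: .loc[[...]] builds a copy, then ["m"] = ... writes to that copy.
chained = df.copy()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the chained-assignment warning
    chained.loc[["A", "AH"]]["m"] = 0.0

# Single .loc call: rows and column are selected together, so the write reaches the frame.
single = df.copy()
single.loc[["A", "AH"], "m"] = single.loc[["A", "AH"], "m"] - single.loc["A", "m"]

print(chained.equals(df))  # the chained version left the data unchanged
print(single)
```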

If you want a simple rule of thumb, here is one from the introduction to the Modern Pandas tutorial by Tom Augspurger:

The rough rule is any time you see back-to-back square brackets, ][ , you're in asking for trouble. Replace that with a .loc[..., ...] and you'll be set.
