I have a pandas DataFrame with a column Distanz and a column Zielcode. I need to divide each Distanz value by the number of consecutive rows in which it repeats. So the first non-zero interval would be divided by one, the second by three, and the third by two.
Distanz Zielcode
0.0 0
0.0 0
1.1 2
0.0 0
8.0 7
8.0 7
8.0 7
0.0 0
3.4 1
3.4 1
0.0 0
How can I count the length of each interval of repeated values across the whole series and then divide the Distanz value by that count?
The desired output should look like this:
Distanz Zielcode Distanz - Output
0.0 0 0.0
0.0 0 0.0
1.1 2 1.1
0.0 0 0.0
8.0 7 2.7
8.0 7 2.7
8.0 7 2.7
0.0 0 0.0
3.4 1 1.7
3.4 1 1.7
0.0 0 0.0
I would split the problem into three steps.
Identify repeating elements:
block = ((df['Distanz'].shift() != df['Distanz']) | (df['Zielcode'].shift() != df['Zielcode'])).cumsum()
This gives:
0     1
1     1
2     2
3     3
4     4
5     4
6     4
7     5
8     6
9     6
10    7
dtype: int32
Compute the size of each block:
count = df.groupby(block).apply(lambda x: x.assign(count=len(x)))['count'].reset_index(level=0, drop=True)
This gives:
0     2
1     2
2     1
3     1
4     3
5     3
6     3
7     1
8     2
9     2
10    1
Name: count, dtype: int64
Compute the new column:
df['Distanz - Output'] = df['Distanz'] / count
And the dataframe becomes:
Distanz Zielcode Distanz - Output
0 0.0 0 0.000000
1 0.0 0 0.000000
2 1.1 2 1.100000
3 0.0 0 0.000000
4 8.0 7 2.666667
5 8.0 7 2.666667
6 8.0 7 2.666667
7 0.0 0 0.000000
8 3.4 1 1.700000
9 3.4 1 1.700000
10 0.0 0 0.000000
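For reference, here is a self-contained sketch of the three steps, assuming the sample data from the question. It is not the exact code from the answer: the block size is computed with transform("size") instead of groupby/apply, which is a shorter way to get the same per-row count. The intermediate name changed is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Distanz":  [0.0, 0.0, 1.1, 0.0, 8.0, 8.0, 8.0, 0.0, 3.4, 3.4, 0.0],
    "Zielcode": [0, 0, 2, 0, 7, 7, 7, 0, 1, 1, 0],
})

# Step 1: a new block starts wherever either column differs from the previous row.
changed = (df["Distanz"] != df["Distanz"].shift()) | (df["Zielcode"] != df["Zielcode"].shift())
block = changed.cumsum()

# Step 2: transform("size") broadcasts each block's row count back onto its rows
# (swapped in here for the groupby/apply step; same result, aligned to df's index).
count = df.groupby(block)["Distanz"].transform("size")

# Step 3: divide each value by the length of the block it belongs to.
df["Distanz - Output"] = df["Distanz"] / count
print(df)
```

Because transform returns a Series aligned to the original index, no reset_index call is needed before the division. Blocks of zeros are divided too, but 0.0 divided by any count stays 0.0, so the output matches the desired table.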