I have a DataFrame df1 as follows:
+------+----------+-----+
| Date | Location | Key |
+------+----------+-----+
| | a | 1 |
| | a | 2 |
| | b | 3 |
| | b | 3 |
| | b | 3 |
| | c | 4 |
| | c | 4 |
| | b | 5 |
| | b | 6 |
| | d | 7 |
| | b | 8 |
| | b | 8 |
| | b | 8 |
| | b | 9 |
+------+----------+-----+
and df2 below, which is sliced from it:
+------+----------+-----+
| Date | Location | Key |
+------+----------+-----+
| | b | 3 |
| | b | 3 |
| | b | 3 |
| | b | 5 |
| | b | 6 |
| | b | 8 |
| | b | 8 |
| | b | 9 |
| | b | 9 |
+------+----------+-----+
The goal is to find the time difference between the Key changes in df2 (e.g. from the last 3 to 5, 5 to 6, 6 to the first 8, the last 8 to the first 9, and so on), add them up, repeat this for every Location, and average them.
Can this process be vectorized, or do we need to slice the DataFrame for every Location and compute the average manually?
[EDIT]:
Traceback (most recent call last):
  File "<ipython-input-1142-b85a122735aa>", line 1, in <module>
    s = temp.groupby('SSCM_ Location').apply(lambda x: x[x['Key'].diff().ne(0)]['Execution Date'].diff().mean())
  File "C:\Users\dbhadra\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 930, in apply
    return self._python_apply_general(f)
  File "C:\Users\dbhadra\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 936, in _python_apply_general
    self.axis)
  File "C:\Users\dbhadra\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 2273, in apply
    res = f(group)
  File "<ipython-input-1142-b85a122735aa>", line 1, in <lambda>
    s = temp.groupby('SSCM_ Location').apply(lambda x: x[x['Key'].diff().ne(0)]['Execution Date'].diff().mean())
  File "C:\Users\dbhadra\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py", line 1995, in diff
    result = algorithms.diff(com._values_from_object(self), periods)
  File "C:\Users\dbhadra\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\algorithms.py", line 1823, in diff
    out_arr[res_indexer] = arr[res_indexer] - arr[lag_indexer]
TypeError: unsupported operand type(s) for -: 'str' and 'str'
You can try:
g = df.groupby(['Location', 'Key'])
# subtract the last date of the previous Key from the first date of the
# current Key within each Location, then average per Location
(g.first() - g.last().groupby('Location').shift()).mean(level=0)
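To see what the snippet above computes, here is a minimal self-contained sketch on made-up dates (one Location only); `.groupby(level=0).mean()` is used in place of the `.mean(level=0)` shorthand, which newer pandas versions have removed:

```python
import pandas as pd

# made-up data: Location 'b' with Key runs 3, 5, 8
df = pd.DataFrame({
    'Location': ['b', 'b', 'b', 'b', 'b'],
    'Key': [3, 3, 5, 8, 8],
    'Date': pd.to_datetime(['2019-01-01', '2019-01-02', '2019-01-05',
                            '2019-01-08', '2019-01-09']),
})

g = df.groupby(['Location', 'Key'])
# first date of each Key minus last date of the previous Key,
# shifted within each Location
gaps = g.first() - g.last().groupby('Location').shift()
# average the gaps per Location
out = gaps['Date'].groupby(level=0).mean()
print(out)
```

Here the gap from the last 3 (Jan 2) to the first 5 (Jan 5) is 3 days, and from 5 (Jan 5) to the first 8 (Jan 8) is also 3 days, so the Location-`b` average is 3 days.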
s = df.groupby('Location').apply(lambda x: x[x['Key'].diff().ne(0)]['Date'].diff().mean())
Is this what you mean? It averages the time delta of 'Date' at the rows where the Key value changes, per Location. If you instead want the average change of 'Key', just change 'Date' to 'Key'.
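Note that this assumes 'Date' is already a datetime column; the traceback in the question is exactly what happens when it holds strings. A minimal sketch of the fix, on made-up data, is to convert with `pd.to_datetime` before diffing:

```python
import pandas as pd

# made-up data: dates stored as strings, as in the question
df = pd.DataFrame({
    'Location': ['b', 'b', 'b', 'b'],
    'Key': [3, 5, 6, 8],
    'Date': ['2019-01-01', '2019-01-03', '2019-01-04', '2019-01-07'],
})
# convert to real Timestamps so .diff() can subtract them
df['Date'] = pd.to_datetime(df['Date'])

s = df.groupby('Location').apply(
    lambda x: x[x['Key'].diff().ne(0)]['Date'].diff().mean()
)
print(s)
```

With strings, the subtraction inside `diff()` raises the same `TypeError: unsupported operand type(s) for -: 'str' and 'str'`; after conversion it returns proper Timedeltas.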
You can try:
import numpy as np

# obviously we will group by Location
groups = df1.groupby('Location')
# record the changes and mark the unchanged rows with NaN
df1['changes'] = groups.Key.diff().replace({0: np.nan})
# average the changes by Location,
# ignoring all the NaN's (unchanged rows)
groups.changes.mean()
Output:
Location
a 1.0
b 1.5
c NaN
d NaN
Name: changes, dtype: float64