
Using more than one row or column value in a pandas DataFrame for a calculation

One of the reasons I am so comfortable with Excel is the ease with which I can pass a range of values, or an array, and use one or more of those values in a calculation.

For example, say I had the array [1, 2, 1, 5, 7, 10, 6, 20, 12, 7, 4].

I might want to calculate:

  1. The number of continuous up or down sequences; for example, 1 to 2 would be 1 up move because 2 is higher than 1.

  2. In the case of 5, 7, 10, I would like to count this up sequence as 2 moves: (5 to 7) and (7 to 10).

  3. I would also like to capture the range of values that actually contributed to the sequence, and where they occurred. For example, even though (6 to 20) is only 1 continuous up move, it is significant because the move is 14 (20 - 6). To record when the sequence occurred, I would like to keep the array index position, or a date that may be attached to the number in another column.
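
A minimal pandas sketch of the three calculations above. It assumes a "run" means a stretch of consecutive moves in the same direction and that positions are the array's integer index. The names `moves`, `runs`, `n_moves`, `total_move`, `start_pos` and `end_pos` are illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 1, 5, 7, 10, 6, 20, 12, 7, 4])

# One row per move: the change from the previous element (element 0 has no move)
moves = pd.DataFrame({"change": s.diff()}).iloc[1:].copy()
moves["direction"] = np.sign(moves["change"]).astype(int)  # 1 up, -1 down, 0 flat

# A new run starts whenever a move's direction differs from the previous move's
moves["run"] = (moves["direction"] != moves["direction"].shift()).cumsum()

runs = moves.reset_index().groupby("run").agg(
    direction=("direction", "first"),
    n_moves=("change", "count"),    # number of consecutive moves in the run
    total_move=("change", "sum"),   # net size of the run, e.g. 6 -> 20 is +14
    start_pos=("index", "first"),   # position of the run's first move...
    end_pos=("index", "last"),      # ...and of its last move
)
# A run begins at the element *before* its first move
runs["start_pos"] -= 1

n_up_runs = int((runs["direction"] == 1).sum())
n_down_runs = int((runs["direction"] == -1).sum())
print(runs)
```

With this array, the up run that contains 5, 7, 10 actually starts at 1 (positions 2 to 5), so it is reported as three moves rather than two. The 6 to 20 move shows up as a one-move run with a total of 14.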

Using Excel, I would pass this array to a function and say: if element(0) > element(1), move to element 2, then 3, and so on; then, when I record that the value drops in the negative direction, I sum those values.

Or, if I were doing it in terms of cells, I could say: if the active cell's value is negative, sum the two cells before it using .offset.

However, I am not sure how to get two or more values from a row or column in pandas the way I would with offset.
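
In pandas, `Series.shift` plays roughly the role of an Excel offset. `s.shift(1)` lines each row up with the value one row above it, and `s.shift(2)` with the value two rows above. The sketch below assumes that "the active cell value is negative" means the change from the previous row is negative. `prev1`, `prev2` and `sum_two_before` are illustrative names.

```python
import pandas as pd

s = pd.Series([1, 2, 1, 5, 7, 10, 6, 20, 12, 7, 4])

# shift(n) moves values down n rows, so each row can "see" the value n rows above it,
# much like an Excel offset of n rows upward
prev1 = s.shift(1)
prev2 = s.shift(2)

change = s - prev1  # identical to s.diff()

# Where the change is negative, sum the two values before the current row;
# every other row becomes NaN
sum_two_before = (prev1 + prev2).where(change < 0)
print(pd.DataFrame({"value": s, "change": change, "sum_two_before": sum_two_before}))
```

A negative period, such as `s.shift(-1)`, looks downward instead, like a positive row offset.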

Many thanks, Josh.

Update: Thanks, guys, for your answers. I will add more detail.

  1. Basically, each product has a set of values. I have multiple products: the product name is in one column, and associated with each product are multiple numbers that refer to its price. At the moment my table has a layout like the one below, but the same product can appear on multiple lines because it is sold on different dates:


| product   | price |
|-----------|-------|
| Product A |  1    |
| Product B |  2    |
| Product C |  1    |
| Product D |  5    |
| Product E |  7    |
| Product F |  10   |
| Product G |  6    |
| Product H |  20   |
| Product I |  12   |
| Product H |  7    |
| Product I |  4    |

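A sketch of loading the table above into a DataFrame and computing price changes in two ways. The first is the change from the previous row, which is the row-to-row change used in the next point. The second is the change from the previous sale of the same product. The sketch assumes the rows are already in date order; with a real date column, you would sort by it first. The column names `change` and `change_same_product` are hypothetical.

```python
import pandas as pd

# The example table from the question (no date column, so row order stands in for time)
df = pd.DataFrame({
    "product": ["Product A", "Product B", "Product C", "Product D", "Product E",
                "Product F", "Product G", "Product H", "Product I", "Product H",
                "Product I"],
    "price": [1, 2, 1, 5, 7, 10, 6, 20, 12, 7, 4],
})

# Change from the previous row, whatever the product; the first row has no
# previous row, so it is set to 0
df["change"] = df["price"].diff().fillna(0).astype(int)

# Change from the previous sale of the same product; NaN for each product's first sale
df["change_same_product"] = df.groupby("product")["price"].diff()
print(df)
```
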
  1. Then, as in the first point, I would like to calculate the change from the previous price, so that I can count the continuous up and down sequences. For the first two rows, going from 1 to 2 gives +1, as shown below:

| product   | change |
|-----------|--------|
| Product A |  0     |
| Product B |  1     |
| Product C |  -1    |
| Product D |  4     |
| Product E |  2     |
| Product F |  3     |
| Product G |  -4    |
| Product H |  14    |
| Product I |  -8    |
| Product H |  -5    |
| Product I |  -3    |

  2. Then I would like to lay these movements out in columns, so that I can sum them and see how many times a particular product moved by a given amount.


| product   | price | down -3 | down -2 | down -1 | up/down 0 | up 1 |
|-----------|-------|---------|---------|---------|-----------|------|
| Product A |  1    |    0    |    0    |    0    |     0     |   0  |
| Product B |  2    |    0    |    0    |    0    |     0     |   1  |
| Product C |  1    |    0    |    0    |    1    |     0     |   0  |
| Product D |  5    |    0    |    0    |    0    |     0     |   0  |
| Product E |  7    |    0    |    0    |    0    |     0     |   0  |
| Product F |  10   |    0    |    0    |    0    |     0     |   0  |
| Product G |  6    |    0    |    0    |    0    |     0     |   0  |
| Product H |  20   |    0    |    0    |    0    |     0     |   0  |
| Product I |  12   |    0    |    0    |    0    |     0     |   0  |
| Product H |  7    |    0    |    0    |    0    |     0     |   0  |
| Product I |  4    |    1    |    0    |    0    |     0     |   0  |
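
One way to get this layout, sketched under the assumption that the first row counts as a 0 move. `pd.get_dummies` turns each distinct change into its own 0/1 column. Summing those columns, either overall or per product with `groupby`, counts how often each move size occurred. Unlike the table above, this creates a column for every move size present in the data, and row A gets a 1 under `move_0`. The `move_` prefix is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Product A", "Product B", "Product C", "Product D", "Product E",
                "Product F", "Product G", "Product H", "Product I", "Product H",
                "Product I"],
    "price": [1, 2, 1, 5, 7, 10, 6, 20, 12, 7, 4],
})
# Row-to-row change; the first row is treated as a 0 move
df["change"] = df["price"].diff().fillna(0).astype(int)

# One 0/1 column per distinct move size, named like "move_-3" or "move_1"
moves = pd.get_dummies(df["change"], prefix="move", dtype=int)
wide = pd.concat([df, moves], axis=1)

# Column sums: how many times each move size occurred across all rows
move_totals = moves.sum()

# Per product: how many times each product moved by each amount
per_product = wide.groupby("product")[list(moves.columns)].sum()
print(wide)
```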


  1. Next, for question (2), I would like to count the number of consecutive up or down movements in a row, and present them in the same format as above, with columns such as "continuous 1", "continuous 2", and so on.

  2. Next, for question (3), I would like to see the range of values, and the dates over which a string of continuous up movements occurred. For example, take products C, D and E, and say the dates for their prices were 2014-01-01, 2014-01-02 and 2014-01-03. Their prices rose consecutively (1 to 5, then 5 to 7), and the values were 1, 5, 7. So I would like to show:


| products | dates                    | values  |
|----------|--------------------------|---------|
| C, D, E  | 2014-01-01 to 2014-01-03 | 1, 5, 7 |
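
A sketch for questions (2) and (3). It uses hypothetical daily dates that put Product C on 2014-01-01, and it treats a run as a stretch of consecutive up moves. Each run includes the row it starts from, so with this data the run through C, D and E continues to Product F (price 10). The names `up_runs`, `run_lengths` and `run_table` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Product A", "Product B", "Product C", "Product D", "Product E",
                "Product F", "Product G", "Product H", "Product I", "Product H",
                "Product I"],
    "price": [1, 2, 1, 5, 7, 10, 6, 20, 12, 7, 4],
    # Hypothetical dates: one sale per day, so Product C falls on 2014-01-01
    "date": pd.date_range("2013-12-30", periods=11, freq="D"),
})

up = df["price"].diff().gt(0)                          # True where the price rose
run_id = (up != up.shift(fill_value=False)).cumsum()   # consecutive up moves share an id

records = []
for _, block in df[up].groupby(run_id[up]):
    start = block.index[0] - 1                         # the run starts one row earlier
    members = df.loc[start:block.index[-1]]
    records.append({
        "products": ", ".join(members["product"].str.replace("Product ", "", regex=False)),
        "dates": (f"{members['date'].iloc[0].strftime('%Y-%m-%d')} to "
                  f"{members['date'].iloc[-1].strftime('%Y-%m-%d')}"),
        "values": ", ".join(members["price"].astype(str)),
        "up_moves": len(block),
    })
up_runs = pd.DataFrame(records)

# Question (2): how many up runs had 1, 2, 3, ... consecutive moves, laid out as columns
run_lengths = up_runs["up_moves"].value_counts().sort_index()
run_table = run_lengths.rename(lambda n: f"continuous {n}").to_frame("up runs").T
print(up_runs)
print(run_table)
```

Down runs can be handled the same way by using `.lt(0)` instead of `.gt(0)`.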

In Summary

  • I'd like to see the change in prices for products (where the same product can be sold on different days at different prices) from one day to the next.
  • Then I would like to see the number of times a product moved 1 point, compared with the number of times the same product moved 20 points. I might then see a pattern such as: the product doesn't change in price often, but when it does, it jumps by a large amount. Or, by summing the changes in price, I could see that a product's price most often alternates between dropping 3 points and rising 3 points, so it is cyclical.
  • Lastly, I would like to see when the prices moved 3 points (the dates) and what the prices were when the movement occurred (1, 5, 7).
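
The per-product comparison in the summary can be sketched minimally. First, difference each product's prices against its own previous sale. Then cross-tabulate product against move size with `pd.crosstab`. The sketch assumes the rows are in date order. In the sample table, only Products H and I appear more than once, so the result has just two rows; products without a previous sale drop out because their change is NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Product A", "Product B", "Product C", "Product D", "Product E",
                "Product F", "Product G", "Product H", "Product I", "Product H",
                "Product I"],
    "price": [1, 2, 1, 5, 7, 10, 6, 20, 12, 7, 4],
})

# Each product's change since its own previous sale (NaN for its first sale)
df["change"] = df.groupby("product")["price"].diff()

# Rows: products; columns: signed move size; cells: number of times it happened.
# Rows with a NaN change are left out.
move_counts = pd.crosstab(df["product"], df["change"])

# The same count by magnitude only, so a 3-point drop and a 3-point rise share a column
size_counts = pd.crosstab(df["product"], df["change"].abs())
print(move_counts)
```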

In [1]: import pandas as pd
   ...: s = pd.Series([1, 2, 1, 5, 7, 10, 6, 20, 12, 7, 4])

1: Number of increases:

In [3]: s.diff() > 0
Out[3]: 
0     False
1      True
2     False
3      True
4      True
5      True
6     False
7      True
8     False
9     False
10    False
dtype: bool

In [4]: (s.diff() > 0).sum()
Out[4]: 5

2: Number of decreases:

In [5]: (s.diff() < 0).sum()
Out[5]: 5
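
Both counts can also be read off in one step. This sketch labels each change with `np.sign` and counts the labels; the label names are arbitrary.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 1, 5, 7, 10, 6, 20, 12, 7, 4])

# np.sign turns each change into 1 (up), -1 (down) or 0 (unchanged); the first
# element's NaN stays NaN after mapping and is dropped by value_counts()
labels = np.sign(s.diff()).map({1.0: "up", -1.0: "down", 0.0: "unchanged"})
direction_counts = labels.value_counts()
print(direction_counts)
```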

3a: Size of the changes:

In [6]: s.diff()
Out[6]: 
0    NaN
1      1
2     -1
3      4
4      2
5      3
6     -4
7     14
8     -8
9     -5
10    -3
dtype: float64
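
For question (3), a single large move such as 6 to 20 matters. One option is to rank the changes by absolute size. This sketch keeps the three largest; the cutoff of three is arbitrary.

```python
import pandas as pd

s = pd.Series([1, 2, 1, 5, 7, 10, 6, 20, 12, 7, 4])
change = s.diff()

# Positions of the three largest moves by size, ignoring direction (NaN is skipped)
top = change.abs().nlargest(3).index

# Show where each of those moves started and ended, with its signed size
biggest_moves = pd.DataFrame({
    "from": s.shift(1),
    "to": s,
    "change": change,
}).loc[top]
print(biggest_moves)
```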

3b: Location of the changes:

This is already recorded by the index of the DataFrame or Series: each change is labelled with the index value of the row where it occurs.
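
For example, suppose the values carry dates. Here the dates are hypothetical, one per day starting on 2014-01-01. The dates of the increases then come straight from the index, and `np.flatnonzero` gives the integer positions instead.

```python
import numpy as np
import pandas as pd

# Hypothetical dates attached as the index (one observation per day)
dates = pd.date_range("2014-01-01", periods=11, freq="D")
s = pd.Series([1, 2, 1, 5, 7, 10, 6, 20, 12, 7, 4], index=dates)

change = s.diff()
up_dates = change[change > 0].index          # dates on which the value rose
up_positions = np.flatnonzero(change > 0)    # the same rows as integer positions
print(up_dates)
```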

You should post specific examples of the output you expect from your example series. If you want to do any of these row-wise, you may have to transpose the DataFrame first.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

 