
Applying linspace for every row in a pandas dataframe

For example, say I have a dataframe df like this:

                     date     number_range  id
0 [2010-01-01, 2010-03-01]    [5, 10]       1
1 [2010-02-01, 2010-06-01]    [1, 3]        1
2 [2010-07-01, 2010-11-01]    [12, 50]      1

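I want to apply numpy.linspace to the above by finding the date difference for each row and using it as the number of points. For example, the date difference for row 0 is 2, so apply linspace(5, 10, 2); for row 1 the difference is 4, so apply linspace(1, 3, 4).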

final result
-------------
                     date     number_range  id   linspace
0 [2010-01-01, 2010-03-01]    [5, 10]       1    [5, 10]
1 [2010-02-01, 2010-06-01]    [1, 3]        1    [1, 1.66667, 2.3333, 3]
2 [2010-07-01, 2010-12-01]    [12, 50]      1    [12, 21.5, 31, 40.5, 50]

I tried df.apply(lambda row: np.linspace(row['start_value'], row['end_value'], row['diff'])) but I keep getting a TypeError saying a 'Series' object cannot be interpreted as an integer. I have also tried diff.astype(int) and got the same error. I'm not sure where to go from there.

Make sure you understand what row is:

In [133]: def foo(row):
     ...:     print(row)
     ...: 
In [134]: df.apply(foo)
0    [2010-01-01, 2010-03-01]
1    [2010-02-01, 2010-06-01]
2    [2010-07-01, 2010-11-01]
Name: date, dtype: object
0     [5, 10]
1      [1, 3]
2    [12, 50]
Name: number_range, dtype: object
0    1
1    1
2    1
Name: id, dtype: int64
Out[134]: 
date            None
number_range    None
id              None
dtype: object

In [136]: def foo(row):
     ...:     print(row['start_value'], row['end_value'], row['diff'])
     ...: 
In [137]: df.apply(foo)
Traceback (most recent call last): 
...
KeyError: 'start_value'

Using axis=1, as suggested in another answer:

In [148]: def foo(row):
     ...:     print(type(row))
     ...:     print(row)
     ...:     print(row['date'])
     ...:     print(row['number_range'])

In [149]: df.apply(foo, axis=1)
<class 'pandas.core.series.Series'>
date            [2010-01-01, 2010-03-01]
number_range                     [5, 10]
id                                     1
Name: 0, dtype: object
['2010-01-01', '2010-03-01']
[5, 10]
<class 'pandas.core.series.Series'>
... 
['2010-02-01', '2010-06-01']
[1, 3]
<class 'pandas.core.series.Series'>
....
['2010-07-01', '2010-11-01']
[12, 50]

Now we can pull the endpoints out of number_range:

In [150]: def foo(row):
     ...:     nr = row['number_range']
     ...:     return np.linspace(nr[0],nr[1],3)
     ...: 
In [151]: df.apply(foo, axis=1)
Out[151]: 
0      [5.0, 7.5, 10.0]
1       [1.0, 2.0, 3.0]
2    [12.0, 31.0, 50.0]
dtype: object

I can generate the same numbers with a single linspace call:

In [159]: df['number_range'].to_numpy()
Out[159]: array([list([5, 10]), list([1, 3]), list([12, 50])], dtype=object)
In [160]: nr = np.stack(df['number_range'].to_numpy())
In [161]: nr
Out[161]: 
array([[ 5, 10],
       [ 1,  3],
       [12, 50]])
In [162]: np.linspace(nr[:,0],nr[:,1],3).T
Out[162]: 
array([[ 5. ,  7.5, 10. ],
       [ 1. ,  2. ,  3. ],
       [12. , 31. , 50. ]])

I chose 3 for all rows; I haven't tried to figure out where you got 2, 4 and 5.
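If the counts 2, 4 and 5 are meant to be the number of whole calendar months between the two dates in each row (an assumption based on the question's wording, which the question does not confirm), a minimal sketch for deriving them could look like this. The month_diff helper is hypothetical, and the third row uses the end date 2010-12-01 from the question's expected result; with 2010-11-01 the same rule gives 4, not 5.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame (dates stored as lists of strings).
df = pd.DataFrame({
    "date": [["2010-01-01", "2010-03-01"],
             ["2010-02-01", "2010-06-01"],
             ["2010-07-01", "2010-12-01"]],
    "number_range": [[5, 10], [1, 3], [12, 50]],
    "id": [1, 1, 1],
})

def month_diff(pair):
    """Whole calendar months between the two dates in `pair` (assumed rule)."""
    start, end = pd.to_datetime(pair)  # two-element DatetimeIndex, unpacked into Timestamps
    return (end.year - start.year) * 12 + (end.month - start.month)

# One integer per row, usable as the `num` argument of np.linspace.
df["diff"] = df["date"].apply(month_diff)
print(df["diff"].tolist())
```

The resulting diff column holds plain integers, so it can be passed row by row as the point count.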

You can use apply() with axis=1, as shown below (assuming diff = [2, 4, 5]):

df['linspace'] = df.apply(lambda x: np.round(
    np.linspace(x['number_range'][0],  x['number_range'][1], x['diff']),3), 
                          axis=1)
print(df)

Alternatively, you can first create start_value and end_value columns, as in your question, and then create linspace like this:

df[['start_value','end_value']] = pd.DataFrame(df['number_range'].to_list())
df['linspace'] = df.apply(lambda x: np.round(
    np.linspace(x['start_value'],  x['end_value'], x['diff']),3), axis=1)
print(df)

Output:

                       date number_range  diff                        linspace
0  [2010-01-01, 2010-03-01]      [5, 10]     2                     [5.0, 10.0]
1  [2010-02-01, 2010-06-01]       [1, 3]     4        [1.0, 1.667, 2.333, 3.0]
2  [2010-07-01, 2010-11-01]     [12, 50]     5  [12.0, 21.5, 31.0, 40.5, 50.0]
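For reference, here is a self-contained sketch of the same idea that builds the frame from scratch and uses a list comprehension instead of apply(). The diff values are taken as given from the output above (how they are derived from the dates is not settled in this thread), and int(n) is used because np.linspace needs an integer number of points.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "number_range": [[5, 10], [1, 3], [12, 50]],
    "diff": [2, 4, 5],  # assumed point counts, copied from the output above
})

# Each row gets its own array; the lengths differ, so the column has object dtype.
df["linspace"] = [
    np.round(np.linspace(lo, hi, int(n)), 3)
    for (lo, hi), n in zip(df["number_range"], df["diff"])
]
print(df)
```

Because the rows have different lengths, the results cannot be stacked into one 2-D array; they stay as one array per cell.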
