Subtract each pair of consecutive rows in one column with pandas

When I run the code below on Google Colab, it works normally:

df['temps'] = df['temps'].view(int).div(1e9).diff().fillna(0).abs()
print(df)

But when I run it locally in a Jupyter notebook, the following error appears:

ValueError                                Traceback (most recent call last)
Input In [13], in <cell line: 1>()
----> 1 df3['rebounds'] = pd.Series(df3['temps'].view(int).div(1e9).diff().fillna(0))

File C:\Python310\lib\site-packages\pandas\core\series.py:818, in Series.view(self, dtype)
    815 # self.array instead of self._values so we piggyback on PandasArray
    816 #  implementation
    817 res_values = self.array.view(dtype)
--> 818 res_ser = self._constructor(res_values, index=self.index)
    819 return res_ser.__finalize__(self, method="view")

File C:\Python310\lib\site-packages\pandas\core\series.py:442, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
    440     index = default_index(len(data))
    441 elif is_list_like(data):
--> 442     com.require_length_match(data, index)
    444 # create/copy the manager
    445 if isinstance(data, (SingleBlockManager, SingleArrayManager)):

File C:\Python310\lib\site-packages\pandas\core\common.py:557, in require_length_match(data, index)
    553 """
    554 Check the length of data matches the length of the index.
    555 """
    556 if len(data) != len(index):
--> 557     raise ValueError(
    558         "Length of values "
    559         f"({len(data)}) "
    560         "does not match length of index "
    561         f"({len(index)})"
    562     )

ValueError: Length of values (830) does not match length of index (415)

Any suggestions for resolving this?

Here are two ways to get this to work:

df3['rebounds'] = pd.Series(df3['temps'].view('int64').diff().fillna(0).div(1e9))

... or:

df3['rebounds'] = pd.Series(df3['temps'].astype('int64').diff().fillna(0).div(1e9))

For the following sample input:

df3.dtypes:

temps    datetime64[ns]
dtype: object

df3:

       temps
0 2022-01-01
1 2022-01-02
2 2022-01-03

... both of the above code samples give this output:

df3.dtypes:

temps       datetime64[ns]
rebounds           float64
dtype: object

df3:

       temps  rebounds
0 2022-01-01       0.0
1 2022-01-02   86400.0
2 2022-01-03   86400.0
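
As an alternative that is not part of the answer above, the same column can be computed without any integer conversion, which sidesteps the question of what `int` means on a given platform. This is a minimal sketch, assuming the same three-row sample input; it relies on `Series.diff()` returning timedeltas for a datetime column and on the `.dt.total_seconds()` accessor.

```python
import pandas as pd

# Sample input mirroring the example above (assumed, for illustration only).
df3 = pd.DataFrame(
    {"temps": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"])}
)

# diff() on a datetime column yields timedeltas; the first row is NaT.
deltas = df3["temps"].diff()

# Convert to seconds as floats, then replace the leading NaN with 0.
df3["rebounds"] = deltas.dt.total_seconds().fillna(0)

print(df3)
```

If the absolute difference is wanted, as in the original question, `.abs()` can be appended to the chain.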

The issue is probably that view() essentially reinterprets the raw data of the existing series as a different data type. For this to work, according to the Series.view() docs (see also the numpy.ndarray.view() docs), the data types must have the same number of bytes. Since the original data is datetime64, which uses 8 bytes per value, your code specifying int as the argument to view() may not have met this requirement. That would be consistent with the error message: 830 values is exactly twice the 415-row index, which is what you would expect if each 8-byte value were reinterpreted as two 4-byte integers. Explicitly specifying int64 should meet the requirement. Alternatively, using astype() instead of view() with int64 will also work.
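
The effect of the byte-size mismatch can be reproduced on any platform with plain NumPy by naming the integer width explicitly. This is an illustrative sketch, not the code from the question: the array `stamps` is a made-up sample, and `np.int32` stands in for whatever `int` resolved to in the failing environment.

```python
import numpy as np

# Three timestamps stored as 8-byte datetime64[ns] values (sample data).
stamps = np.array(
    ["2022-01-01", "2022-01-02", "2022-01-03"], dtype="datetime64[ns]"
)

# Same item size (8 bytes): one integer per timestamp, length unchanged.
as_int64 = stamps.view(np.int64)

# Smaller item size (4 bytes): each timestamp is split into two integers,
# so the length doubles, just like 415 rows turning into 830 values.
as_int32 = stamps.view(np.int32)

print(stamps.dtype.itemsize)   # bytes per datetime64[ns] value
print(as_int64.shape, as_int32.shape)
```

Only the `int64` view keeps one value per row, which is why it can be placed back on the original index.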

As to why this works in Colab and not in Jupyter notebook, I can't say. Perhaps they are using different versions of pandas and numpy which treat int differently.

I do know that in my environment, if I try the following:

df3['rebounds'] = pd.Series(df3['temps'].astype('int').diff().fillna(0).div(1e9))

... then I get this error:

TypeError: cannot astype a datetimelike from [datetime64[ns]] to [int32]

This suggests that int means int32 in my environment. It would be interesting to see whether the same call works on Colab.
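
One way to compare the two environments is to print what `int` resolves to in each of them. The sketch below is a suggested diagnostic rather than something from the original answer; the variable names are hypothetical. As far as I know, NumPy releases before 2.0 map `int` to a 32-bit integer on Windows and to a 64-bit integer on 64-bit Linux, which would fit the paths in the traceback, but treat that as an assumption to verify with the output.

```python
import numpy as np
import pandas as pd

# What does the builtin `int` mean to NumPy in this environment?
default_int = np.dtype(int)
print("numpy", np.__version__, "| pandas", pd.__version__)
print("int resolves to", default_int, "with", default_int.itemsize, "bytes")

# Naming the width explicitly does not depend on the platform default.
# The cast to datetime64[ns] pins the unit so that dividing by 1e9
# really converts nanoseconds to seconds.
s = pd.Series(pd.to_datetime(["2022-01-01", "2022-01-02"])).astype("datetime64[ns]")
seconds = s.astype("int64").diff().fillna(0).div(1e9)
print(seconds)
```

If the item size printed is 4, then `view(int)` and `astype('int')` on a datetime64 column can be expected to misbehave in the ways shown above.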
