Python: column calculations when the column contains text strings

Question

I'm new to Python. I need to do a column calculation in Python on a column that contains text strings.

For example:

import pandas as pd
data = [1,2,'s','s',5,6,7,8,'s']
df = pd.DataFrame(data)

I want to create a new column using .diff(). However, it cannot compute a difference between an int and a str.

df.diff()    
TypeError: unsupported operand type(s) for -: 'str' and 'int'

The new column should look like this:

   obs new_col
0    1      na
1    2       1
2    s       s
3    s       s
4    5       5
5    6       1
6    7       1
7    8       1
8    s       s

Does anyone know how to do this? Thanks! JH

Answer 1

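Convert the column to numeric first and then use diff, then use fillna to fill the resulting NaN values from the original column. Since we know that only the first diff should return NaN, hard-code it: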

import numpy as np

df['new_col'] = pd.to_numeric(df[0],errors='coerce').diff().fillna(df[0])
df.loc[0,'new_col'] = np.nan

print(df)

   0 new_col
0  1     NaN
1  2       1
2  s       s
3  s       s
4  5       5
5  6       1
6  7       1
7  8       1
8  s       s
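
To see why the one-liner works, it can help to look at each intermediate result separately. The following is only a sketch of the same steps on the question's sample data; the variable names `numeric`, `delta` and `new_col` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 's', 's', 5, 6, 7, 8, 's'])

# Step 1: strings cannot be parsed, so errors='coerce' turns them into NaN
# numeric: 1.0, 2.0, NaN, NaN, 5.0, 6.0, 7.0, 8.0, NaN
numeric = pd.to_numeric(df[0], errors='coerce')

# Step 2: row-to-row difference; any subtraction involving NaN gives NaN
# delta: NaN, 1.0, NaN, NaN, NaN, 1.0, 1.0, 1.0, NaN
delta = numeric.diff()

# Step 3: wherever the difference is NaN, fall back to the original value
new_col = delta.fillna(df[0])

# Step 4: the first row has no predecessor, so it is set back to NaN by hand
new_col.iloc[0] = np.nan

print(new_col)
```

Note that row 4 keeps its original value 5 because the row before it is a string, which is exactly what the question asks for.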

Answer 2

You can try this and adapt it to your needs:

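# Assumes the data column is named 'obs'
df['new_col'] = df['obs'].shift(1)  # helper column holding the previous row's value

def calc(x):
    # Subtract only when both the current and the previous value are ints
    if type(x['obs']) == int and type(x['new_col']) == int:
        return x['obs'] - x['new_col']
    else:
        return x['obs']

df['new_col'] = df.apply(calc, axis=1)  # replace the helper column with the result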

Answer 3

I have created a custom function that works much like the pandas diff() function, in case you want the same functionality.

import numpy as np

def diff(dataframe, col_name, new_col_name, periods=1):
    # List which stores the values of the new columns
    new_col_value = []
    
    # Previous Value in the column
    prev_value = None
    
    # Periods counts for skipping
    periods_count = 1
    
    # Looping through the specified column
    for i in range(len(dataframe[col_name])):
        
        # Conditional for skipping the rows
        if periods_count <= periods:
            new_col_value.append(np.nan)
            prev_value = dataframe[col_name][i]
            periods_count += 1
            
        # Conditional for checking the datatypes
        # If the datatype is int
        elif type(dataframe[col_name][i]) != str:
            # If the previous value is a string
            if (type(prev_value) == str):
                prev_value = dataframe[col_name][i]
                new_col_value.append(prev_value)
                
            # If the previous value is int
            else:
                new_col_value.append(dataframe[col_name][i] - prev_value)
                prev_value = dataframe[col_name][i]
        
        # If the value is of string datatype
        else:
            prev_value = dataframe[col_name][i]            
            new_col_value.append(prev_value)
    
    # Creating the new column in the dataframe
    dataframe[new_col_name] = new_col_value
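
As written, the function above modifies the DataFrame in place and returns nothing, and its `periods` argument only controls how many leading rows become NaN; the subtraction always uses the immediately preceding row. Below is a compact sketch of the same rule that compares each value with the one `periods` rows earlier. The helper name `mixed_diff` and the column name `obs` are assumptions made for this example, not part of the answer.

```python
import numpy as np
import pandas as pd

def mixed_diff(values, periods=1):
    """Difference over a list that mixes numbers and strings (hypothetical helper)."""
    out = []
    for i, cur in enumerate(values):
        if i < periods:
            out.append(np.nan)      # no earlier row to compare with
            continue
        prev = values[i - periods]
        if isinstance(cur, str) or isinstance(prev, str):
            out.append(cur)         # cannot subtract, keep the current value
        else:
            out.append(cur - prev)  # both numeric, ordinary difference
    return out

df = pd.DataFrame({'obs': [1, 2, 's', 's', 5, 6, 7, 8, 's']})
df['new_col'] = mixed_diff(df['obs'].tolist())
print(df)
```

Returning a list instead of writing into the DataFrame leaves it to the caller to decide where the result goes, which also makes the helper easy to test on plain Python lists.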

Answer 1
Score: 2, accepted, 2021-05-23 16:48:41

Answer 2
Score: 1, 2021-05-23 16:42:35

Answer 3
Score: 1, 2021-05-23 16:51:16