Pandas: apply a transformation to all character columns

Question

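I am using Python and trying to apply some transformations to all character/string columns in a pandas DataFrame. The transformations are:

Make everything upper case
Trim whitespace

I come from an R background, where this can be done with something like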


mydf <- mydf %>%
  dplyr::mutate_if(is.character, toupper) %>%
  dplyr::mutate_if(is.character, trimws)

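In Python, I am at a loss. I have tried the following, which first identifies all string columns and then attempts to trim the whitespace and upper-case each of them (in this case, species):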

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Create a sample dataset
iris = load_iris()

df= pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                 columns= iris['feature_names'] + ['target'])

df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Make character columns upper case and then trim the white space
string_dtypes = df.convert_dtypes().select_dtypes("string")
df[string_dtypes.columns] = string_dtypes.apply(lambda x: x.str.upper())
df[string_dtypes.columns] = string_dtypes.apply(lambda x: x.str.strip())

df

I appreciate this may be a very basic question, and thank you in advance to anyone who takes the time to help.

Answer 1

You should be able to do this in one line with method chaining:

df.astype(str).apply(lambda x: x.str.upper().str.strip())

Output:

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target    species
0                  5.1               3.5                1.4               0.2     0.0     SETOSA
1                  4.9               3.0                1.4               0.2     0.0     SETOSA
2                  4.7               3.2                1.3               0.2     0.0     SETOSA
3                  4.6               3.1                1.5               0.2     0.0     SETOSA
4                  5.0               3.6                1.4               0.2     0.0     SETOSA
...                ...               ...                ...               ...     ...        ...
145                6.7               3.0                5.2               2.3     2.0  VIRGINICA
146                6.3               2.5                5.0               1.9     2.0  VIRGINICA
147                6.5               3.0                5.2               2.0     2.0  VIRGINICA
148                6.2               3.4                5.4               2.3     2.0  VIRGINICA
149                5.9               3.0                5.1               1.8     2.0  VIRGINICA
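
One caveat with the one-liner: `astype(str)` converts every column to strings, so the numeric columns stop being floats. If only the text columns should change, which is closer to what `mutate_if(is.character, ...)` does in R, a minimal sketch could look like the following. The helper name `clean_text_columns` and the small sample frame are made up for illustration. Including `"category"` in the selection is an assumption based on `species` being a `Categorical` in the question.

```python
import pandas as pd


def clean_text_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with text-like columns stripped and upper-cased.

    Hypothetical helper for illustration; other columns are left untouched.
    """
    out = df.copy()
    # Dtypes that usually hold text: object, pandas' string dtype, categoricals.
    text_cols = out.select_dtypes(include=["object", "string", "category"]).columns
    for col in text_cols:
        # The .str accessor skips missing values, so they stay missing.
        # A categorical column comes back as a plain text column.
        out[col] = out[col].str.strip().str.upper()
    return out


# Small made-up frame standing in for the iris data from the question.
df = pd.DataFrame({
    "sepal_length": [5.1, 6.7],
    "species": pd.Categorical(["  setosa", "virginica  "]),
    "note": [" first row ", None],
})

cleaned = clean_text_columns(df)
print(cleaned)
print(cleaned.dtypes)
```

Here the numeric column keeps its float dtype and only `species` and `note` are modified. An object column that holds no strings at all would make the `.str` accessor raise, so such columns would need to be filtered out first.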

Solution 1
4 Accepted 2022-07-20 12:05:25
