Convert an entire dataframe from lowercase to uppercase with Pandas

Question

I have a dataframe like this:

import pandas as pd

# Create an example dataframe about a fictional army
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'deaths', 'battles', 'size'])

My goal is to convert every string in the dataframe to uppercase.

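Note: all columns have dtype object and this must not change; the output must consist entirely of object columns. I want to avoid converting the columns one by one... I would like a general operation on the whole dataframe.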

What I have tried so far, without success:

df.str.upper()  # AttributeError: 'DataFrame' object has no attribute 'str'
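A minimal sketch of why that attempt fails, using the question's own data: the `.str` accessor is defined on a Series (one column), not on a DataFrame, so it has to be reached column by column.

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'deaths', 'battles', 'size'])

# A DataFrame has no .str accessor, which is why df.str.upper() raises AttributeError
print(hasattr(df, 'str'))  # False

# A single column (a Series) does have one
print(df['size'].str.upper().tolist())  # ['L', 'LL', 'L', 'M']
```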

Answer 1

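astype(str) converts each Series to dtype object (strings); the .str accessor on the converted Series then exposes the values element-wise as strings, so upper() can be called on them. Note that after this, every column has dtype object.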

In [17]: df
Out[17]: 
     regiment company deaths battles size
0  Nighthawks     1st    kkk       5    l
1  Nighthawks     1st     52      42   ll
2  Nighthawks     2nd     25       2    l
3  Nighthawks     2nd    616       2    m

In [18]: df.apply(lambda x: x.astype(str).str.upper())
Out[18]: 
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M
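One side effect worth checking, sketched on the question's data: because `astype(str)` runs before `upper()`, numbers such as `52` in 'deaths' come back as the string `'52'`. The question asked to uppercase strings only, so this may or may not be acceptable.

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'deaths', 'battles', 'size'])

upper_df = df.apply(lambda x: x.astype(str).str.upper())

# Before: an integer; after: a string with the same digits
print(repr(df.loc[1, 'deaths']), repr(upper_df.loc[1, 'deaths']))
print(upper_df['battles'].tolist())  # every value is now a string
```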

You can later convert the 'battles' column back to numbers with to_numeric():

In [42]: df2 = df.apply(lambda x: x.astype(str).str.upper())

In [43]: df2['battles'] = pd.to_numeric(df2['battles'])

In [44]: df2
Out[44]: 
     regiment company deaths  battles size
0  NIGHTHAWKS     1ST    KKK        5    L
1  NIGHTHAWKS     1ST     52       42   LL
2  NIGHTHAWKS     2ND     25        2    L
3  NIGHTHAWKS     2ND    616        2    M

In [45]: df2.dtypes
Out[45]: 
regiment    object
company     object
deaths      object
battles      int64
size        object
dtype: object

Answer 2

This can be solved with the applymap method:

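# Uppercase only the cells that are strings; in pandas >= 2.1, applymap is deprecated in favor of DataFrame.map
df = df.applymap(lambda s: s.upper() if isinstance(s, str) else s)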
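A sketch of the same element-wise idea that runs on both older and newer pandas. It assumes only that pandas 2.1 renamed `DataFrame.applymap` to `DataFrame.map`; the helper name `to_upper` is hypothetical. Unlike the `astype(str)` approach, the integer `52` stays an integer.

```python
import pandas as pd

def to_upper(value):
    # Uppercase real Python strings only; ints, floats and NaN pass through unchanged
    return value.upper() if isinstance(value, str) else value

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'deaths', 'battles', 'size'])

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap
if hasattr(pd.DataFrame, 'map'):
    upper_df = df.map(to_upper)
else:
    upper_df = df.applymap(to_upper)
print(upper_df)
```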

Answer 3

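Row-wise loops are very slow. Instead of applying a function to every cell of every row, get the column names as a list and loop over it, converting each column's text to uppercase.

The code below works on whole columns at a time, which should be faster than applying a function cell by cell.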

for column in df.columns:
    # .str works only on string/object columns; non-string cells in an object column become NaN
    df[column] = df[column].str.upper()
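A small sketch of the two pitfalls noted in the comment above, using columns shaped like the question's 'deaths' and 'battles': in an object column that mixes strings and integers, `.str.upper()` returns NaN for the integers, and a purely numeric column rejects the `.str` accessor outright.

```python
import pandas as pd

# Object column mixing strings and integers, like 'deaths' in the question
deaths = pd.Series(['kkk', 52, '25', 616], dtype=object)
result = deaths.str.upper()
print(result.tolist())  # the integers come back as NaN

# Purely numeric column: the .str accessor itself raises
battles = pd.Series([5, 42, 2, 2])
try:
    battles.str.upper()
    str_accessor_failed = False
except AttributeError as err:
    str_accessor_failed = True
    print('AttributeError:', err)
```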

Answer 4

Since .str only works on a Series, you can apply it to each column separately and then concatenate the results:

In [6]: pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
Out[6]: 
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M

Edit: performance comparison

In [10]: %timeit df.apply(lambda x: x.astype(str).str.upper())
100 loops, best of 3: 3.32 ms per loop

In [11]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
100 loops, best of 3: 3.32 ms per loop

Both answers perform the same on a small dataframe.

In [15]: df = pd.concat(10000 * [df])

In [16]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
10 loops, best of 3: 104 ms per loop

In [17]: %timeit df.apply(lambda x: x.astype(str).str.upper())
10 loops, best of 3: 130 ms per loop

On a large dataframe, my answer is slightly faster (104 ms vs. 130 ms per loop).
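A minimal sketch for reproducing the comparison outside IPython, assuming the question's data and the standard-library `timeit` module in place of `%timeit`; the helper names `via_apply` and `via_concat` are hypothetical. It first checks that both approaches return identical frames, and timings will of course vary by machine and pandas version.

```python
import timeit

import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'deaths', 'battles', 'size'])
# ignore_index=True gives the enlarged frame a unique RangeIndex
big = pd.concat(1000 * [df], ignore_index=True)

def via_apply(frame):
    return frame.apply(lambda x: x.astype(str).str.upper())

def via_concat(frame):
    return pd.concat([frame[col].astype(str).str.upper() for col in frame.columns], axis=1)

# Both approaches should produce identical frames before we compare their speed
pd.testing.assert_frame_equal(via_apply(big), via_concat(big))

for fn in (via_apply, via_concat):
    seconds = timeit.timeit(lambda: fn(big), number=10)
    print(f'{fn.__name__}: {seconds / 10 * 1000:.1f} ms per call')
```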

Answer 5

Try this:

# Note: non-string values inside object columns (such as 52 in 'deaths') become NaN
df2 = df2.apply(lambda x: x.str.upper() if x.dtype == "object" else x)
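A sketch that keeps this answer's dtype check but avoids the NaN problem: inside each object column, only cells that really are strings are replaced. It assumes the whole frame is built with `dtype=object`, as the question states; the function name `upper_object_columns` is hypothetical.

```python
import pandas as pd

def upper_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Uppercase string cells in object columns; leave every other value untouched."""
    out = df.copy()
    for col in out.select_dtypes(include='object').columns:
        s = out[col]
        # Boolean mask of the cells that actually hold Python strings
        is_str = s.map(lambda v: isinstance(v, str)).astype(bool)
        # where() keeps s where the condition is True (non-strings) and uses the uppercased value elsewhere
        out[col] = s.where(~is_str, s.str.upper())
    return out

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'deaths', 'battles', 'size'],
                  dtype=object)
result = upper_object_columns(df)
print(result)
```

Working on a copy leaves the input frame unchanged, and every column keeps dtype object, which matches the constraint in the question.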

Answer 6

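If you want to preserve the dtype of non-string columns, check each column's dtype before calling the string methods (isinstance(x, object) is True for every Series, so it does not filter anything):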

df.apply(lambda x: x.str.upper().str.strip() if x.dtype == object else x)
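A short sketch showing why `isinstance(x, object)` cannot tell columns apart while a dtype check can. The two small columns here are made up for illustration.

```python
import pandas as pd

nums = pd.Series([5, 42])                        # int64 column
words = pd.Series([' l ', 'll'], dtype=object)   # object column with padding

# isinstance(..., object) is True for every Python object, so it cannot filter columns
print(isinstance(nums, object), isinstance(words, object))  # True True

# Comparing the column dtype does distinguish them
print(nums.dtype == object, words.dtype == object)  # False True

def upper_strip(col: pd.Series) -> pd.Series:
    # Only object columns get the string treatment; numeric columns pass through
    return col.str.upper().str.strip() if col.dtype == object else col

df = pd.DataFrame({'size': words, 'battles': nums})
result = df.apply(upper_strip)
print(result)
```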

Answer 7

If you want to preserve the dtypes, or change only one type, try an if:

# dados is the answerer's DataFrame; only object columns get .str.upper()
for x in dados.columns:
    if dados[x].dtype != object:
        print(f'{x}: {dados[x].dtype} - not allow upper')
    else:
        print(f'{x}: object - allow upper')
        dados[x] = dados[x].str.upper()
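The original check compared against `'int32'` only, but integer columns are usually `int64`, and float columns would slip through as well. A sketch of the same loop using `pandas.api.types.is_numeric_dtype` to skip every numeric dtype; the column names are made up for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    'size': pd.Series(['l', 'll'], dtype=object),
    'small_ints': np.array([1, 2], dtype='int32'),
    'big_ints': np.array([3, 4], dtype='int64'),
})

for col in df.columns:
    # Skip every numeric dtype (int32, int64, float64, ...), not just int32
    if is_numeric_dtype(df[col]):
        continue
    df[col] = df[col].str.upper()

print(df.dtypes)
```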

Answer 8

You can map it over the column names (this changes only the column labels, not the cell values):

oh_df.columns = map(str.lower, oh_df.columns)

Answer 9

Try (this also changes only the column labels):

df.columns = df.columns.str.upper()
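Answers 8 and 9 rename the columns rather than the data. A small sketch confirming that, with `DataFrame.rename(columns=str.upper)` shown as an equivalent alternative that returns a new frame instead of mutating in place.

```python
import pandas as pd

df = pd.DataFrame({'regiment': ['Nighthawks'], 'size': ['l']})

# Returns a new frame with uppercased labels
renamed = df.rename(columns=str.upper)

# Same effect, assigned in place on a copy
relabelled = df.copy()
relabelled.columns = relabelled.columns.str.upper()

print(list(renamed.columns))     # ['REGIMENT', 'SIZE']
print(renamed['SIZE'].tolist())  # ['l'] -- the values are untouched
```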

Answer scores and posting times:

Answer 1: 57 (accepted), 2016-09-15 13:19:39
Answer 2: 33, 2018-04-29 06:35:20
Answer 3: 9, 2019-11-29 07:33:45
Answer 4: 8, 2016-09-15 13:23:52
Answer 5: 5, 2019-02-20 14:26:03
Answer 6: 2, 2018-05-11 19:55:04
Answer 7: 0, 2022-09-16 20:02:03
Answer 8: 0, 2022-12-12 17:59:50
Answer 9: -1, 2021-07-28 16:24:11
