Rename columns using part of a string in Pandas

Question

I have a data frame that looks like the one below. The actual data frame has 64 columns.

  0      1      2 
app 2  tb 1   mt 3
app 0  tb 5   mt 2
app 0  tb 0   mt 6

I'd like to rename the columns using the substring (e.g. "app", "tb"). The ideal data frame would look like below:

  app tb mt
0   2  1  3
1   0  5  2
2   0  0  6

I know how to subset to the numeric values using str.split(). However, how do I update the corresponding column names using the first part of the string?

Answer 1

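You can assign to .columns to rename the columns of a dataframe. For example:

# header: the text before the last whitespace in each cell of the first row
df.columns = df.iloc[0, :].str.extract(r"^(.*)\s+")[0]
# strip that prefix from every cell; regex=True is needed on pandas 2.0+, where str.replace is literal by default
df = df.apply(lambda x: x.str.replace(r"^(.*\s+)", "", regex=True))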

print(df)

Prints:

  app tb mt
0   2  1  3
1   0  5  2
2   0  0  6
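
For anyone who wants to run this end to end, here is a minimal sketch. The sample frame is an assumption rebuilt from the three columns shown in the question, and the variable name names is only illustrative. It passes expand=False so that extract returns a Series, and converts it to a plain list before assigning it as the header.

```python
import pandas as pd

# Sample data assumed from the question: every cell is "<name> <number>".
df = pd.DataFrame({
    0: ["app 2", "app 0", "app 0"],
    1: ["tb 1", "tb 5", "tb 0"],
    2: ["mt 3", "mt 2", "mt 6"],
})

# Header: everything before the last whitespace character in the first row.
names = df.iloc[0].str.extract(r"^(.*)\s+", expand=False)
df.columns = names.tolist()

# Strip the "<name> " prefix from every cell. regex=True is spelled out
# because str.replace treats the pattern literally by default in pandas 2.0+.
df = df.apply(lambda col: col.str.replace(r"^.*\s+", "", regex=True))
print(df)
```

The remaining values are still strings; nothing here converts them to numbers.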

Answer 2

One way to do this is to assign to the .columns attribute of a pandas dataframe.

Assuming that all your df values are consistent and you want the first part of each string as the column name for all 64 columns, you can do this:

# the first word of each cell in the row labelled 0 becomes the header
df.columns = [x.split()[0] for x in df.loc[0, :]]
# strip the prefix from every cell; regex=True is needed on pandas 2.0+
df = df.apply(lambda x: x.str.replace(r"^(.*\s+)", "", regex=True))

This uses a list comprehension (a more Pythonic loop) and the string split method to pull the names out of the first-row values in your df. Now, if you print df.head(), you should see:

    app     tb      mt
0   2       1       3
1   0       5       2
2   0       0       6
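
The approach above depends on every row of a column carrying the same prefix. A rough sketch of how that assumption could be checked before renaming is below. The sample frame is rebuilt from the question, and the names prefixes, inconsistent and renamed are made up for illustration.

```python
import pandas as pd

# Sample data assumed from the question.
df = pd.DataFrame({
    0: ["app 2", "app 0", "app 0"],
    1: ["tb 1", "tb 5", "tb 0"],
    2: ["mt 3", "mt 2", "mt 6"],
})

# Split each cell once, from the right, and keep the name part.
prefixes = df.apply(lambda col: col.str.rsplit(n=1).str[0])

# A column is consistent when all of its rows share a single prefix.
inconsistent = [c for c in df.columns if prefixes[c].nunique() != 1]
if inconsistent:
    raise ValueError(f"mixed prefixes in columns: {inconsistent}")

# Keep the number part of each cell and use the first-row prefixes as headers.
renamed = df.apply(lambda col: col.str.rsplit(n=1).str[-1])
renamed.columns = prefixes.iloc[0].tolist()
print(renamed)
```

If a column mixes prefixes, taking the header from the first row alone would silently mislabel the other rows, which is why the sketch raises instead.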

Answer 3

You could reshape the data with melt before pulling out the strings:

(
    # flip the columns into rows
    df.melt(ignore_index=False)
    .drop(columns='variable')
    # split the column into strings and numbers
    .loc[:, 'value'].str.split(expand=True)
    # flip the dataframe to get the headers
    .pivot(columns=0, values=1)
    .rename_axis(columns=None)
)

  app mt tb
0   2  3  1
1   0  2  5
2   0  6  0
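
To make the chain easier to follow, here is the same idea broken into named steps. This is only a sketch on sample data assumed from the question. The intermediate names (long, parts, wide) and the column labels "name" and "number" are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data assumed from the question.
df = pd.DataFrame({
    0: ["app 2", "app 0", "app 0"],
    1: ["tb 1", "tb 5", "tb 0"],
    2: ["mt 3", "mt 2", "mt 6"],
})

# Step 1: long format. ignore_index=False keeps the original row labels,
# so each label now appears once per original column.
long = df.melt(ignore_index=False)["value"]

# Step 2: split "<name> <number>" into two columns and label them.
parts = long.str.split(expand=True)
parts.columns = ["name", "number"]

# Step 3: spread the names back out as headers, with the numbers as values.
wide = parts.pivot(columns="name", values="number").rename_axis(columns=None)
print(wide)
```

Because the headers come out of pivot, they are ordered by name and not by their position in the original frame, which matches the app, mt, tb order in the output above.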

A shorter route, inspired by @AndrejKesely, is to use the string methods on the dataframe itself; this should be faster:

Get the columns:

df.columns = df.iloc[0].str.split().str[0]

Remove the column names from the values in each column:

df.transform(lambda col: col.str.split().str[-1]).rename_axis(columns=None)

  app tb mt
0   2  1  3
1   0  5  2
2   0  0  6

Answer 4

To keep it as one fun method-chaining solution:

new_df = (
    df.set_axis(
        # raw strings keep "\s" from being read as an invalid escape sequence
        df.loc[0, :].str.extract(r"^(.+)\s+", expand=False).tolist(), axis=1
    )
    .replace(regex=r"^(.+\s+)", value="")
)

print(new_df)
  app tb mt
0   2  1  3
1   0  5  2
2   0  0  6

Answer 5

Let us chain stack and unstack:

out = df.stack().str.split(' ',expand=True).set_index(0,append=True)[1].reset_index(level=1,drop=True).unstack(level=-1)
0 app mt tb
0   2  3  1
1   0  2  5
2   0  6  0
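
Note that the headers above come back in alphabetical order (app, mt, tb), not the original order. A possible way to restore the original order is sketched below, on sample data assumed from the question. The labels "name" and "number" and the variable order are illustrative additions.

```python
import pandas as pd

# Sample data assumed from the question.
df = pd.DataFrame({
    0: ["app 2", "app 0", "app 0"],
    1: ["tb 1", "tb 5", "tb 0"],
    2: ["mt 3", "mt 2", "mt 6"],
})

# One row per cell, then split each cell into its name and its number.
parts = df.stack().str.split(expand=True)
parts.columns = ["name", "number"]

out = (
    parts.set_index("name", append=True)["number"]
    .reset_index(level=1, drop=True)  # drop the old column labels 0, 1, 2
    .unstack("name")                  # names become the new headers
    .rename_axis(columns=None)
)

# Read the original left-to-right order from the first row and reselect.
order = df.iloc[0].str.split().str[0].tolist()
out = out[order]
print(out)
```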

5 answers

Answer 1: score 2, accepted, 2021-04-22 00:25:34
Answer 2: score 0, 2021-04-22 00:28:10
Answer 3: score 0, 2021-04-22 00:31:42
Answer 4: score 0, 2021-04-22 00:59:22
Answer 5: score 0, 2021-04-22 01:05:02
