Rename columns using part of the string in Pandas
I have a data frame that looks like the one below. The actual data frame has 64 columns.
0 1 2
app 2 tb 1 mt 3
app 0 tb 5 mt 2
app 0 tb 0 mt 6
I'd like to rename the columns using the substring (e.g. "app", "tb"). The ideal data frame would look like this:
app tb mt
2 1 3
0 5 2
0 0 6
I know how to subset to the numeric values using str.split(). However, how do I update the corresponding column names using the first part of the string?
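For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the three-column sample above. It assumes every cell is a single string such as "app 2" and that the columns carry the default integer labels. The str.split() step is the numeric subsetting mentioned in the question.

```python
import pandas as pd

# Sample frame reconstructed from the question; every cell is "<name> <number>".
df = pd.DataFrame(
    [["app 2", "tb 1", "mt 3"],
     ["app 0", "tb 5", "mt 2"],
     ["app 0", "tb 0", "mt 6"]]
)

# Numeric part of the first column: split on whitespace, keep the last token.
nums = df[0].str.split().str[-1]
print(nums.tolist())
```

The values that come back are still strings, not integers.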
You can assign to .columns to rename the columns of a dataframe. For example:
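# take the text before the whitespace in the first row as the column names
df.columns = df.iloc[0, :].str.extract(r"^(.*)\s+")[0]
# strip that prefix from every cell; regex=True is required from pandas 2.0 on
df = df.apply(lambda x: x.str.replace(r"^(.*\s+)", "", regex=True))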
print(df)
Prints:
app tb mt
0 2 1 3
1 0 5 2
2 0 0 6
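Note that the cells are still strings after the prefix is stripped. Below is a hedged end-to-end sketch of the same idea that also casts the result to integers. The sample frame is rebuilt from the question, and the cast assumes every remaining value is a whole number.

```python
import pandas as pd

df = pd.DataFrame(
    [["app 2", "tb 1", "mt 3"],
     ["app 0", "tb 5", "mt 2"],
     ["app 0", "tb 0", "mt 6"]]
)

# Column names: the text before the whitespace in each first-row cell.
names = df.iloc[0].str.extract(r"^(.*)\s+", expand=False).tolist()

# Strip the "<name> " prefix from every cell, then cast the digits to int.
out = df.apply(lambda col: col.str.replace(r"^.*\s+", "", regex=True)).astype(int)
out.columns = names
print(out)
```

If some cells might not be clean integers, pd.to_numeric(..., errors="coerce") applied per column is a more forgiving alternative to astype(int).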
One way to do this is to assign to the .columns attribute of the pandas dataframe.
Assuming that all your df values are consistent and you want the first part of each string as the column name for all 64 columns, you can do this:
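df.columns = [x.split()[0] for x in df.loc[0, :]]
# regex=True is required from pandas 2.0 on, where str.replace defaults to literal matching
df = df.apply(lambda x: x.str.replace(r"^(.*\s+)", "", regex=True))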
This uses a list comprehension (a more Pythonic loop) and the string split method to pull the column names out of the first-row values in your df. Now, if you print df.head(), you should see:
app tb mt
0 2 1 3
1 0 5 2
2 0 0 6
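Since this relies on the values being consistent, it may be worth checking that assumption before renaming. Here is a rough sketch that runs on the original, not yet renamed frame (rebuilt here from the question) and verifies that each column has the same prefix in every row.

```python
import pandas as pd

df = pd.DataFrame(
    [["app 2", "tb 1", "mt 3"],
     ["app 0", "tb 5", "mt 2"],
     ["app 0", "tb 0", "mt 6"]]
)

# First whitespace-separated token of every cell, same shape as df.
prefixes = df.apply(lambda col: col.str.split().str[0])

# A column is consistent when all of its rows share exactly one prefix.
consistent = prefixes.nunique() == 1
print(consistent.all())
```

If any entry of consistent is False, that column mixes prefixes and the first row alone is not a safe source for its name.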
You could reshape the data with melt before pulling out the strings:
# flip the column names into rows
(df.melt(ignore_index = False)
.drop(columns = 'variable')
# split the column into strings and number
.loc[:, 'value'].str.split(expand=True)
# flip the dataframe to get the headers
.pivot(columns=0, values=1)
.rename_axis(columns = None)
)
app mt tb
0 2 3 1
1 0 2 5
2 0 6 0
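The pivot sorts the new column labels alphabetically, which is why the result reads app, mt, tb rather than app, tb, mt. If the original order matters, one possible fix (a sketch, with the sample frame rebuilt from the question) is to read the order off the first row and reselect the columns at the end.

```python
import pandas as pd

df = pd.DataFrame(
    [["app 2", "tb 1", "mt 3"],
     ["app 0", "tb 5", "mt 2"],
     ["app 0", "tb 0", "mt 6"]]
)

# Desired column order, taken from the first row of the original frame.
order = df.iloc[0].str.split().str[0].tolist()

wide = (
    df.melt(ignore_index=False)
      .drop(columns="variable")
      .loc[:, "value"].str.split(expand=True)
      .pivot(columns=0, values=1)
      .rename_axis(columns=None)
)

# Reselect the columns to undo the alphabetical sort introduced by pivot.
wide = wide[order]
print(wide)
```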
A shorter route, with inspiration from @AndrejKesely, would be to use the string functions on the dataframe itself; this should be faster:
Get the columns:
df.columns = df.iloc[0].str.split().str[0]
Remove the column names from each column:
df.transform(lambda col: col.str.split().str[-1]).rename_axis(columns = None)
app tb mt
0 2 1 3
1 0 5 2
2 0 0 6
To keep it as one fun method-chaining solution:
new_df = (
df.set_axis(
df.loc[0, :].str.extract(r"^(.+)\s+", expand=False).tolist(), axis=1
)
.replace(regex=r"^(.+\s+)", value="")
)
print(new_df)
app tb mt
0 2 1 3
1 0 5 2
2 0 0 6
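If the same cleanup is needed in more than one place, the chain can be wrapped in a small function and applied with .pipe. This is only a sketch: rename_by_prefix is a hypothetical helper name, not a pandas function, and the sample frame is rebuilt from the question.

```python
import pandas as pd


def rename_by_prefix(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: name columns after the first-row prefix and strip it from cells."""
    names = frame.iloc[0].str.extract(r"^(.+)\s+", expand=False).tolist()
    return frame.set_axis(names, axis=1).replace(regex=r"^.+\s+", value="")


df = pd.DataFrame(
    [["app 2", "tb 1", "mt 3"],
     ["app 0", "tb 5", "mt 2"],
     ["app 0", "tb 0", "mt 6"]]
)

new_df = df.pipe(rename_by_prefix)
print(new_df)
```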
Let us chain stack and unstack:
out = df.stack().str.split(' ',expand=True).set_index(0,append=True)[1].reset_index(level=1,drop=True).unstack(level=-1)
0 app mt tb
0 2 3 1
1 0 2 5
2 0 6 0
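The stray 0 in the header above is the name of the columns axis, left over from the split column that became the header. Below is a sketch of the same chain, split across lines, with that label cleared. The sample frame is rebuilt from the question.

```python
import pandas as pd

df = pd.DataFrame(
    [["app 2", "tb 1", "mt 3"],
     ["app 0", "tb 5", "mt 2"],
     ["app 0", "tb 0", "mt 6"]]
)

out = (
    df.stack()                                 # one row per (row, column) cell
      .str.split(" ", expand=True)             # column 0 = name, column 1 = number
      .set_index(0, append=True)[1]            # move the name into the index
      .reset_index(level=1, drop=True)         # drop the old integer column level
      .unstack(level=-1)                       # names become the column headers
      .rename_axis(columns=None)               # clear the leftover "0" axis label
)
print(out)
```

As with the pivot approach, unstack orders the new columns alphabetically, so the headers come out as app, mt, tb.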