How to transform a Python data frame so that the unique row values of one column become columns and the values of another column become their rows
Summary
I am using Python 2.7. I have a data frame in which every variable is categorical, i.e. the data type is string. I would like to turn the unique row values of one column into separate columns. The values in those resulting columns must be the corresponding values from another column. To describe this in detail, I have provided a reproducible data frame and the expected output for reference.
The data frame that needs transposing can be created as follows:
import pandas as pd
codes = ['codeA','codeB', 'codeC']
variables = ['textA','textA','textB']
dataset = list(zip(codes,variables))
df = pd.DataFrame(data = dataset, columns=['codes','variables'])
df['string'] = 'string1'
The data frame that needs transposing looks like this:
df
codes variables string
0 codeA textA string1
1 codeB textA string1
2 codeC textB string1
The expected final output should look like this:
textA  textB  string
codeA         string1
codeB
       codeC  string1
Note: The objective is transposition. I am not overly concerned whether the blank spaces are NULL values or zeroes.
I'm not sure about the last column in your example, as it seems inconsistent with the rest of the transformation. In any case, I think converting the variables column with the pandas get_dummies function is probably a good place to start.
import pandas as pd
codes = ['codeA','codeB', 'codeC']
variables = ['textA','textA','textB']
dataset = list(zip(codes,variables))
df = pd.DataFrame(data = dataset, columns=['codes','variables'])
df['string'] = 'string1'
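# One indicator column per unique value: variables_textA, variables_textB
df = pd.get_dummies(df, columns=['variables'])
# Where the indicator is set, keep the value from 'codes'; otherwise use 0
df['variables_textA'] = df['codes'].where(df['variables_textA'].astype(bool), 0)
df['variables_textB'] = df['codes'].where(df['variables_textB'].astype(bool), 0)
# Keep only the transposed columns plus the untouched 'string' column
columns = ['variables_textA', 'variables_textB', 'string']
df = df[columns]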
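As a possible alternative to the get_dummies approach above, here is a minimal sketch using DataFrame.pivot. It was not part of the original answer. The name wide is just illustrative, and it assumes that empty strings are acceptable for the blank cells and that the 'string' column should be carried over unchanged (the question leaves both points open).

```python
import pandas as pd

# Rebuild the example data frame from the question
codes = ['codeA', 'codeB', 'codeC']
variables = ['textA', 'textA', 'textB']
df = pd.DataFrame({'codes': codes, 'variables': variables})
df['string'] = 'string1'

# One output column per unique value in 'variables'; each cell holds the
# matching 'codes' value, or NaN where the row has a different variable.
wide = df.pivot(columns='variables', values='codes')

# Assumption: blanks are shown as empty strings (the question allows NULL or 0)
wide = wide.fillna('')

# Assumption: 'string' is carried over as-is, aligned on the original row index
wide['string'] = df['string']
wide.columns.name = None  # drop the leftover 'variables' axis label

print(wide)
```

Because pivot derives the column names from the data, this does not need one line per category, which may help when 'variables' has many unique values.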