Summary
I am using Python 2.7. I have a data frame in which all variables are categorical, i.e. every column has a string data type. I would like to turn the unique values of one column into separate columns, and fill those new columns with the corresponding values from another column. To describe this in detail, I have provided a reproducible data frame and the expected output for your reference.
The data frame that needs transposing can be created as follows:
import pandas as pd
codes = ['codeA','codeB', 'codeC']
variables = ['textA','textA','textB']
dataset = list(zip(codes,variables))
df = pd.DataFrame(data = dataset, columns=['codes','variables'])
df['string'] = 'string1'
The data frame that needs transposing looks like this:
df
   codes variables   string
0  codeA     textA  string1
1  codeB     textA  string1
2  codeC     textB  string1
The expected final output should look like this:
textA  textB  string
codeA         string1
codeB
       codeC  string1
Note: The objective is transposition. I am not overly concerned whether the blank spaces are NULL values or zeroes.
I'm not sure about the last column in your example, as it seems inconsistent with the rest of the transformation. In any case, I think converting the variables column with the pandas get_dummies function is probably a good place to start.
import pandas as pd
codes = ['codeA','codeB', 'codeC']
variables = ['textA','textA','textB']
dataset = list(zip(codes,variables))
df = pd.DataFrame(data = dataset, columns=['codes','variables'])
df['string'] = 'string1'
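# One-hot encode 'variables' into the columns variables_textA and variables_textB
df = pd.get_dummies(df, columns=['variables'])
# Where an indicator is set, put that row's code in the cell; otherwise put 0
df['variables_textA'] = df['codes'].where(df['variables_textA'].astype(bool), 0)
df['variables_textB'] = df['codes'].where(df['variables_textB'].astype(bool), 0)
# Keep only the new columns and the original string column
columns = ['variables_textA', 'variables_textB', 'string']
df = df[columns]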
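The approach above handles each dummy column by hand, which gets tedious when the variables column has many distinct values. As a possible alternative, here is a minimal sketch using DataFrame.pivot, which is not what the code above does but should give a similar layout. It assumes the default integer index is acceptable as the row key and that blanks may be empty strings; the names wide and result are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    'codes': ['codeA', 'codeB', 'codeC'],
    'variables': ['textA', 'textA', 'textB'],
    'string': 'string1',
})

# One column per unique value of 'variables'; each cell holds that row's code.
# With no index argument, pivot keeps the existing row index.
wide = df.pivot(columns='variables', values='codes')
wide.columns.name = None  # drop the leftover 'variables' axis label

# Blank out the missing cells and reattach the untouched 'string' column
result = wide.fillna('').join(df['string'])
print(result)
```

Here the textA and textB names come straight from the data, so no column has to be listed explicitly. Replace fillna('') with fillna(0) if zeroes are preferred for the blanks.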