How to convert categorical data to numerical data in a for loop in Python pandas
I have a categorical DataFrame and I want to convert it to numerical data. I have more than 50 columns, so I want to run the .replace command in a loop.
replace_map = {'w': 4, '+': 5, '.': 6, 'g': 7}
I have written code that iterates over the columns:
for column in df1_replace.columns[1:76]:
    # Select column contents by column name using the [] operator
    columnSeriesObj = df1_replace[column]
    print('Column Name : ', column)
    print('Column Contents : ', columnSeriesObj.values)
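For reference, here is a minimal sketch of the kind of loop being asked about. The small frame below is a hypothetical stand-in for df1_replace (its column names and values are assumptions, not the real data), and it uses Series.map rather than .replace, so any value missing from replace_map would become NaN instead of being left unchanged.

```python
import pandas as pd

replace_map = {'w': 4, '+': 5, '.': 6, 'g': 7}

# Hypothetical stand-in for df1_replace: one id column plus two categorical columns
df1_replace = pd.DataFrame({
    'id': [1, 2, 3],
    'c1': ['w', '+', '.'],
    'c2': ['g', 'w', '+'],
})

# Skip the first column, as in the question, and map every other column in place
for column in df1_replace.columns[1:]:
    df1_replace[column] = df1_replace[column].map(replace_map)

print(df1_replace)
```

If values outside the map should be kept as they are, calling DataFrame.replace(replace_map) on the selected columns is an alternative that also avoids the explicit loop.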
Here is how you could do it using dropna() and drop_duplicates(). I have used my own sample data, in which one column has no values.
import pandas as pd
from io import StringIO
csv = StringIO('''2001,1,,a,a
2001,2,,b,b
2001,3,,c,c
2005,1,,a,a
2005,1,,c,c''')
df = pd.read_csv(csv, header=None)
print(df)
df will look like this:
      0  1   2  3  4
0  2001  1 NaN  a  a
1  2001  2 NaN  b  b
2  2001  3 NaN  c  c
3  2005  1 NaN  a  a
4  2005  1 NaN  c  c
Then drop every column in which all values are NaN (how='all'):
df_new = df.dropna(how='all', axis=1)
Take the transpose of the DataFrame, so that duplicate columns become duplicate rows. Then use drop_duplicates() on it to drop the duplicate rows. Transpose it back to get your original data, without the empty columns and the duplicate columns.
df_new = df_new.T.drop_duplicates().T
df_new.columns = range(len(df_new.columns))
print(df_new)
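One side effect worth knowing: transposing a frame with mixed column types produces object-dtype data, so the numeric columns come back as object after the round trip. The sketch below repeats the steps above on the same sample data and adds a call to infer_objects() to restore the numeric dtypes. That extra call is my addition, not part of the steps above.

```python
import pandas as pd
from io import StringIO

csv = StringIO('''2001,1,,a,a
2001,2,,b,b
2001,3,,c,c
2005,1,,a,a
2005,1,,c,c''')
df = pd.read_csv(csv, header=None)

# Drop the all-NaN column, then drop duplicate columns via a transpose round trip
df_new = df.dropna(how='all', axis=1)
df_new = df_new.T.drop_duplicates().T
df_new.columns = range(len(df_new.columns))

# The transpose left every column as object dtype; let pandas re-infer the types
df_new = df_new.infer_objects()
print(df_new.dtypes)
```

With this sample data, three columns remain: the two numeric ones and a single copy of the letter column.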