I have an input dataset named data.csv with the following content:
id , name
1 , Jone/Elvis/Tom
2 , Elvis/Tonny
The name column uses a slash as the separator. I need to process data.csv, and my expected output is:
id, Jone, Elvis, Tom, Tonny
1, 1 , 1 , 1 , 0
2, 0 , 1 , 0 , 1
1 means the column's name appears in the name field for that row, 0 means it does not. How can I use Python with pandas to transform the input?
Let's use pandas and .str.get_dummies with the sep parameter:
Read the dataframe in from the clipboard:
import pandas as pd
# Raw string so the regex escapes reach pandas unchanged
df = pd.read_clipboard(sep=r'\s+,\s+')
df
Input Dataframe:
   id            name
0   1  Jone/Elvis/Tom
1   2     Elvis/Tonny
Set the index and use the string accessor with get_dummies:
df1 = df.set_index('id')
df1['name'].str.get_dummies(sep='/').reset_index()
Output:
   id  Elvis  Jone  Tom  Tonny
0   1      1     1    1      0
1   2      1     0    0      1
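If you would rather not depend on the clipboard, the same idea can be sketched with the sample data held in a string. This is only a minimal sketch: the `raw` variable and the regex separator are my own assumptions about how the file is laid out (spaces around each comma), not something the question confirms.

```python
import io

import pandas as pd

# Sample content copied from the question (assumed to match data.csv)
raw = "id , name\n1 , Jone/Elvis/Tom\n2 , Elvis/Tonny\n"

# A regex separator swallows the spaces around each comma;
# regex separators need the python parsing engine
df = pd.read_csv(io.StringIO(raw), sep=r"\s*,\s*", engine="python")

# One 0/1 column per distinct name; get_dummies sorts the columns alphabetically
result = df.set_index("id")["name"].str.get_dummies(sep="/").reset_index()
print(result)
```

Note that the columns come back in alphabetical order, so reorder them with `result[["id", "Jone", "Elvis", "Tom", "Tonny"]]` if the order in the question matters.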
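import pandas as pd

# The sample file has spaces around the commas, so split on a regex separator
data = pd.read_csv("./data.csv", sep=r"\s*,\s*", engine="python")
# Turn "Jone/Elvis/Tom" into the list ["Jone", "Elvis", "Tom"]
data["name"] = data["name"].str.split("/")

# One 0/1 list per known name, with one entry per row
jone = [0] * len(data)
elvis = [0] * len(data)
tom = [0] * len(data)
tonny = [0] * len(data)

for i in data.index:
    # Exact membership test, so a name never matches a longer name containing it
    if "Jone" in data["name"][i]:
        jone[i] = 1
    else:
        jone[i] = 0
for i in data.index:
    if "Elvis" in data["name"][i]:
        elvis[i] = 1
    else:
        elvis[i] = 0
for i in data.index:
    if "Tom" in data["name"][i]:
        tom[i] = 1
    else:
        tom[i] = 0
for i in data.index:
    if "Tonny" in data["name"][i]:
        tonny[i] = 1
    else:
        tonny[i] = 0

# Attach the indicator lists as new columns
data['Jone'] = jone
data['Elvis'] = elvis
data['Tom'] = tom
data['Tonny'] = tonny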
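The loops above need the names to be known in advance. As a hedged alternative, here is a sketch that discovers the names from the data by splitting, exploding to one row per name, and cross-tabulating. It assumes pandas 0.25 or later (for `explode`), and it builds the frame inline rather than reading data.csv, so the `df` below is only a stand-in for the real file.

```python
import pandas as pd

# Stand-in for the parsed data.csv (assumed shape: an id column and a name column)
df = pd.DataFrame({"id": [1, 2], "name": ["Jone/Elvis/Tom", "Elvis/Tonny"]})

# One row per (id, single name) pair
long = df.assign(name=df["name"].str.split("/")).explode("name")

# Count the pairs, then cap at 1 so a name repeated within a row still yields 1
table = pd.crosstab(long["id"], long["name"]).clip(upper=1).reset_index()
print(table)
```

Because the names come from the data itself, a new name in the file just becomes a new column with no code change.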
import pandas as pd

df = pd.read_csv("test.csv")

# collect the ids from the first column
def getDfIds(df):
    ids = []
    for i in df.index:
        ids.append(df.iloc[i, 0])
    return ids

# create headers
def createHeaders(df, ids):
    headers = []
    for i in df.index:
        names = (df.iloc[i, 1]).split('/')
        for index in range(len(names)):
            headers.append(names[index].strip())
    # de-duplicate; sorting keeps the column order the same on every run
    headers = sorted(set(headers))
    headers.insert(0, "id")
    return headers

# create body
def createBody(df, headers, ids):
    # set default values 0
    data = [[0 for i in range(len(headers))] for j in range(len(df.index))]
    for i in df.index:
        data[i][0] = ids[i]
        names = (df.iloc[i, 1]).split('/')
        for ind in range(len(names)):
            name = names[ind].strip()
            # column position of this name in the header row
            inde = headers.index(name)
            data[i][inde] = 1
    return data

ids = getDfIds(df)
headers = createHeaders(df, ids)
body = createBody(df, headers, ids)
# create new data set
df = pd.DataFrame(body, columns=headers)
print(df)
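For what it's worth, the header and body steps above can be condensed into one helper that keeps the names in first-seen order, which is the column order the question asks for. This is just a sketch: `one_hot_names` and its parameters are hypothetical names of mine, and the sample frame stands in for the CSV file.

```python
import pandas as pd

def one_hot_names(df, id_col="id", name_col="name", sep="/"):
    """Return a frame with the id column plus one 0/1 column per distinct name."""
    # Split every cell and trim stray spaces around each name
    split = [[part.strip() for part in cell.split(sep)] for cell in df[name_col]]
    # dict.fromkeys de-duplicates while preserving first-seen order
    headers = list(dict.fromkeys(name for parts in split for name in parts))
    rows = [
        [row_id] + [int(header in parts) for header in headers]
        for row_id, parts in zip(df[id_col], split)
    ]
    return pd.DataFrame(rows, columns=[id_col] + headers)

# Stand-in for the parsed CSV file
sample = pd.DataFrame({"id": [1, 2], "name": ["Jone/Elvis/Tom", "Elvis/Tonny"]})
result = one_hot_names(sample)
print(result)
```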