I have this data:
import pandas as pd
data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns = ['names', 'category'])
df
names category
abc - a A
def - b B
ghi - c C
jkl - d D
The output I want is:
names division category
abc a A
def b B
ghi c C
jkl d D
There are many ways to do this, but I want to use this logic: iterate through each row of the 'names' column, store each value in 'st1', and then:
first, middle, last = st1.partition(' - ')
df['names'] = first
df['division'] = last
I also want to assign the values to the DataFrame one row at a time. Please help me get the desired output in Python.
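For reference, this is what str.partition(' - '), the call in the logic above, returns on a sample value, next to str.split(' - '). The second call shows that partition always returns three parts, with empty strings when the separator is missing, so the three-way unpacking never fails:

```python
# str.partition splits on the FIRST occurrence of the separator and
# always returns a 3-tuple: (before, separator, after)
first, middle, last = 'abc - a'.partition(' - ')
print((first, middle, last))   # ('abc', ' - ', 'a')

# If the separator is absent, the whole string lands in the first slot
print('abc'.partition(' - '))  # ('abc', '', '')

# str.split returns a list instead and drops the separator
print('abc - a'.split(' - '))  # ['abc', 'a']
```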
You can do it like this:
df[['names','division']] = df.names.str.split(' - ',expand=True)
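pandas also has a vectorized Series.str.partition, which mirrors the partition logic from the question without an explicit loop. A minimal sketch: with the default expand=True it returns a DataFrame whose columns are 0 (before the separator), 1 (the separator) and 2 (after it). Inserting the new column at position 1 keeps the column order of the desired output.

```python
import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])

# Column 0 = text before ' - ', column 1 = the separator, column 2 = text after
parts = df['names'].str.partition(' - ')

# Insert 'division' right after 'names', then overwrite 'names' with the prefix
df.insert(1, 'division', parts[2])
df['names'] = parts[0]
print(df)
```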
Create the DataFrame as you did before, then iterate over the names and categories together, split each name on "-", and append the pieces to a new list, which is then converted into another DataFrame like this:
import pandas as pd
data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns = ['names', 'category'])
newdata = []
for names, category in zip(df.names, df.category):
    name, division = names.split("-")
    newdata.append([name.strip(), division.strip(), category])
new_df = pd.DataFrame(newdata, columns = ['names', 'division', 'category'])
Printing the new DataFrame gives:
>>> new_df
names division category
0 abc a A
1 def b B
2 ghi c C
3 jkl d D
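The same zip loop can use partition(' - '), as in the question, instead of split("-"). One difference worth knowing: split("-") is unpacked into exactly two names, so a value with more than one hyphen (a hypothetical 'a-b - c') would raise ValueError, whereas partition only splits on the first ' - ' and always yields three parts. A sketch:

```python
import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])

newdata = []
for names, category in zip(df['names'], df['category']):
    # partition always yields exactly three parts, so this unpacking cannot fail
    first, _sep, last = names.partition(' - ')
    newdata.append([first, last, category])

new_df = pd.DataFrame(newdata, columns=['names', 'division', 'category'])
print(new_df)
```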
I'm testing GitHub Copilot to see how it handles Stack Overflow questions.
# Solution 1
import pandas as pd
import numpy as np
data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])
# Iterate through each rows of column-names, and store each value in 'st1' and then ->
# first, middle, last = st1.partition(' - ')
# df['names'] = first
# df['division'] = last
# and also assigning it to dataframe one by one, please help me to get my desired output in python.
for index, row in df.iterrows():
    st1 = row['names']
    first, middle, last = st1.partition(' - ')
    df.loc[index, 'names'] = first
    df.loc[index, 'division'] = last
# Explain what is df.loc
# df.loc[row index, column index]
# df.loc[0, 'names'] = first
# df.loc[0, 'division'] = last
print(df)
Output:
names category division
0 abc A a
1 def B b
2 ghi C c
3 jkl D d
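In the output above, division ends up after category, because assigning a new column label through .loc appends that column at the end. If the columns should match the desired order, one option (a sketch, not part of the Copilot answer) is to create the column at position 1 before the loop, using '' as a placeholder, and write each cell with df.at:

```python
import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])

# Create 'division' as the second column, with '' as a string placeholder
df.insert(1, 'division', '')

for index, row in df.iterrows():
    first, _sep, last = row['names'].partition(' - ')
    # df.at sets a single cell by row label and column name
    df.at[index, 'names'] = first
    df.at[index, 'division'] = last

print(df)
```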
Since you want to iterate through the rows of your DataFrame and work with each one individually, this approach uses some NumPy alongside pandas. To split the values it uses .split() rather than .partition(); on these strings both break the text around ' - ', but .split() drops the separator and returns a list.
Here are the packages you'll need:
import pandas as pd
import numpy as np
Before iterating through the rows, use .insert() to create a new column named "division" (I use np.nan as a placeholder, but any value works):
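df.insert(1, 'division', np.nan)
# np.nan makes the column float64; convert it to object so it can hold strings later
df['division'] = df['division'].astype(object)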
Now you can iterate through the rows using the pandas iterrows() method.
# index is the row label; row is a pandas Series holding that row's values
for index, row in df.iterrows():
    # convert the row Series to a plain list
    row = list(row)
    # remove the placeholder value of the 'division' column created above
    row.pop(1)
    # split the 'names' value; creates values for the 'names' and 'division' columns
    new_row = row[0].split(' - ')
    # append the value from the 'category' column
    new_row = np.append(new_row, row[1])
    # save the new row to the DataFrame (iloc is positional, which matches the default RangeIndex)
    df.iloc[index] = new_row
This is the output:
| | names | division | category |
|---:|:--------|:-----------|:-----------|
| 0 | abc | a | A |
| 1 | def | b | B |
| 2 | ghi | c | C |
| 3 | jkl | d | D |
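As the comments note, iterrows() yields (index, row) pairs where row is a pandas Series, not a tuple. If you still want an explicit per-row loop, a sketch that avoids writing back into the DataFrame on every iteration is to collect the pieces in lists and assign each column once. This version uses itertuples(index=False), which yields namedtuples whose fields are the column names:

```python
import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])

first_parts, last_parts = [], []
# itertuples yields namedtuples; index=False leaves out the index field
for row in df.itertuples(index=False):
    first, _sep, last = row.names.partition(' - ')
    first_parts.append(first)
    last_parts.append(last)

# Write each column in one step, placing 'division' second
df['names'] = first_parts
df.insert(1, 'division', last_parts)
print(df)
```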