I have a data frame that contains multiple values in one column:
ID classes
4 5792.0 Mechanics (any of the two sessions), Analysis ...
6 5915.0 Mechanics (any of the two sessions), Perspecti...
8 5931.0 CS, Chemistry, Perspecti...
9 5991.0 Perspective, CS
10 5841.0 Mechanics (any of the two sessions), Perspecti...
The "classes" column can hold values such as "Perspective, Chemistry, CS", and so on. I want to sort the above data frame based on the values in the classes string.
For example, one output might be:
ID classes
4 5792.0 CS
6 5915.0 CS
8 5931.0 CS
9 5991.0 CS
10 5841.0 CS
That is, sorted by the classes attended.
I searched Google, but all the results sorted a data frame by a single value (one value per cell).
Thanks in advance.
Data
import pandas as pd
# five identical rows, as in the original sample data
df = pd.DataFrame({
    'ID': [5792.0, 5915.0, 5931.0, 5991.0, 5841.0],
    'classes': ['Mechanics (any of the two sessions), Perspecti,CS, Chemistry'] * 5,
})
df
Use Series.str.split to convert the string in classes to a list
# split on commas and drop the whitespace around them, so " CS" and "CS" sort together
df['classes'] = df.classes.str.split(r"\s*,\s*")
Explode and sort by classes
df.explode('classes').sort_values(by='classes')
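If the goal is the output shown in the question (one row per ID for a given class), the exploded frame can be filtered after sorting. Here is a minimal sketch of that idea. The sample data is made up, and the names `exploded` and `cs_rows` are only illustrative:

```python
import pandas as pd

# Hypothetical sample data, shaped like the frame in the question
df = pd.DataFrame({
    "ID": [5792.0, 5915.0, 5931.0],
    "classes": ["Mechanics, CS", "Perspective, Chemistry", "CS, Chemistry"],
})

# One row per (ID, class) pair; strip stray spaces left by the split
exploded = df.assign(classes=df["classes"].str.split(",")).explode("classes")
exploded["classes"] = exploded["classes"].str.strip()

# Sort by class name, then by ID within each class
exploded = exploded.sort_values(by=["classes", "ID"])

# Keep only the rows for a single class, e.g. CS
cs_rows = exploded[exploded["classes"] == "CS"]
print(cs_rows)
```

Note that the sort is a plain string sort, so uppercase letters come before lowercase ones ("CS" sorts before "Chemistry").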
I'm assuming you want to sort the items within each comma-separated string in the classes column. If so, you can try this:
# split on commas, trim each item, sort, and join back into one string
df['string'] = df['classes'].apply(lambda x: ', '.join(sorted(part.strip() for part in x.split(','))))
print(df)
I'm sure there's a more "pandas" way of doing this, but here's one possible solution:
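# keep the rows whose classes string mentions CS; copy so the assignment below does not trigger SettingWithCopyWarning
cs_df = df[df['classes'].apply(lambda x: 'CS' in x)].copy()
cs_df['classes'] = 'CS'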
In the above example, cs_df should contain all of the rows from the original dataframe where 'CS' is in classes. You can do this manually or in a loop to get all your dataframes.
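The loop version could look something like the sketch below. It runs on made-up sample data, and the names `all_classes` and `frames` are hypothetical. It also differs slightly from the snippet above: it matches whole class names after splitting on commas, rather than using a substring test, so that "CS" would not match a longer name that happens to contain it.

```python
import pandas as pd

# Hypothetical sample data, shaped like the frame in the question
df = pd.DataFrame({
    "ID": [5792.0, 5915.0, 5931.0],
    "classes": ["Mechanics, CS", "Perspective, Chemistry", "CS, Chemistry"],
})

# Collect every distinct class name that appears in the column
all_classes = sorted(
    {name.strip() for cell in df["classes"] for name in cell.split(",")}
)

# Build one dataframe per class, keyed by class name
frames = {}
for name in all_classes:
    # True where the class is one of the comma-separated items in the cell
    mask = df["classes"].apply(
        lambda cell, n=name: n in [p.strip() for p in cell.split(",")]
    )
    subset = df[mask].copy()   # copy so the assignment does not touch df
    subset["classes"] = name   # overwrite the column with the single class
    frames[name] = subset

print(frames["CS"])
```

Keeping the results in a dict avoids creating one variable per class, and `frames["Chemistry"]` and so on can be looked up by name.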
When we use df['classes'], pandas returns a pandas.Series instance. pandas.Series.apply takes a function as an argument and applies it to every item in the series. For example:
import math
import pandas as pd
# square each item in the series
s1 = pd.Series([1, 2, 3]).apply(lambda x: x * x)
# [1, 4, 9]
# take the square root of each item in the series
s2 = s1.apply(math.sqrt)
# [1.0, 2.0, 3.0]
Pandas also has special indexing logic. If you use a Series of bools as an index on a DataFrame, pandas will return a DataFrame containing only the rows where the index series was True.
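So in this example, assuming df has five rows like the one in the question (the boolean Series must be the same length as df):
s = pd.Series([True, False, True, False, False])
df2 = df[s]
df2 will only contain rows 0 and 2 from the original df.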
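In practice the boolean Series usually comes from a test on a column rather than being typed out by hand. As a rough sketch on made-up data, `Series.str.contains` with `regex=False` performs the same plain substring test as the `'CS' in x` lambda earlier:

```python
import pandas as pd

# Hypothetical sample data, shaped like the frame in the question
df = pd.DataFrame({
    "ID": [5792.0, 5915.0, 5931.0],
    "classes": ["Mechanics, CS", "Perspective, Chemistry", "CS, Chemistry"],
})

# Boolean Series: True where the classes string contains "CS"
mask = df["classes"].str.contains("CS", regex=False)

# Boolean indexing keeps only the rows where the mask is True
df2 = df[mask]
print(df2)
```

Here `mask` is True for rows 0 and 2, so `df2` holds exactly those two rows.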
Finally, pandas also has special logic for assignment by index. You can set an entire column to a single scalar value without needing an explicit loop:
df['best_column'] = "best value"
In the above example, the dataframe will have a new column filled with the value "best value".