
Sort a column containing string in Pandas

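I am new to Pandas and want to sort a column containing strings and generate a numerical value that uniquely identifies each string. My data frame looks like this: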

df = pd.DataFrame({'key': range(8), 'year_week': ['2015_10', '2015_1', '2015_11', '2016_9', '2016_10','2016_3', '2016_9', '2016_10']})

First, I would like to sort the 'year_week' column in ascending order (2015_1, 2016_9, '2016_9', 2016_10, 2016_11, 2016_3, 2016_10, 2016_10), and then generate a numerical value for each unique 'year_week' string.
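To make the difficulty concrete, here is a small sketch of why a plain sort on these labels is not enough: strings are compared character by character, so week 10 lands before week 3. The helper `year_week_key` is a hypothetical name introduced only for this illustration.

```python
import pandas as pd

weeks = pd.Series(['2016_9', '2016_10', '2016_3'])

# A plain sort compares the strings character by character,
# so '2016_10' comes before '2016_3' ('1' < '3').
lexicographic = weeks.sort_values().tolist()

# Hypothetical helper: turn 'YYYY_W' into a (year, week) tuple of ints.
def year_week_key(label):
    year, week = label.split('_')
    return int(year), int(week)

# Sorting on the integer tuple gives the intended chronological order.
chronological = sorted(weeks, key=year_week_key)

print(lexicographic)
print(chronological)
```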

You can first convert the year_week column with to_datetime, then sort by the result with sort_values, and finally use factorize:

import pandas as pd

df = pd.DataFrame({'key': range(8), 'year_week': ['2015_10', '2015_1', '2015_11', '2016_9', '2016_10','2016_3', '2016_9', '2016_10']})

# Parse year and week number into a date; the appended '-0' selects the Sunday of each week
# See http://stackoverflow.com/a/17087427/2901002
df['date'] = pd.to_datetime(df.year_week + '-0', format='%Y_%W-%w')
# Sort by the parsed date column
df.sort_values('date', inplace=True)
# Assign an integer to each unique year_week, in order of first appearance
df['num'] = pd.factorize(df.year_week)[0]
print(df)
   key year_week       date  num
1    1    2015_1 2015-01-11    0
0    0   2015_10 2015-03-15    1
2    2   2015_11 2015-03-22    2
5    5    2016_3 2016-01-24    3
3    3    2016_9 2016-03-06    4
6    6    2016_9 2016-03-06    4
4    4   2016_10 2016-03-13    5
7    7   2016_10 2016-03-13    5
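As a side note on the last step, the sketch below shows what `pd.factorize` returns on its own. It numbers values in order of first appearance, which is why the frame is sorted before factorizing. The variable names are chosen only for this illustration.

```python
import pandas as pd

# Labels already in chronological order, as in the sorted frame above.
sorted_labels = pd.Series(['2015_1', '2015_10', '2015_11', '2016_3',
                           '2016_9', '2016_9', '2016_10', '2016_10'])

# factorize returns (codes, uniques); repeated labels share one code.
codes, uniques = pd.factorize(sorted_labels)
print(codes.tolist())
print(list(uniques))

# On unsorted input the codes follow appearance order, not chronology.
unsorted_codes, _ = pd.factorize(pd.Series(['2015_10', '2015_1', '2015_11']))
print(unsorted_codes.tolist())
```

Here '2015_10' receives code 0 in the unsorted case simply because it appears first, so the codes only reflect chronological order when the input is sorted beforehand.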

## 1st method: suited to large datasets

## Split the "year_week" column into two columns
df[['year', 'week']] = df['year_week'].str.split("_", expand=True)

## Convert the newly created columns to integers
df['year'] = df['year'].astype('int')
df['week'] = df['week'].astype('int')

## Sort the dataframe by the newly created columns
df = df.sort_values(['year', 'week'], ascending=True)

## Drop the helper year and week columns
df.drop(['year', 'week'], axis=1, inplace=True)

## Sorted dataframe
df
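The split-and-sort steps stop at the sorted frame and do not yet produce the numeric identifier the question asks for. One possible way to finish, sketched here under the assumption that every label has the form 'YYYY_W', is to factorize the sorted column. The names `parts`, `order` and `df_sorted` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'key': range(8),
                   'year_week': ['2015_10', '2015_1', '2015_11', '2016_9',
                                 '2016_10', '2016_3', '2016_9', '2016_10']})

# Split into integer year and week in a separate frame,
# so no helper columns need to be dropped from df afterwards.
parts = df['year_week'].str.split('_', expand=True).astype(int)
parts.columns = ['year', 'week']

# Row labels in chronological order.
order = parts.sort_values(['year', 'week']).index
df_sorted = df.loc[order].copy()

# Number each unique label in order of appearance in the sorted frame.
df_sorted['num'] = pd.factorize(df_sorted['year_week'])[0]
print(df_sorted)
```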


## 2nd method: suited to small datasets

## Make sure the column holds strings
df['year_week'] = df['year_week'].astype('str')

## List the categories in the order you want
cats = ['2015_1', '2015_10', '2015_11', '2016_3', '2016_9', '2016_10']

## Use pd.Categorical() to turn the column into an ordered categorical
df['year_week'] = pd.Categorical(df['year_week'], categories=cats, ordered=True)

## Sort by the 'year_week' column
df = df.sort_values('year_week')

## Sorted dataframe
df
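With an ordered categorical, the numeric identifier can probably be read straight from the category codes, since each code is the position of the label in `cats`. A minimal sketch, reusing the hand-written `cats` list from the steps above:

```python
import pandas as pd

df = pd.DataFrame({'key': range(8),
                   'year_week': ['2015_10', '2015_1', '2015_11', '2016_9',
                                 '2016_10', '2016_3', '2016_9', '2016_10']})

# Categories listed by hand in the desired order.
cats = ['2015_1', '2015_10', '2015_11', '2016_3', '2016_9', '2016_10']
df['year_week'] = pd.Categorical(df['year_week'], categories=cats, ordered=True)

# Sorting an ordered categorical follows the category order, not string order.
df = df.sort_values('year_week')

# .cat.codes gives the position of each label within cats.
df['num'] = df['year_week'].cat.codes
print(df)
```

A label that is missing from `cats` becomes NaN in the column and gets code -1, so the list has to cover every value in the data.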

