
Get list of unique months from pandas column

Let's say I have the following pandas date_range:

import pandas as pd
rng = pd.date_range('9/1/2017', '12/31/2017')

I want to get a list of the unique months. This is what I've come up with so far, but there has to be a better way:

df = pd.DataFrame({'date': rng})
months = df.groupby(pd.Grouper(key='date', freq='M')).agg('sum').index.tolist()
formatted_m = [i.strftime('%m/%Y') for i in months]
# ['09/2017', '10/2017', '11/2017', '12/2017']

Note that the dates will be stored in a DataFrame column or index.

Use numpy.unique, because DatetimeIndex.strftime returned a NumPy array when this was written (since pandas 0.23 it returns an Index, which numpy.unique also accepts):

import numpy as np
import pandas as pd
rng = pd.date_range('9/1/2017', '12/31/2017')
print(np.unique(rng.strftime('%m/%Y')).tolist())
# ['09/2017', '10/2017', '11/2017', '12/2017']
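One thing to keep in mind is that numpy.unique returns its result sorted, and '%m/%Y' strings sort by month first, so a range that crosses a year boundary would not come back in chronological order. A minimal sketch of one alternative, assuming a hypothetical range spanning two years and using to_period('M') (not part of the answer above) to deduplicate before formatting:

```python
import numpy as np
import pandas as pd

# Hypothetical range that crosses a year boundary (not from the original post).
rng = pd.date_range('11/1/2017', '2/28/2018')

# np.unique sorts the formatted strings, so January 2018 lands before November 2017.
sorted_as_text = np.unique(rng.strftime('%m/%Y')).tolist()
print(sorted_as_text)  # ['01/2018', '02/2018', '11/2017', '12/2017']

# Convert to monthly periods, deduplicate, then format.
# unique() keeps the order of first appearance, so the result stays chronological.
chronological = rng.to_period('M').unique().strftime('%m/%Y').tolist()
print(chronological)   # ['11/2017', '12/2017', '01/2018', '02/2018']
```

Deduplicating before formatting also means strftime only has to run on one value per month rather than on every date.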

If the input is a DataFrame column, use Anton vBR's solution:

print(df['date'].dt.strftime("%m/%y").unique().tolist())

Or use drop_duplicates:

print(df['date'].dt.strftime("%m/%y").drop_duplicates().tolist())
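Since the question says the dates may live in either a column or the index, here is a small sketch covering both cases. It uses '%m/%Y' to match the four-digit year in the question's expected output ('%m/%y' gives a two-digit year); the `value` column is a hypothetical placeholder added only so the second frame has some data.

```python
import pandas as pd

rng = pd.date_range('9/1/2017', '12/31/2017')

# Case 1: dates stored in a column, so the .dt accessor is needed.
df_col = pd.DataFrame({'date': rng})
from_column = df_col['date'].dt.strftime('%m/%Y').drop_duplicates().tolist()

# Case 2: dates stored in the index ('value' is a hypothetical placeholder column).
# A DatetimeIndex exposes strftime directly, without .dt.
df_idx = pd.DataFrame({'value': range(len(rng))}, index=rng)
from_index = df_idx.index.strftime('%m/%Y').unique().tolist()

print(from_column)  # ['09/2017', '10/2017', '11/2017', '12/2017']
print(from_index)   # ['09/2017', '10/2017', '11/2017', '12/2017']
```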

Timings:

Both solutions perform about the same (unique vs. drop_duplicates):

rng = pd.date_range('9/1/1900', '12/31/2017')

df = pd.DataFrame({'date': rng})

In [54]: %timeit (df['date'].dt.strftime("%m/%y").unique().tolist())
1 loop, best of 3: 469 ms per loop

In [56]: %timeit (df['date'].dt.strftime("%m/%y").drop_duplicates().tolist())
1 loop, best of 3: 466 ms per loop
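The %timeit lines above only work inside IPython. As a rough sketch, the same comparison can be run as a plain script with the standard-library timeit module; the helper names `with_unique` and `with_drop_duplicates` are made up for this example, and absolute timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'date': pd.date_range('9/1/1900', '12/31/2017')})

def with_unique():
    # Format every date, then deduplicate with Series.unique().
    return df['date'].dt.strftime('%m/%y').unique().tolist()

def with_drop_duplicates():
    # Format every date, then deduplicate with Series.drop_duplicates().
    return df['date'].dt.strftime('%m/%y').drop_duplicates().tolist()

# Best of 3 single runs, similar in spirit to "best of 3" in the output above.
t_unique = min(timeit.repeat(with_unique, number=1, repeat=3))
t_drop = min(timeit.repeat(with_drop_duplicates, number=1, repeat=3))

print(f'unique:          {t_unique:.3f} s')
print(f'drop_duplicates: {t_drop:.3f} s')
```

Both functions format every row with strftime before deduplicating, which is probably why the two timings come out so close.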

Yes, or this:

df['date'].dt.strftime("%m/%y").unique().tolist()
#['09/17', '10/17', '11/17', '12/17']

You do not need to build the df:

(rng.year*100+rng.month).value_counts().index.tolist()
Out[861]: [201712, 201710, 201711, 201709]

Updated:

set((rng.year*100+rng.month).tolist())
Out[865]: {201709, 201710, 201711, 201712}
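Note that value_counts orders its result by frequency and a set has no guaranteed order, so neither output is necessarily chronological. A small sketch, assuming a sorted list is wanted, that also turns the integer keys back into the 'MM/YYYY' strings from the question (the names `keys` and `labels` are illustrative only):

```python
import pandas as pd

rng = pd.date_range('9/1/2017', '12/31/2017')

# year*100 + month gives one integer per (year, month), e.g. 201709.
# Sorting these integers is the same as sorting chronologically.
keys = sorted(set((rng.year * 100 + rng.month).tolist()))
print(keys)    # [201709, 201710, 201711, 201712]

# Split each key back into month (k % 100) and year (k // 100).
labels = [f'{k % 100:02d}/{k // 100}' for k in keys]
print(labels)  # ['09/2017', '10/2017', '11/2017', '12/2017']
```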

I usually use this one, and I think it's quite straightforward:

rng.month.unique()
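This returns month numbers only, so it fits when all dates fall within a single year. As an illustration with a hypothetical two-year range, the same month in different years collapses into one value, whereas (year, month) pairs keep them apart:

```python
import pandas as pd

# Hypothetical two-year range (not from the original post).
rng = pd.date_range('1/1/2017', '12/31/2018')

# Month numbers only: January 2017 and January 2018 both become 1.
month_numbers = rng.month.unique().tolist()
print(len(month_numbers))  # 12

# Pairing each month with its year keeps the two years distinct.
pairs = sorted(set(zip(rng.year.tolist(), rng.month.tolist())))
print(len(pairs))          # 24
print(pairs[0], pairs[-1]) # (2017, 1) (2018, 12)
```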

Edit: Probably not relevant any longer, but just for the sake of completeness:

set([str(year)+str(month) for year , month in zip(rng.year,rng.month)])

