How can I make a seaborn plot in Python with displot that counts unique values in one field rather than the total number of rows?

I have a dataframe that contains about 60,000 rows. Each row has a unique record identifier, but the rows also carry a sessionId, and only about 12,000 of those are unique.

I am trying to use seaborn's displot to make figures from these values, but when displot aggregates, I can only get it to count the number of records; I cannot get it to count the number of unique sessionIds.

Here is an example dataframe.

import pandas as pd
import seaborn as sns

temp_df = pd.DataFrame([['d7d1b050-0e48-4c00-8061-c78817155b72',
  '42773088-e38f-4578-bc2a-69d1797a90eb',
  11,
  'groupA'],
 ['962c397d-a8f8-4f1c-a589-ecf74a7da62d',
  'b5baafb0-f6d4-4b4e-bc76-1287614b985d',
  10,
  'groupA'],
 ['a90fde40-9b9f-466e-bd5e-a40325b5fc9d',
  'b3fba007-aef5-4a5f-a53b-94eb0705d953',
  11,
  'groupB'],
 ['22ebb056-603c-4f66-8240-8c54e8043509',
  'b780fa66-addd-48c0-8db4-d755ebd351b8',
  10,
  'groupC'],
 ['52ffd64c-a5c1-4cd5-89c8-c1dcb8bd24b2',
  '37482cb7-c354-4b4b-92b6-2aaa62811e5b',
  10,
  'groupA'],
 ['55524169-f159-4c31-b939-bb00e1cba804',
  '34a9ff63-ea75-473d-ab89-9a92c3f4a8d9',
  10,
  'groupB'],
 ['2027d9d0-1e29-4d1f-969a-995a47f12052',
  '875488ea-85a2-47cb-b1ea-62003bbce80a',
  10,
  'groupA'],
 ['10d9c9fb-b5dd-4581-b148-a6198abecec1',
  '3f4b0604-513a-424b-98a3-e788ab3daa97',
  11,
  'groupD'],
 ['1c1e183b-6459-41bd-99aa-5f89b375006a',
  '53dd2ffd-c9b0-49c3-9275-190716c78799',
  10,
  'groupB'],
 ['31030ded-64a7-4854-8042-585605141e71',
  'f0514527-2d7b-4cad-a36f-f21e3425093c',
  10,
  'groupD'],
 ['cdfd5a0c-dd8c-4546-ba31-c2f021fb4859',
  '1ed007fe-d4f7-41bc-8f3c-b163c57f8a1f',
  11,
  'groupE'],
 ['66bd16a5-b514-4d8a-ad7a-afb8921f7dd2',
  'a2e9f137-bba5-46ec-8b13-7b17821de735',
  10,
  'groupB'],
 ['3cdb21d9-be3c-4723-bf28-0a7769d492b4',
  '9a6f1516-54a0-4dda-83d7-e05311e87ff5',
  10,
  'groupE'],
 ['d25f4cb2-3bf7-4898-a8a3-91d9e1b58576',
  '716a7732-6bcd-478d-87f9-c13cd83eaf66',
  11,
  'groupA'],
 ['e95134fd-7ce2-4e88-808c-e5abf13a4892',
  'c021c21b-7bab-4e1f-9ff0-4dfc584263b8',
  11,
  'groupE'],
 ['e13da005-1033-466f-b984-48fdfa0988f2',
  '5bcc0651-0775-4fa5-b521-ac90e0a33b1c',
  10,
  'groupB'],
 ['b60ee53d-e4fc-4e37-aa1c-df67f66e304e',
  '592adca4-6fa6-48c3-be97-2357250d736d',
  10,
  'groupD'],
 ['c1d47246-838f-418a-a92d-7b5150122775',
  'ff5d180c-cca9-474a-974e-e18c35cab912',
  10,
  'groupA'],
 ['fc129686-f7cd-407a-aca3-68f86c52af41',
  'a18dfc3a-2ce6-43f7-a21f-4c7371cff2b6',
  11,
  'groupE'],
 ['191af645-cb9e-408a-af2e-b6826f7177b9',
  'd430610b-b7da-42cb-aa93-c7f94774093c',
  10,
  'groupA']])

temp_df.columns = ['clickId', 'sessionId', 'month','group']

sns.displot(data=temp_df, x='month', hue='group')

Conceptually, I guess what I want to do is take the dataframe and eliminate all duplicate rows at the sessionId level, but I don't know how to do that.

Can someone help me?

Thanks, Brad

The answer is surprisingly simple.

When I drew the original plot, I was calling

sns.displot(temp_df, x='month', hue='group')

which included every row, so it was counting unique record identifiers. Since I wanted to count by sessionId only, the solution I found was to drop the clickId column and remove duplicate rows before plotting:

sns.displot(temp_df[['sessionId', 'month', 'group']].drop_duplicates(), x='month', hue='group')

and that works.
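
The example dataframe in the question happens to have no repeated sessionIds, so the effect is hard to see there. Below is a minimal sketch on made-up toy data (the `clicks` frame and its values are hypothetical, not from my real dataset) that shows what the column selection plus `drop_duplicates()` does to the per-bin counts. It only uses pandas, so it can be checked without drawing anything.

```python
import pandas as pd

# Hypothetical toy data: session "s1" has three clicks, "s2" has two, "s3" has one.
clicks = pd.DataFrame(
    {
        "clickId": ["c1", "c2", "c3", "c4", "c5", "c6"],
        "sessionId": ["s1", "s1", "s1", "s2", "s2", "s3"],
        "month": [10, 10, 10, 10, 10, 11],
        "group": ["groupA", "groupA", "groupA", "groupB", "groupB", "groupA"],
    }
)

# Drop the per-row identifier, then keep one row per
# (sessionId, month, group) combination.
sessions = clicks[["sessionId", "month", "group"]].drop_duplicates()

# Counts per (month, group) bin before and after deduplication.
rows_per_bin = clicks.groupby(["month", "group"]).size()
sessions_per_bin = sessions.groupby(["month", "group"]).size()

print(rows_per_bin)
print(sessions_per_bin)
```

Passing `sessions` to sns.displot with x='month' and hue='group' should then count sessions instead of rows. One caveat: a session that shows up in more than one month or group is still counted once in each of those bins, because the duplicates are removed per combination of the three columns.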

Hopefully this helps someone else.
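
For anyone who wants to sanity-check the numbers behind the plot, here is a small sketch of an alternative I believe gives the same counts: grouping by month and group and calling `nunique()` on sessionId. The toy data and the `n_sessions` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with repeated sessionIds.
clicks = pd.DataFrame(
    {
        "clickId": ["c1", "c2", "c3", "c4", "c5", "c6"],
        "sessionId": ["s1", "s1", "s2", "s3", "s3", "s4"],
        "month": [10, 10, 10, 11, 11, 11],
        "group": ["groupA", "groupA", "groupA", "groupB", "groupB", "groupA"],
    }
)

# Number of distinct sessions in each (month, group) bin.
unique_sessions = (
    clicks.groupby(["month", "group"])["sessionId"]
    .nunique()
    .reset_index(name="n_sessions")
)

print(unique_sessions)
```

The resulting table has one row per bin, so it can be compared against the bar heights of the deduplicated displot.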
