Groupby on a column that contains bytearray objects in a pandas DataFrame
I have a pandas DataFrame and want to group by customer ID:
df['rank_col'] = df.groupby('PSEUDO_CUSTOMER_ID')['DB_CREATED_DT'].rank(method='first')
The problem is that the PSEUDO_CUSTOMER_ID values look like this:
[138, 76, 16, 9, 86, 71, 5, 85, 117, 237, 97, 212, 13, 157, 185, 150, 207, 97, 85, 165]
Both a value count on the pseudo customer ID column and a check of a single value show entries of this form.
Note: I want to group by PSEUDO_CUSTOMER_ID and rank the rows by the DB_CREATED_DT column.
Use the bytes function to convert each bytearray into a hashable type, which allows grouping:
Demo:
df['PSEUDO_CUSTOMER_ID_BYTES'] = df['PSEUDO_CUSTOMER_ID'].apply(bytes)
print(df)
# Output:
PSEUDO_CUSTOMER_ID PSEUDO_CUSTOMER_ID_BYTES
0 [138, 76, 16, 9, 86, 71, 5, 85, 117, 237, 97, ... b'\x8aL\x10\tVG\x05Uu\xeda\xd4\r\x9d\xb9\x96\x...
Grouping by PSEUDO_CUSTOMER_ID fails:
>>> list(df.groupby('PSEUDO_CUSTOMER_ID'))
...
TypeError: unhashable type: 'bytearray'
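The error comes from Python itself rather than from pandas: groupby needs hashable keys, and bytearray is mutable, so it defines no hash. A minimal sketch of the difference, using a shortened, made-up ID for illustration:

```python
# Shortened example value; the real IDs in the question are 20 bytes long.
raw = bytearray([138, 76, 16, 9])

# bytearray is mutable, so Python does not define a hash for it.
try:
    hash(raw)
    bytearray_hashable = True
except TypeError:
    bytearray_hashable = False

# bytes is the immutable counterpart and can be hashed.
frozen = bytes(raw)
bytes_hashable = isinstance(hash(frozen), int)

print(bytearray_hashable, bytes_hashable)  # False True
```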
Grouping by PSEUDO_CUSTOMER_ID_BYTES works:
>>> list(df.groupby('PSEUDO_CUSTOMER_ID_BYTES'))
[(b'\x8aL\x10\tVG\x05Uu\xeda\xd4\r\x9d\xb9\x96\xcfaU\xa5',
PSEUDO_CUSTOMER_ID PSEUDO_CUSTOMER_ID_BYTES
0 [138, 76, 16, 9, 86, 71, 5, 85, 117, 237, 97, ... b'\x8aL\x10\tVG\x05Uu\xeda\xd4\r\x9d\xb9\x96\x...)]
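Putting the conversion together with the ranking from the question might look like the sketch below. The sample IDs and dates are invented for illustration; only the column names and the rank call come from the question.

```python
import pandas as pd

# Hypothetical sample data: two customers, two rows each.
df = pd.DataFrame({
    "PSEUDO_CUSTOMER_ID": [
        bytearray([138, 76, 16]),
        bytearray([1, 2, 3]),
        bytearray([138, 76, 16]),
        bytearray([1, 2, 3]),
    ],
    "DB_CREATED_DT": pd.to_datetime(
        ["2021-03-01", "2021-01-15", "2021-01-01", "2021-02-01"]
    ),
})

# Convert the unhashable bytearray values to hashable bytes.
df["PSEUDO_CUSTOMER_ID_BYTES"] = df["PSEUDO_CUSTOMER_ID"].apply(bytes)

# Rank rows by creation date within each customer.
df["rank_col"] = (
    df.groupby("PSEUDO_CUSTOMER_ID_BYTES")["DB_CREATED_DT"].rank(method="first")
)

print(df[["PSEUDO_CUSTOMER_ID_BYTES", "DB_CREATED_DT", "rank_col"]])
```

With this sample data, rank_col comes out as 2.0, 1.0, 1.0, 2.0, since the earlier date in each customer group gets rank 1.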
Important
If you are sure of the original encoding, you can use bytearray.decode to get a str instead of a bytes string. Here the encoding appears to be latin-1:
df['PSEUDO_CUSTOMER_ID_STR'] = df['PSEUDO_CUSTOMER_ID'].apply(lambda b: b.decode('latin1'))
print(df.loc[0])
# Output:
PSEUDO_CUSTOMER_ID [138, 76, 16, 9, 86, 71, 5, 85, 117, 237, 97, ...
PSEUDO_CUSTOMER_ID_BYTES b'\x8aL\x10\tVG\x05Uu\xeda\xd4\r\x9d\xb9\x96\x...
PSEUDO_CUSTOMER_ID_STR L\tVGUuíaÔ\rÏaU¥
Name: 0, dtype: object
Demo:
>>> list(df.groupby('PSEUDO_CUSTOMER_ID_STR'))
[('\x8aL\x10\tVG\x05UuíaÔ\r\x9d¹\x96ÏaU¥',
PSEUDO_CUSTOMER_ID PSEUDO_CUSTOMER_ID_BYTES PSEUDO_CUSTOMER_ID_STR
0 [138, 76, 16, 9, 86, 71, 5, 85, 117, 237, 97, ... b'\x8aL\x10\tVG\x05Uu\xeda\xd4\r\x9d\xb9\x96\x... L\tVGUuíaÔ\rÏaU¥)]
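One reason latin-1 decodes without error is that it maps every byte value 0 to 255 to exactly one character, so the conversion is reversible whatever the bytes mean. The sketch below checks that round trip on the ID shown in the question. The hex form at the end is not part of the original answer; it is shown only as a possible alternative, on the assumption that the IDs are opaque binary values rather than text.

```python
# The ID from the question, as a bytearray.
raw = bytearray([138, 76, 16, 9, 86, 71, 5, 85, 117, 237,
                 97, 212, 13, 157, 185, 150, 207, 97, 85, 165])

# latin-1 maps each byte to one character, so nothing is lost.
as_str = raw.decode("latin1")
round_trip = as_str.encode("latin1")
print(len(as_str), round_trip == bytes(raw))  # 20 True

# Assumed alternative: a printable hex string that is also hashable.
as_hex = raw.hex()
print(as_hex)  # 8a4c10095647055575ed61d40d9db996cf6155a5
```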