Transforming a pandas DataFrame column into a sequence

I have some data that I convert into a DataFrame:

import pandas as pd

d = [
  (1,70399,0.988375133622),
  (1,33919,0.981573492596),
  (1,62461,0.981426807114),
  (579,1,0.983018778374),
  (745,1,0.995580488899),
  (834,1,0.980942505189)
]

df = pd.DataFrame(d, columns=['source', 'target', 'weight'])

>>> df
   source  target    weight
0       1   70399  0.988375
1       1   33919  0.981573
2       1   62461  0.981427
3     579       1  0.983019
4     745       1  0.995580
5     834       1  0.980943

I need to transform the source column into a sequence of consecutive integers. I have tried:

df.source = (df.source.diff() != 0).cumsum() - 1

but I only get:

>>> df
   source  target    weight
0       0   70399  0.988375
1       0   33919  0.981573
2       0   62461  0.981427
3       1       1  0.983019
4       2       1  0.995580
5       3       1  0.980943

I also need to transform the values in the target column using the same mapping as source. The ideal result is:

>>> df
   source  target    weight
0       0   70399  0.988375
1       0   33919  0.981573
2       0   62461  0.981427
3       1       0  0.983019
4       2       0  0.995580
5       3       0  0.980943

The values in target should change to match the values in source: in source, the value 1 becomes 0, so I need the value 1 in target to become 0 too.

How can I do that? Maybe someone can help me :)

Thanks :)

Something like this?

df['source_code'] = df.source.astype('category').cat.codes

>>> df
   source  target    weight  source_code
0       1   70399  0.988375            0
1       1   33919  0.981573            0
2       1   62461  0.981427            0
3     579       1  0.983019            1
4     745       1  0.995580            2
5     834       1  0.980943            3
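The category codes above only recode source. As a rough sketch of how the same codes could also be applied to target, one option is to build a lookup from the categories and map both columns through it. The names `code_of` and `target_code` are illustrative, not part of the original answer, and note that category codes follow the sorted order of the unique values, which happens to match the order of appearance in this data.

```python
import pandas as pd

df = pd.DataFrame(
    [
        (1, 70399, 0.988375133622),
        (1, 33919, 0.981573492596),
        (1, 62461, 0.981426807114),
        (579, 1, 0.983018778374),
        (745, 1, 0.995580488899),
        (834, 1, 0.980942505189),
    ],
    columns=["source", "target", "weight"],
)

# Categories are the sorted unique source values: 1, 579, 745, 834
cats = df["source"].astype("category").cat.categories

# Lookup from original source value to its category code (illustrative name)
code_of = pd.Series(range(len(cats)), index=cats)

df["source_code"] = df["source"].map(code_of)

# Targets that never occur in source map to NaN, so fall back to the original value
df["target_code"] = df["target"].map(code_of).fillna(df["target"]).astype(int)

print(df)
```

With this data, target value 1 becomes 0 because source value 1 has code 0, while the other targets are left unchanged.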

You can use:

#remember original values
source_old = df.source.copy()

df.source = (df.source.diff() != 0).cumsum() - 1

#series for mapping
ser = pd.Series(df.source.values, index=source_old).drop_duplicates()
print (ser)
source
1      0
579    1
745    2
834    3
dtype: int32

#map where values exist (test against the original source values, i.e. the index of ser)
df.target = df.target.mask(df.target.isin(ser.index), df.target.map(ser)).astype(int)

print (df)
   source  target    weight
0       0   70399  0.988375
1       0   33919  0.981573
2       0   62461  0.981427
3       1       0  0.983019
4       2       0  0.995580
5       3       0  0.980943
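A possible alternative, sketched here as an assumption rather than as part of the original answer, is `pd.factorize`, which numbers each distinct value in order of first appearance. Unlike the `diff().cumsum()` approach, it gives a repeated source value the same code even when its rows are not adjacent. The name `mapping` is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        (1, 70399, 0.988375133622),
        (1, 33919, 0.981573492596),
        (1, 62461, 0.981426807114),
        (579, 1, 0.983018778374),
        (745, 1, 0.995580488899),
        (834, 1, 0.980942505189),
    ],
    columns=["source", "target", "weight"],
)

# codes: integer label per row; uniques: distinct values in order of first appearance
codes, uniques = pd.factorize(df["source"])

# Dictionary from original source value to its new code (illustrative name)
mapping = dict(zip(uniques, range(len(uniques))))

df["source"] = codes

# Recode target where it matches an old source value; keep other values as they are
df["target"] = df["target"].map(mapping).fillna(df["target"]).astype(int)

print(df)
```

Both columns are recoded from a single dictionary, so source and target cannot drift out of sync.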


 