How to extract and separate random tuple values from a Python dataframe?

I have a data frame with two columns, subject and subject category, and a third column holding the weight of each subject. I want to create another data frame containing random instances of a subject and its corresponding subject category, drawn according to the subject weights. The tricky part for me is keeping both values (subject and subject category) together while sampling with the weights. I am able to build the tuples and draw a random tuple based on the weights, but I cannot separate the drawn tuple into its constituent elements to insert them into the final data frame. The zip function is not working.

In my output dataframe I want a serial number, a subject and a subject category as separate columns. I would appreciate some help in making this work, and any ideas on how this kind of problem can be better approached.

import random
import pandas as pd

data=[['Agricultural services', 'Agricultural services, inputs, tools and equipment',   1],
['Agriculture primary production(livestock)',   'Agricultural services, inputs, tools and equipment',7],
['Assist Uganda in upgrading its coffee and cocoa value chains',    'Agricultural services, inputs, tools and equipment',1],
['Building materials and agricultural tools Building and civil works', 'carpentry, construction materials, maintenance, renovation, road works',1],
['Clearing and forwarding services','Clearing and forwarding services', 1],
['Collection of revenue from big slaughters Collection of fees', 'taxes and revenue', 1],
['Collection of revenue from chicken sellers','Collection of fees, taxes and revenue',  19]]
tender_subject = pd.DataFrame(data, columns = ['sub', 'sub_category','subject_dist']) 

subject_tuple=list(tender_subject[['sub', 'sub_category']].itertuples(index=False, name=None)) #we could have also used tuple here instead of 'list'
subject_weights=tender_subject['subject_dist'].tolist()
data={'SL':[], 'Subject':[],'subject_category':[],}
output_df=pd.DataFrame(data)
x=0
for i in range(10):
    p=random.choices(subject_tuple,subject_weights)  
    p1,p2=zip(p) # This line is not working

    output_df.loc[x]=[i]+[p1]+[p2]
    x=x+1
print(output_df.head())
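
The marked line fails because random.choices always returns a list, even when it draws a single item, so p is a one-element list holding one (sub, sub_category) tuple. zip(p) then yields a single item, which cannot be unpacked into two names. A minimal sketch of one possible fix is to take the first element of the result and unpack that tuple directly. The data here is shortened to three rows from the question, and the rows list and the random.seed call are additions for illustration only:

```python
import random
import pandas as pd

# Shortened stand-in for the subject_tuple / subject_weights lists in the question.
subject_tuple = [
    ('Agricultural services', 'Agricultural services, inputs, tools and equipment'),
    ('Clearing and forwarding services', 'Clearing and forwarding services'),
    ('Collection of revenue from chicken sellers', 'Collection of fees, taxes and revenue'),
]
subject_weights = [1, 1, 19]

random.seed(0)  # assumption: only here to make the illustration repeatable

rows = []
for i in range(10):
    # random.choices returns a list of length k (default 1), so take element 0
    # and unpack the (sub, sub_category) tuple directly.
    sub, sub_category = random.choices(subject_tuple, weights=subject_weights)[0]
    rows.append({'SL': i, 'Subject': sub, 'subject_category': sub_category})

# Building the frame once from a list of dicts avoids growing it row by row with .loc.
output_df = pd.DataFrame(rows, columns=['SL', 'Subject', 'subject_category'])
print(output_df.head())
```

Drawing all ten pairs at once with random.choices(subject_tuple, weights=subject_weights, k=10) and splitting them with zip(*draws) should work as well, since the star unpacks the list of pairs into separate arguments for zip.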

Pandas has a rich API, with many methods available for common data processing tasks. You should use these methods where available, because they're thoughtfully designed, robust, and well tested.

Here, you should use the sample method:

tender_subject.sample(10, replace=True, weights=tender_subject.subject_dist)

                                                 sub                                       sub_category  subject_dist
6         Collection of revenue from chicken sellers              Collection of fees, taxes and revenue            19
6         Collection of revenue from chicken sellers              Collection of fees, taxes and revenue            19
6         Collection of revenue from chicken sellers              Collection of fees, taxes and revenue            19
6         Collection of revenue from chicken sellers              Collection of fees, taxes and revenue            19
5  Collection of revenue from big slaughters Coll...                                  taxes and revenue             1
...

You can see that the sampling has resulted in repeated index values. Reset the index and select only the columns you want:

(tender_subject
 .sample(10, replace=True, weights=tender_subject.subject_dist)
 .reset_index()[['sub', 'sub_category']])

                                                 sub                                       sub_category
0         Collection of revenue from chicken sellers              Collection of fees, taxes and revenue
1         Collection of revenue from chicken sellers              Collection of fees, taxes and revenue
2         Collection of revenue from chicken sellers              Collection of fees, taxes and revenue
3         Collection of revenue from chicken sellers              Collection of fees, taxes and revenue
4  Collection of revenue from big slaughters Coll...                                  taxes and revenue
...
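
To get the three columns asked for in the question (a serial number, the subject and the subject category), the sampled frame can be renamed and its fresh index turned into a column. The following is a sketch rather than part of the original answer: the output column names SL, Subject and subject_category are taken from the question's code, while the shortened data and random_state=0 are assumptions made for illustration.

```python
import pandas as pd

# Shortened version of the question's data: subject, category, weight.
data = [
    ['Agricultural services', 'Agricultural services, inputs, tools and equipment', 1],
    ['Clearing and forwarding services', 'Clearing and forwarding services', 1],
    ['Collection of revenue from chicken sellers', 'Collection of fees, taxes and revenue', 19],
]
tender_subject = pd.DataFrame(data, columns=['sub', 'sub_category', 'subject_dist'])

output_df = (
    tender_subject
    # Weighted draw with replacement; random_state is only for repeatability.
    .sample(10, replace=True, weights=tender_subject['subject_dist'], random_state=0)
    # Discard the repeated original index and number the rows 0..9.
    .reset_index(drop=True)
    [['sub', 'sub_category']]
    .rename(columns={'sub': 'Subject', 'sub_category': 'subject_category'})
    # Name the new index 'SL' and move it into a regular column.
    .rename_axis('SL')
    .reset_index()
)
print(output_df)
```

Because each sampled row carries its subject and category together, there is no tuple to split afterwards.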
