How do I extract from a list in Python?

If I have a list made up of 1MM ids, how would I pull from that list in intervals of 50k?

For example:

cusid = df['customer_id'].unique().tolist()
# cusid holds 1,000,500 ids

If I want to pull in chunks, is the below correct for 50k?

cusid=cusid[:50000] - first 50k ids
cusid=cusid[50000:100001] - the next 50k of ids
cusid=cusid[100001:150001] - the next 50k 

Are my interval selections correct?

Thanks!

cusid2 = [cusid[a:a+50000] for a in range(0, len(cusid), 50000)]

This is a list comprehension: for each start index a, going from 0 to the end of the list in steps of 50k, it adds the slice cusid[a:a+50000] to the new list. Use len(cusid) as the stop value rather than 950000, because range excludes its stop: range(0, 950000, 50000) ends at 900000, so every id from index 950000 onward would be dropped. When fewer than 50k ids remain, the last slice is simply shorter.
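To check the boundaries asked about in the question, here is a small sketch on a stand-in list of integers. The real cusid comes from the DataFrame; the names chunk_size, first, second, third, too_long and chunks are only illustrative. A Python slice excludes its stop index, so the stop of one chunk should be the start of the next.

```python
# Stand-in for the real id list (assumption: 1,000,500 ids, as in the question).
cusid = list(range(1_000_500))
chunk_size = 50_000

# The stop index of a slice is excluded, so the next chunk starts exactly there.
first = cusid[:50_000]          # indices 0 .. 49,999
second = cusid[50_000:100_000]  # indices 50,000 .. 99,999
third = cusid[100_000:150_000]  # indices 100,000 .. 149,999

# The second slice from the question is one element too long.
too_long = cusid[50_000:100_001]
print(len(second), len(too_long))  # 50000 50001

# Same idea for the whole list; the last chunk holds the remaining 500 ids.
chunks = [cusid[a:a + chunk_size] for a in range(0, len(cusid), chunk_size)]
print(len(chunks), len(chunks[-1]))  # 21 500
```

Note also that assigning each slice back to cusid, as in the question, replaces the full list after the first step, so any later slice would be taken from the 50k-element result. Giving each chunk its own name (or keeping them in a list) avoids that.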

A couple of things to mention:

  1. It seems you're using the "data science" stack for your work, so there's a good chance you have numpy available; take a look at numpy.array_split. You can calculate the number of chunks once and rely on NumPy's view machinery. This is most probably a lot faster than converting NumPy arrays into native Python lists.

  2. The idiomatic Python approach (IMO) would be to leverage iterators + islice:

     from itertools import islice

     # create an iterator from your array/list; this is a cheap operation
     iterator = iter(cusid)

     # if you want element-wise operations, you can use your chunk in loops
     # or functions that require iteration; this is really memory-efficient,
     # as you don't put the whole chunk in memory
     chunk = islice(iterator, 50000)
     s = sum(chunk)

     # in case you really need the whole chunk in memory, just turn the islice into a list
     chunk = list(islice(iterator, 50000))
     last_in_chunk = chunk[-1]

     # and you always use the same code to consume the next chunk from your source,
     # without maintaining any counters
     next_chunk = list(islice(iterator, 50000))
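For the numpy suggestion in point 1, a minimal sketch might look like the following. It assumes numpy is installed and uses np.arange as a stand-in for the array that df['customer_id'].unique() returns; ids, chunk_size and split_points are illustrative names. Passing explicit split indices to numpy.array_split gives fixed-size pieces.

```python
import numpy as np

# Stand-in for df['customer_id'].unique(), which already returns a NumPy array.
ids = np.arange(1_000_500)
chunk_size = 50_000

# Split points at 50k, 100k, ...; array_split cuts the array at these indices.
split_points = np.arange(chunk_size, len(ids), chunk_size)
chunks = np.array_split(ids, split_points)

print(len(chunks))      # 21
print(len(chunks[0]))   # 50000
print(len(chunks[-1]))  # 500
```

Passing an integer instead, as in np.array_split(ids, 21), would produce 21 pieces of nearly equal length rather than 50k each, so the index form is the closer match to the question. Each piece is a slice of the original array, so the ids are not copied.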

With the islice approach, once the iterator is exhausted (no values are left) you get empty chunks. When fewer elements remain than a full chunk needs, you get whatever is left.
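That behaviour can be wrapped in a small generator so a loop stops on its own at the first empty chunk. This is only a sketch: iter_chunks is a hypothetical helper name, not something from itertools, and the tiny range stands in for the real id list.

```python
from itertools import islice


def iter_chunks(items, size):
    """Yield successive lists of at most `size` elements from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:  # an empty chunk means the iterator is exhausted
            return
        yield chunk


# Small stand-in data so the behaviour is easy to see.
sizes = [len(chunk) for chunk in iter_chunks(range(12), 5)]
print(sizes)  # [5, 5, 2]
```

With the real data this would be used as `for chunk in iter_chunks(cusid, 50000): ...`. On Python 3.12 and later, itertools.batched offers similar behaviour out of the box, yielding tuples instead of lists.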

