python: extract items of different lists and put them in one set

I have a file like this:

93.93.203.11|["['vmit.it', 'umbertominnella.it', 'studioguizzardi.it', 'telestreet.it', 'maurominnella.com']"]
168.144.9.16|["['iipmalumni.com','webdesignhostingindia.com', 'iipmstudents.in', 'iipmclubs.in']"]
195.211.72.88|["['tcmpraktijk-jingshen.nl', 'ellen-siemer.nl'']"]
129.35.210.118|["['israelinnovation.co.il', 'watec-peru.com', 'bsacimeeting.org', 'wsava2015.com', 'picsmeeting.com']"]

I want to extract the domains from all the lists and add them to one set. Ultimately, I would like to have a file with each unique domain on its own line. Here is the code I have written:

set_d = set()
f = open(file,'r')
for line in f:
    line = line.strip('\n')
    ip,list = line.split('|')
    l = json.loads(list)
    for e in l:
        domain = e.split(',')
        set_d.add(domain)
        print set_d

But it gives the error below:

    set_d.add(domain)
TypeError: unhashable type: 'list'

Can anybody help me out?

You should call update instead of add:

set_d.update(domain)

Example:

>>> set_d = {'a', 'b', 'c'}
>>> set_d.update(['c', 'd', 'e'])
>>> print set_d
{'a', 'b', 'c', 'd', 'e'}
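
To see why add fails where update succeeds, here is a minimal sketch in Python 3 syntax. The variable names and the two sample domains are illustrative assumptions, not part of the original question.

```python
domains = set()
pieces = "a.com,b.org".split(",")   # str.split always returns a list

try:
    domains.add(pieces)             # add tries to hash the list itself
except TypeError as exc:
    error_message = str(exc)        # the message mentions "unhashable type"

domains.update(pieces)              # update adds each element of the list
print(error_message)
print(sorted(domains))
```

add expects a single hashable item, while update accepts any iterable and inserts its elements one by one.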

Use str.translate to clean the text and add the results to the set using update:

set_d = set()
with open(file,'r') as f:
    for line in f:
        lst = (x.strip() for x in line.split("|")[1].translate(None,"\"'[]").split(","))
        set_d.update(lst)

This outputs a unique set of individual domains:

set(['vmit.it', 'tcmpraktijk-jingshen.nl', 'umbertominnella.it', 'studioguizzardi.it', 'telestreet.it', 'watec-peru.com', 'bsacimeeting.org', 'webdesignhostingindia.com', 'wsava2015.com', 'iipmstudents.in', 'maurominnella.com', 'ellen-siemer.nl', 'picsmeeting.com', 'iipmalumni.com', 'iipmclubs.in', 'israelinnovation.co.il'])

which you can write to a new file:

set_d = set()
with open(file,'r') as f,open("out.txt","w") as out:
    for line in f:
        lst = (x.strip() for x in line.split("|")[1].translate(None,"\"'[]").split(","))
        set_d.update(lst)
    for line in set_d:
        out.write("{}\n".format(line))

The output:

$ cat out.txt 
vmit.it
tcmpraktijk-jingshen.nl
umbertominnella.it
studioguizzardi.it
telestreet.it
watec-peru.com
bsacimeeting.org
webdesignhostingindia.com
wsava2015.com
iipmstudents.in
maurominnella.com
ellen-siemer.nl
picsmeeting.com
iipmalumni.com
iipmclubs.in
israelinnovation.co.il
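
The translate(None, ...) call above is Python 2 syntax. On Python 3 the same idea could be sketched with str.maketrans, as below. The helper name domains_from_line and the in-memory sample list are assumptions made for illustration; the two sample lines are copied from the question.

```python
# Translation table that deletes quotes and square brackets.
STRIP_TABLE = str.maketrans("", "", "\"'[]")

def domains_from_line(line):
    """Return the set of domains found after the '|' on one line."""
    raw = line.split("|", 1)[1].translate(STRIP_TABLE)
    return {part.strip() for part in raw.split(",") if part.strip()}

# Two lines from the question, held in memory instead of read from a file.
sample = [
    "168.144.9.16|[\"['iipmalumni.com','webdesignhostingindia.com', 'iipmstudents.in', 'iipmclubs.in']\"]\n",
    "195.211.72.88|[\"['tcmpraktijk-jingshen.nl', 'ellen-siemer.nl'']\"]\n",
]

unique_domains = set()
for line in sample:
    unique_domains.update(domains_from_line(line))

print(sorted(unique_domains))
```

Because the stray quote in the third input line is deleted along with the others, this approach tolerates that kind of malformed entry.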

Your code will not split the data into individual domains, and your json call does not really help. Changing your code to use update will output something like the following:

{" 'maurominnella.com']", " 'wsava2015.com'", "'webdesignhostingindia.com'", " 'iipmclubs.in']", " 'ellen-siemer.nl'']", " 'umbertominnella.it'", " 'picsmeeting.com']", "['israelinnovation.co.il'", "['vmit.it'", " 'iipmstudents.in'", "['tcmpraktijk-jingshen.nl'", " 'studioguizzardi.it'", "['iipmalumni.com'", " 'watec-peru.com'", " 'bsacimeeting.org'", " 'telestreet.it'"}

Also, don't use list as a variable name; it shadows the built-in Python list.
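
A small sketch of what shadowing looks like in practice (Python 3 syntax). The functions shadowed and renamed and the sample input are hypothetical, written only to show the effect.

```python
def shadowed(line):
    # Hypothetical helper: the local name `list` hides the built-in type.
    list = line.split("|")[1]
    return list("abc")            # calls a str, so this raises TypeError

def renamed(line):
    # Same logic with a non-clashing name; the built-in stays usable.
    payload = line.split("|")[1]
    return list(payload)

try:
    shadowed("1.2.3.4|xy")
except TypeError as exc:
    shadow_error = str(exc)       # reports that a 'str' object is not callable

print(shadow_error)
print(renamed("1.2.3.4|xy"))
```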

Because the result of split is a list ( domain = e.split(',') ) and lists are unhashable, you can't add it to a set. Instead, you can add its elements to your set with set.update(). You also don't need json here, since it doesn't separate the domains or give you the desired result; you can use ast.literal_eval to parse the list instead:

import ast
set_d = set()
f = open(file,'r')
for line in f:
    line = line.strip('\n')
    ip,li = line.split('|')
    l = ast.literal_eval(ast.literal_eval(li)[0])
    for e in l:
        domain = e.split(',')
        set_d.update(domain)
    print set_d
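
The double literal_eval call is easier to follow when applied to a single line. The following is a sketch in Python 3 syntax using the first line from the question; the names payload, outer and inner are illustrative assumptions.

```python
import ast

# First line of the question's file, held in a string for the demonstration.
line = "93.93.203.11|[\"['vmit.it', 'umbertominnella.it', 'studioguizzardi.it', 'telestreet.it', 'maurominnella.com']\"]\n"

ip, payload = line.rstrip("\n").split("|", 1)
outer = ast.literal_eval(payload)    # a list holding one string
inner = ast.literal_eval(outer[0])   # that string parsed into a list of domains
domains = set(inner)

print(ip)
print(sorted(domains))
```

Since each element of the inner list is already a single domain, the extra e.split(',') step is not strictly needed once literal_eval has been applied twice.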

Note: don't use the names of Python built-in functions or types as variable names!

As a more efficient alternative, you can use a regex to grab the domains:

f = open(file,'r').read()
import re
print set(re.findall(r'[a-zA-Z\-]+\.[a-zA-Z]+',f))

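Result (note that with this pattern israelinnovation.co.il is truncated to israelinnovation.co and wsava2015.com is missing, because the character class does not allow digits and only one dot is matched):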

set(['vmit.it', 'tcmpraktijk-jingshen.nl', 'umbertominnella.it', 'studioguizzardi.it', 'telestreet.it', 'israelinnovation.co', 'bsacimeeting.org', 'webdesignhostingindia.com', 'iipmstudents.in', 'maurominnella.com', 'ellen-siemer.nl', 'picsmeeting.com', 'watec-peru.com', 'iipmalumni.com', 'iipmclubs.in'])
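
One possible way to handle domains with digits and multi-part suffixes is sketched below in Python 3 syntax. The pattern DOMAIN_RE is an assumption chosen for this data, not a general domain validator; the sample line is copied from the question.

```python
import re

# Labels may contain letters, digits and hyphens; the final label must be
# at least two letters, which keeps IP addresses from matching.
DOMAIN_RE = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")

text = "129.35.210.118|[\"['israelinnovation.co.il', 'watec-peru.com', 'bsacimeeting.org', 'wsava2015.com', 'picsmeeting.com']\"]\n"

found = set(DOMAIN_RE.findall(text))

# The pattern from the answer above, for comparison.
original = set(re.findall(r"[a-zA-Z\-]+\.[a-zA-Z]+", text))

print(sorted(found))
print(sorted(found - original))   # domains the original pattern misses or truncates
```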
