Writing from one file to another in Python

I am trying to take some information I got from a webpage and write one of the variables to a file, but I am having no luck. It is probably very easy, but I'm lost. Here is an example of one of the rows; there are 1253 rows in total.

<div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">

I am after the field called data-name; it is not at the same spot in each row. I tried this, but it did not work:

mfile=open('itemlist.txt','r')
mfile2=open('output.txt','a')
for row in mfile:
    if char =='data-name':
        mfile2.write(char)

Edit 1:

I made an example file containing 'hello hi peanut'. If I did:

for row in mfile:
    print row.index('hello')

it would print 0 as expected. However, when I changed hello to hi, it didn't return 1; it returned nothing.
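
A note on the expectation in that edit: str.index reports the character offset of the first match, not the position of the word, so for 'hi' the expected result would be 6 rather than 1. It also raises ValueError when the substring is absent. A minimal sketch, assuming the row is exactly the string 'hello hi peanut' (the variable names are only illustrative):

```python
row = 'hello hi peanut'

# str.index returns the character offset where the substring starts.
char_offset = row.index('hi')            # 6: 'hello ' occupies offsets 0-5

# To get the position of a word instead, split on whitespace first
# and search the resulting list.
word_position = row.split().index('hi')  # 1: second word in the list

print(char_offset)
print(word_position)
```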

Let's try to find the value using common string manipulation methods:

>>> line = '''<div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'''

We can use str.index to find the position of a string within a string:

>>> line.index('data-name')
87

So now we know that the attribute we are interested in starts at index 87:

>>> line[87:]
'data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'

Now, we need to remove the data-name=" part too:

>>> start = line.index('data-name') + len('data-name="')
>>> start
98
>>> line[start:]
'Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'

Now we just need to find the index of the closing quotation mark, and then we can extract just the attribute value:

>>> end = line.index('"', start)
>>> end
118
>>> line[start:end]
'Kill-a-Watt Allbrero'

And then we have our solution:

start = line.index('data-name') + len('data-name="')
end = line.index('"', start)
print(line[start:end])
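
The same steps can be wrapped in a small function so they work for other attributes too. This is only a sketch: extract_attribute is a hypothetical helper name, it uses str.find (which returns -1 instead of raising) so a missing attribute yields None, and it assumes the value is wrapped in double quotes, which is why the single-quoted data-price is not matched.

```python
def extract_attribute(line, name):
    """Return the double-quoted value of attribute `name` in `line`, or None."""
    marker = name + '="'
    pos = line.find(marker)        # -1 when the attribute is not present
    if pos == -1:
        return None
    start = pos + len(marker)
    end = line.find('"', start)    # closing quotation mark
    if end == -1:                  # no closing quote: treat as missing
        return None
    return line[start:end]


# Shortened version of the row from the question.
line = '''<div class='entry' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5">'''

print(extract_attribute(line, 'data-name'))
print(extract_attribute(line, 'data-quality'))
print(extract_attribute(line, 'data-price'))   # single-quoted, so None
```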

We can put that in the loop:

with open('itemlist.txt', 'r') as mfile, open('output.txt', 'a') as mfile2:
    for line in mfile:
        start = line.index('data-name') + len('data-name="')
        end = line.index('"', start)
        mfile2.write(line[start:end])
        mfile2.write('\n')
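
One caveat with the loop: str.index raises ValueError on any row that lacks data-name (a blank line, for instance), which would stop the whole run. Below is a hedged sketch of a variant that skips such rows. write_names is a hypothetical helper, the second sample row is made up, and io.StringIO stands in for the real files so the snippet runs without itemlist.txt.

```python
import io


def write_names(rows, out):
    """Write the data-name value of each row to `out`, one per line.

    Rows without a data-name="..." attribute are skipped.
    Returns the number of names written.
    """
    marker = 'data-name="'
    written = 0
    for row in rows:
        try:
            start = row.index(marker) + len(marker)
            end = row.index('"', start)
        except ValueError:
            continue  # attribute (or its closing quote) not found on this row
        out.write(row[start:end] + '\n')
        written += 1
    return written


# In-memory stand-ins for itemlist.txt and output.txt.
# "Example Hat" is invented sample data, not from the original page.
sample = io.StringIO(
    '<div data-price=\'3280000\' data-name="Kill-a-Watt Allbrero">\n'
    '\n'
    '<div data-name="Example Hat" data-quality="6">\n'
)
output = io.StringIO()

count = write_names(sample, output)
print(count)
print(output.getvalue())
```

With real files, the open file objects can be passed in directly, for example write_names(mfile, mfile2) inside the with block.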

You can also use BeautifulSoup:

a.html:

<html>
    <head>
        <title> Asdf </title>
    </head>
    <body>

        <div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">

    </body>
</html>

a.py:

from bs4 import BeautifulSoup
with open('a.html') as f:
    lines = f.readlines()
soup = BeautifulSoup(''.join(lines), 'html.parser')
result = soup.findAll('div')[0]['data-price']
print(result)
# prints 3280000
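
The snippet above reads data-price; for the field the question asks about, the same lookup with ['data-name'] should apply. If installing a third-party package is not an option, a similar result can be had from the standard library. Note that this swaps in html.parser.HTMLParser in place of BeautifulSoup, so it is a sketch of an alternative rather than the approach above, and DataNameCollector is a hypothetical class name.

```python
from html.parser import HTMLParser


class DataNameCollector(HTMLParser):
    """Collect the data-name attribute of every <div> start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        if tag == 'div':
            value = dict(attrs).get('data-name')
            if value is not None:
                self.names.append(value)


# Shortened stand-in for a.html; the second div has no data-name.
html = '''<html><body>
<div class='entry qual-5' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-paint="">
<div class='entry'>
</body></html>'''

collector = DataNameCollector()
collector.feed(html)
print(collector.names)
```

Because the parser reads attributes rather than raw text, it does not matter where data-name sits in the tag or which kind of quotes surround the value.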

In my opinion, if your task is as simple as in your example, there is no real need to use BeautifulSoup. However, if it is more complicated, or is likely to become more complicated, consider giving BeautifulSoup a try.
