How do I print rows of a CSV file that have a specific keyword in them?
I'm trying to open two CSV files, one with data (minidata.csv) and one with keywords (minikeys.csv), search the first for keywords from the second, and print the lines of the first that contain any of those keywords. Hope that makes sense.
I've tried opening the keywords file (minikeys.csv) as a list and searching from there, but for some reason I've come closest to success by opening it into a dictionary.
with open('minidata.csv', 'r') as f:
    text = f.read()
    csvFileArray = []
with open('minikeys.csv', 'r') as inf:
    reader = csv.reader(inf)
    mydict = {rows[0] for rows in reader}
for key in mydict:
    for row in text:
        if key in text:
            print(row)
This prints every line in minidata.csv, not just the matching ones, and it also prints each character once for every entry in minikeys.csv. So it gives me output like:
aaaa,,,,bbbb,,,,cccc,,,,dddd...
instead of printing the lines that match.
What should I do instead to get this to work?
Instead of
text = f.read()
do
text = f.readlines()
The issue here is that you're reading the file as one long string with the newlines included, whereas you want to read it as a list of lines. In essence, f.readlines() is roughly equivalent to f.read().split('\n') (not entirely, but similar enough for this particular comparison). Hence the output you see: you're iterating per character, not per line.
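To make the difference concrete, here is a minimal sketch that uses io.StringIO as a stand-in for an open file. The sample text is made up for illustration and is not taken from minidata.csv.

```python
import io

# Made-up sample standing in for the contents of a small CSV file.
sample = "apple,1\nbanana,2\n"

# read() returns one string, so looping over it yields single characters.
as_string = io.StringIO(sample).read()
first_items_from_read = list(as_string)[:3]

# readlines() returns a list of lines, each still ending in its newline.
as_lines = io.StringIO(sample).readlines()

# split('\n') drops the newlines and leaves a trailing empty string,
# which is why the two are only roughly equivalent.
as_split = sample.split('\n')

print(first_items_from_read)
print(as_lines)
print(as_split)
```

Looping over as_string visits 'a', 'p', 'p', and so on, while looping over as_lines visits whole lines, which is what the search needs.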
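Changing text so that it ends up as a list of strings rather than a single string fixes the per-character iteration. Note that the inner test should then be key in row rather than key in text; with text as a list, key in text only checks whether the keyword is equal to an entire line.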
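Putting it together, one possible version of the whole script might look like the sketch below. The function names (load_keywords, matching_lines, print_matching_rows) are hypothetical helpers introduced here, and it assumes the keywords sit in the first column of minikeys.csv and that a plain substring match against each raw line is what is wanted, as in the original attempt.

```python
import csv


def load_keywords(keys_path):
    """Return the first column of the keyword file as a set of strings."""
    with open(keys_path, newline='') as inf:
        # Skip blank rows and empty cells; an empty keyword would match every line.
        return {row[0] for row in csv.reader(inf) if row and row[0]}


def matching_lines(data_path, keywords):
    """Return the lines of data_path that contain at least one keyword."""
    matches = []
    with open(data_path) as f:
        for line in f:  # iterating a file object yields one line at a time
            # Test each keyword against this line, not against the whole file.
            if any(key in line for key in keywords):
                matches.append(line.rstrip('\n'))
    return matches


def print_matching_rows(data_path='minidata.csv', keys_path='minikeys.csv'):
    """Print every line of the data file that contains a keyword."""
    for line in matching_lines(data_path, load_keywords(keys_path)):
        print(line)
```

Calling print_matching_rows() with the default file names prints each matching line once, because any() stops at the first keyword found in a line instead of printing once per keyword.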
Also, a minor terminology point. You said mydict = {rows[0] for rows in reader} is a dict. It's not; it's a set. dicts are specifically for key-value pairs, whereas sets are just keys. They're both implemented as hash tables.
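As a small illustration of the distinction, the sketch below builds both from the same rows. The rows list is made-up data standing in for what csv.reader would yield.

```python
# Made-up rows standing in for what csv.reader would yield.
rows = [["apple", "fruit"], ["carrot", "vegetable"], ["apple", "fruit"]]

# Braces around bare values build a set: unique keys only.
keys_as_set = {row[0] for row in rows}

# Braces around key: value pairs build a dict.
keys_as_dict = {row[0]: row[1] for row in rows}

print(type(keys_as_set).__name__)
print(type(keys_as_dict).__name__)

# Membership tests work the same way on both.
print("apple" in keys_as_set, "apple" in keys_as_dict)
```

For a plain keyword lookup like the one in the question, a set is all that is needed, since there are no values to associate with the keys.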