
Filtering data with keywords with a csv file

I am trying to filter data out of a csv file, and I want the output organized so that it looks like this:

0 AIG,10,,,,Yes,,,Jr,,,MS,,
1 Baylor College of Medicine,19,Yes,Yes,,,,,,,,,,Recent
2 CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
3 Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,
4 ExxonMobil,11,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,,,
5 Flow-Cal Inc.,16,Yes,,,Yes,,,Jr,Sr,,,,All
6 Global Shop Solutions,18,Yes,,,Yes,,,,Sr,PB,,,All
7 Harris County CTS,22,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
8 HCSS,29,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
9 Hitachi Consulting,13,Yes,,,,,,,Sr,,MS,,
10 HP Inc.,1,Yes,,,Yes,,,Jr,,,MS,,Recent
11 INT Inc.,20,Yes,Yes,,Yes,,,Jr,Sr,,MS,PhD,
12 JPMorgan Chase & Co,3,Yes,,,Yes,,,Jr,Sr,,,,
13 Leidos,390,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,
14 McKesson,26,Yes,,,,,,,Sr,,,,
15 MRE Consulting Ltd.,2,Yes,,,,,,,Sr,PB,MS,,All
16 NetIQ,7,,,,Yes,,Soph,Jr,Sr,PB,,,
17 PROS,21,Yes,,,,,,,Sr,,MS,PhD,All
18 San Jacinto College ,14,,,,Yes,,Soph,Jr,Sr,PB,MS,,
19 SAS,4,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
20 Smartbridge,8,Yes,,,,,,,Sr,PB,MS,,
21 Sogeti USA,15,Yes,,,,,,,Sr,PB,MS,,
22 Southwest Research Institute,12,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
23 The Reynolds and Reynolds Company,23,Yes,Yes,,Yes,Fr,Soph,Jr,Sr,PB,,,All
24 UH Enterprise Systems,9,Yes,Yes,Yes,Yes,Fr,Soph,Jr,Sr,PB,MS,PhD,All
25 U.S. Marine Corps,25,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,All
26 ValuD Consuting LLC,5,Yes,,,,,,,Sr,PB,,,All
27 Wipro,24,Yes,,,,,,,Sr,PB,,,

However, my code is currently giving me this:

0 AIG,10,,,,Yes,,,Jr,,,MS,,
1 Baylor�College�of�Medicine,19,Yes,Yes,,,,,,,,,,Recent
2 CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
3 Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,
4 ExxonMobil,11,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,,,
5 HCSS,29,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
6 Leidos,390,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,
7 McKesson,26,Yes,,,,,,,Sr,,,,
8 NetIQ,7,,,,Yes,,Soph,Jr,Sr,PB,,,
9 PROS,21,Yes,,,,,,,Sr,,MS,PhD,All
10 SAS,4,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
11 Smartbridge,8,Yes,,,,,,,Sr,PB,MS,,
12 Wipro,24,Yes,,,,,,,Sr,PB,,,
13 SAS,4,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
14 NetIQ,7,,,,Yes,,Soph,Jr,Sr,PB,,,
15 Smartbridge,8,Yes,,,,,,,Sr,PB,MS,,
16 AIG,10,,,,Yes,,,Jr,,,MS,,
17 ExxonMobil,11,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,,,
18 CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
19 Baylor�College�of�Medicine,19,Yes,Yes,,,,,,,,,,Recent
20 PROS,21,Yes,,,,,,,Sr,,MS,PhD,All
21 Wipro,24,Yes,,,,,,,Sr,PB,,,
22 McKesson,26,Yes,,,,,,,Sr,,,,
23 Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,
24 HCSS,29,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
25 Leidos,30,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,

As you can see, it seems to repeat some of the keywords I used. My code is posted below.

# I made a dictionary of the column names given in the problem statement
company_dict = {0:"Company", 1:"Booth",
                2:"Full-Time", 3:"Full-Time Visa Sponsor",
                4:"Part-Time", 5:"Internship",
                6:"Freshman", 7:"Sophomore",
                8:"Junior", 9:"Senior",
                10:"Post-Bacs", 11:"MS",
                12:"PhD", 13:"Alumni"}

# Loop to print each column index with its name from company_dict
for lines in company_dict:
    print(repr(lines),company_dict[lines])

keywords = ("AIG","Baylor","CGG","Citi","ExxonMobil","Flow-Cal Inc.",
           "Global SHop Solutions","Harris Count CTS","HCSS",
           "Hitachi Consulting", "HP Inc.","INT Inc.","JPMorgan Chase & Co",
           "Leidos","McKesson","MRE Consulting Ltd.","NetIQ","PROS",
           "San Jacinto College","SAS","Smartbridge","Sogeti USA",
           "Southwest Research Institute","The Reynolds and Reynolds Company",
           "UH Enterprise Systems","U.S. Marine Corps","ValuD Consuting LLC","Wipro")

with f as filterf:
    output_line_counter = 0
    for line in filterf:
        if any(keyword in line for keyword in keywords):
            print(output_line_counter, line.strip())
            output_line_counter += 1

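All of this comes from the csv file included with the assignment. I think I am on the right track, but I don't understand why my code repeats rows and also misses some of the "keywords" I asked it to search for.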

I will include the csv data below:

ALPHABETICAL ORDER,,,,,,,,,,,,,
,,Positions,,,,Classifications,,,,,,,
Company,Booth,Full-Time,"Full-Time Visa Sponsor",Part-Time,Internship,Freshman,Sophomore,Junior,Senior,Post-Bacs,MS,PhD,Alumni
AIG,10,,,,Yes,,,Jr,,,MS,,
Baylor�College�of�Medicine,19,Yes,Yes,,,,,,,,,,Recent
CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,
ExxonMobil,11,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,,,
,...
Flow-Cal�Inc.,16,Yes,,,Yes,,,Jr,Sr,,,,All
Global�Shop�Solutions,18,Yes,,,Yes,,,,Sr,PB,,,All
Harris�County�CTS,22,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
HCSS,29,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
Hitachi�Consulting,13,Yes,,,,,,,Sr,,MS,,
HP�Inc.,1,Yes,,,Yes,,,Jr,,,MS,,Recent
INT�Inc.,20,Yes,Yes,,Yes,,,Jr,Sr,,MS,PhD,
JPMorgan�Chase�&�Co,3,Yes,,,Yes,,,Jr,Sr,,,,
Leidos,390,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,
McKesson,26,Yes,,,,,,,Sr,,,,
,,,,,,,,,,,,,
MRE�Consulting�Ltd.,2,Yes,,,,,,,Sr,PB,MS,,All
NetIQ,7,,,,Yes,,Soph,Jr,Sr,PB,,,
PROS,21,Yes,,,,,,,Sr,,MS,PhD,All
San�Jacinto�College��,14,,,,Yes,,Soph,Jr,Sr,PB,MS,,
SAS,4,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
Smartbridge,8,Yes,,,,,,,Sr,PB,MS,,
Sogeti�USA,15,Yes,,,,,,,Sr,PB,MS,,
Southwest�Research�Institute,12,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
The�Reynolds�and�Reynolds�Company,23,Yes,Yes,,Yes,Fr,Soph,Jr,Sr,PB,,,All
UH�Enterprise�Systems,9,Yes,Yes,Yes,Yes,Fr,Soph,Jr,Sr,PB,MS,PhD,All
U.S.�Marine�Corps,25,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,All
ValuD�Consuting�LLC,5,Yes,,,,,,,Sr,PB,,,All
Wipro,24,Yes,,,,,,,Sr,PB,,,
BOOTH ORDER,,,,,,,,,,,,,
,Booth,Positions,,,,Classifications,,,,,,,
Company,#,Full-Time,"Full-Time
Visa Sponsor",Part-Time,Internship,Freshman,Sophomore,Junior,Senior,Post-Bacs,MS,PhD,Alumni
HP�Inc.,1,Yes,,,Yes,,,Jr,,,MS,,Recent
"MRE�Consulting,�Ltd.",2,Yes,,,,,,,Sr,PB,MS,,All
JPMorgan�Chase�&�Co,3,Yes,,,Yes,,,Jr,Sr,,,,
SAS,4,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
ValuD�Consuting�LLC,5,Yes,,,,,,,Sr,PB,,,All
NetIQ,7,,,,Yes,,Soph,Jr,Sr,PB,,,
Smartbridge,8,Yes,,,,,,,Sr,PB,MS,,
UH�Enterprise�Systems,9,Yes,Yes,Yes,Yes,Fr,Soph,Jr,Sr,PB,MS,PhD,All
AIG,10,,,,Yes,,,Jr,,,MS,,
ExxonMobil,11,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,,,
Southwest�Research�Institute,12,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
Hitachi�Consulting,13,Yes,,,,,,,Sr,,MS,,
San�Jacinto�College��,14,,,,Yes,,Soph,Jr,Sr,PB,MS,,
Sogeti�USA,15,Yes,,,,,,,Sr,PB,MS,,
"Flow-Cal,�Inc.",16,Yes,,,Yes,,,Jr,Sr,,,,All
CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
Global�Shop�Solutions,18,Yes,,,Yes,,,,Sr,PB,,,All
Baylor�College�of�Medicine,19,Yes,Yes,,,,,,,,,,Recent
"INT,�Inc.",20,Yes,Yes,,Yes,,,Jr,Sr,,MS,PhD,
PROS,21,Yes,,,,,,,Sr,,MS,PhD,All
Harris�County�CTS,22,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
The�Reynolds�and�Reynolds�Company,23,Yes,Yes,,Yes,Fr,Soph,Jr,Sr,PB,,,All
Wipro,24,Yes,,,,,,,Sr,PB,,,
U.S.�Marine�Corps,25,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,All
McKesson,26,Yes,,,,,,,Sr,,,,
Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,
HCSS,29,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
Leidos,30,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,

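I think it has something to do with the question marks in the cells of the csv file, but I'm not sure. I want to search the csv file for the keywords I give it and then print each matching row. Any input or advice is greatly appreciated :)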

The answer was simply to change the csv file (hopefully that is acceptable for the project, since the file had weird UTF errors).
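If editing the csv file by hand is not an option, the odd characters can also be replaced as each line is read. This is only a sketch: it assumes the stray characters are either no-break spaces (U+00A0) or the Unicode replacement character (U+FFFD), and `normalize_spaces` is a hypothetical helper name, not something from the original code.

```python
def normalize_spaces(text):
    """Turn assumed stray characters into ordinary spaces and collapse runs."""
    # Assumption: the "�" characters decode to U+00A0 or U+FFFD.
    cleaned = text.replace("\u00a0", " ").replace("\ufffd", " ")
    # split() with no argument also strips the trailing newline.
    return " ".join(cleaned.split())


keywords = ("Hitachi Consulting", "San Jacinto College")
raw_lines = [
    "Hitachi\u00a0Consulting,13,Yes,,,,,,,Sr,,MS,,\n",
    "San\ufffdJacinto\ufffdCollege\ufffd\ufffd,14,,,,Yes,,Soph,Jr,Sr,PB,MS,,\n",
]

for raw in raw_lines:
    line = normalize_spaces(raw)
    if any(keyword in line for keyword in keywords):
        print(line)
```

After normalization, the original `any(keyword in line for keyword in keywords)` test can be used unchanged.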

I also added the following code:

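# Collect every line that contains at least one keyword
DataList = []
with f as filterf:
    for line in filterf:
        if any(keyword in line for keyword in keywords):
            DataList.append(line)

# set() drops exact duplicate lines; sorted() orders what is left alphabetically
CleanerData = sorted(set(DataList))
line_counter = 0
for i in CleanerData:
    line_counter += 1
    # Each line still ends with its own newline, so suppress print's newline
    print(line_counter, i, end='')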
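One limitation of `sorted(set(...))` is that it only removes lines that are identical character for character, so the two Leidos rows (booth 390 in one table, 30 in the other) would both survive. A possible alternative, sketched below, parses each row with the standard `csv` module and keeps the first row seen for each company name. This is a different technique from the code above: it compares the first field exactly instead of doing a substring search, so a shortened keyword such as "Baylor" would need to be spelled out in full. The sample data and the function name `first_row_per_company` are made up for illustration.

```python
import csv
import io

# Hypothetical, shortened sample in the same shape as the csv data above.
SAMPLE = """Company,Booth
SAS,4
NetIQ,7
AIG,10
SAS,4
Leidos,390
Leidos,30
"""


def first_row_per_company(rows, keywords):
    """Keep the first row for each company whose name is in keywords."""
    seen = set()
    kept = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        name = row[0].strip()
        if name in keywords and name not in seen:
            seen.add(name)
            kept.append(row)
    return kept


keywords = {"AIG", "Leidos", "NetIQ", "SAS"}
rows = csv.reader(io.StringIO(SAMPLE))
result = first_row_per_company(rows, keywords)
for number, row in enumerate(result):
    print(number, ",".join(row))
```

Because `csv.reader` understands quoting, a name like "MRE Consulting, Ltd." in the second table is read as a single field rather than being split at its comma.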
