I have two CSV files. Their structures are identical, and each row looks like ip, cve. I need to remove all rows that are present in both files and keep only the unique rows (a left anti join). I think this can be done with a left join, but my attempt doesn't work. Is there an easier way to solve this problem?
import pandas as pd
patrol = pd.read_csv('parse_results_MaxPatrol.csv')
nessus = pd.read_csv('parse_result_nessus_new.csv')
nessus_filtered = nessus.merge(patrol, how='left', left_on=[0], right_on=[0])
This code throws the following traceback:
File "C:/Users/username/Desktop/pandas/parser.py", line 6, in <module>
nessus_filtered = nessus.merge(patrol, how='left', left_on=[0], right_on=[0])
File "C:\Python37\lib\site-packages\pandas\core\frame.py", line 6868, in merge
copy=copy, indicator=indicator, validate=validate)
File "C:\Python37\lib\site-packages\pandas\core\reshape\merge.py", line 47, in merge
validate=validate)
File "C:\Python37\lib\site-packages\pandas\core\reshape\merge.py", line 529, in __init__
self.join_names) = self._get_merge_keys()
File "C:\Python37\lib\site-packages\pandas\core\reshape\merge.py", line 833, in _get_merge_keys
right._get_label_or_level_values(rk))
File "C:\Python37\lib\site-packages\pandas\core\generic.py", line 1706, in _get_label_or_level_values
raise KeyError(key)
You can see how to approach this in the sample code below:
import pandas as pd
data_a = pd.read_csv('./a.csv')
data_b = pd.read_csv('./b.csv')
print('Data A')
print(data_a)
print('\nData B')
print(data_b)
data_c = pd.concat([data_a, data_b]).drop_duplicates(keep='first')
print('\nData C - Final dataset')
print(data_c)
It reads two sample .csv files (a.csv and b.csv), which both have the same structure (id and name columns) and share a few duplicate rows. We read these files, concatenate them, and drop the duplicates, keeping the first occurrence, so a row that appears in both files ends up in the result once.
Data A
id name
0 1 Jhon
1 2 Kane
2 3 Leo
3 4 Brack
Data B
id name
0 2 Kane
1 4 Brack
2 5 Peter
3 6 Tom
Data C - Final dataset
id name
0 1 Jhon
1 2 Kane
2 3 Leo
3 4 Brack
2 5 Peter
3 6 Tom
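If rows shared by both files should be removed entirely rather than kept once, two variations may be closer to what the question asks for. The following is only a sketch under assumptions: the frames are built inline to mirror Data A and Data B above instead of being read from CSV, and the names merged, left_only and sym_diff are illustrative. The first variation is a left anti join via merge with indicator=True; the second passes keep=False to drop_duplicates so that every copy of a repeated row is dropped.

```python
import pandas as pd

# Stand-ins for the two CSV files, mirroring Data A and Data B above.
data_a = pd.DataFrame({"id": [1, 2, 3, 4], "name": ["Jhon", "Kane", "Leo", "Brack"]})
data_b = pd.DataFrame({"id": [2, 4, 5, 6], "name": ["Kane", "Brack", "Peter", "Tom"]})

# Left anti join: left-join on all shared columns and let pandas record,
# in the "_merge" column, whether each row matched the right frame.
merged = data_a.merge(data_b, how="left", on=["id", "name"], indicator=True)

# Keep only rows that exist in data_a and have no match in data_b.
left_only = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(left_only)

# Symmetric variation: keep=False drops every copy of a duplicated row,
# leaving rows that occur in exactly one of the two frames.
# Caveat: rows duplicated inside a single file are dropped as well.
sym_diff = pd.concat([data_a, data_b]).drop_duplicates(keep=False)
print(sym_diff)
```

For the files in the question, the same merge would take the real header names in on= (for example on=['ip', 'cve'], assuming those are the actual headers). An integer such as 0 is looked up as a column label, not a position, which is the likely cause of the KeyError.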
I hope this helps you solve your problem.