I want to delete some rows from a CSV file by saving a new CSV after a validation process. I wrote the code below but it causes an error.
with open(path_to_read_csv_file, "r") as csv_file:
csv_reader = csv.DictReader(csv_file, delimiter=',')
for line in csv_reader:
# if validation(line[specific_column]):
try:
df = pd.DataFrame(line)
df.to_csv(path_to_save_csv_file)
except Exception as e:
print('Something Happend!')
print(e)
continue
Error:
Something Happend!
If using all scalar values, you must pass an index
I've also tried to add an index value by df = pd.DataFrame(line, index=[0])
, but it only stores the first line with an additional empty column at the beginning. How can solve this?
Another version with line
works but I can not reach a specific key value at each line:
inFile = open(path_to_read_csv_file, 'r')
outFile = open(path_to_save_csv_file, 'w')
for line in inFile:
try:
print('Analysing:', line)
# HERE, how can I get the specific column value? I used to use line[specific_column] in the last version
if validation(line[specific_column]):
outFile.write(line)
else:
continue
except Exception as e:
print('Something Happend!')
print(e)
continue
outFile.close()
inFile.close()
This should help you. Basically, you cannot create a DataFrame from scalar-values only. They have to be wrapped in eg. a list
.
The constructor pd.DataFrame
expects you to tell how the data that you have provided has to be indexed as well. This is documented here .
The function csv.DictReader
uses
the values in the first row of file f will be used as the fieldnames.
For more information, refer to the csv documentation .
Hence, each line
that is parsed by the csv_reader
is a dictionary where the keys are the CSV header and the values are each the row in the particular line.
So for example, if my CSV is:
Header1, Header2, Header3
1,2,3
11,11,33
Then in the first iteration, the line
object would be:
{'Header1': '1', 'Header2': '2', 'Header3': '3'}
Now when you supply this to pd.DataFrame
, you need to specify what the data is and what the headers/indices are. In this case, the data is ['1', '2', '3']
and the headers/indices are ['Header1', 'Header2', 'Header3']
. These can be extracted by the calls line.values()
and line.keys()
respectively.
This is the change I have made.
with open(path_to_read_csv_file, "r") as csv_file:
csv_reader = csv.DictReader(csv_file, delimiter=',')
for line in csv_reader:
try:
# validation ...
df = pd.DataFrame(line.values(), line.keys())
df.to_csv(path_to_save_csv_file)
except Exception as e:
print('Something Happend!')
print(e)
continue
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.