
Remove trailing and leading char using csv.reader

How can I remove a character when the value in the second column of my CSV starts with "(" or ends with ")"? I'm very new to Python, so any help would be appreciated.

Example:

0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,(Java Archive (JAR) 4049-0),Not Supported,
005c41fc0f8580f51644493fcbaa0d2d468312c3,(WIN32 EXE 7-2),Ransom.Win32.TRX.XXPE50FFF027,

to

0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,Java Archive (JAR) 4049-0,Not Supported,
005c41fc0f8580f51644493fcbaa0d2d468312c3,WIN32 EXE 7-2,Ransom.Win32.TRX.XXPE50FFF027,

I have this code using DATA INFILE

TRIM(TRAILING ')' FROM TRIM(LEADING '('

How can I apply it here in my code:

with open(fullPath, 'rb') as file:
     csv_data = csv.reader(file)
     next(csv_data)

A solution using lstrip() and rstrip()

import csv

new_rows = []
with open('test.csv', 'rt', newline='') as file:
    csv_data = csv.reader(file, delimiter=',')
    for row in csv_data:
        # Keep the first three columns; strip '(' and ')' from the ends of the second
        new_rows.append([row[0], row[1].lstrip('(').rstrip(')'), row[2]])

print(new_rows)  # Outputs [['0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe', 'Java Archive (JAR) 4049-0', 'Not Supported'], ['005c41fc0f8580f51644493fcbaa0d2d468312c3', 'WIN32 EXE 7-2', 'Ransom.Win32.TRX.XXPE50FFF027']]

Edit

To save the result to a new .csv file, just add:

with open('test2.csv', 'wt', newline='') as file:
    writer = csv.writer(file)
    for row in new_rows:
        writer.writerow(row)
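
Note that lstrip('(') and rstrip(')') remove every leading "(" and every trailing ")", and the snippet above drops the empty fourth column created by the trailing comma. If that matters, here is a minimal sketch that removes at most one parenthesis from each end and keeps all columns. The helper name strip_outer_parens is made up for illustration, and io.StringIO stands in for the real files so the example is self-contained:

```python
import csv
import io

def strip_outer_parens(value):
    """Remove at most one leading '(' and one trailing ')' from value."""
    if value.startswith('('):
        value = value[1:]
    if value.endswith(')'):
        value = value[:-1]
    return value

# Stand-in for the input file; the rows are the ones from the question.
sample = io.StringIO(
    "0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,(Java Archive (JAR) 4049-0),Not Supported,\n"
    "005c41fc0f8580f51644493fcbaa0d2d468312c3,(WIN32 EXE 7-2),Ransom.Win32.TRX.XXPE50FFF027,\n"
)

cleaned_rows = []
for row in csv.reader(sample):
    if len(row) > 1:  # skip blank or single-column lines
        row[1] = strip_outer_parens(row[1])
    cleaned_rows.append(row)

# Stand-in for the output file; every column, including the empty last one, is kept.
out = io.StringIO()
csv.writer(out, lineterminator='\n').writerows(cleaned_rows)
print(out.getvalue(), end='')
```

With real files, replace the two StringIO objects with open(..., newline='') handles.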

Here's one way of doing it: remove the first occurrence of '(' and the last occurrence of ')' from each line. Hope it helps.

s = '''0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,(Java Archive (JAR) 4049-0),Not Supported,
005c41fc0f8580f51644493fcbaa0d2d468312c3,(WIN32 EXE 7-2),Ransom.Win32.TRX.XXPE50FFF027,'''

def last_replace(s, old, new, occurrence):
    '''Replaces the last `occurrence` occurrences of `old` in `s` with `new`'''
    li = s.rsplit(old, occurrence)
    return new.join(li)

new_string = [last_replace(line, ')', '', 1).replace('(', '', 1) for line in s.split('\n')]
print(new_string)

Output:

['0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,Java Archive (JAR) 4049-0,Not Supported,',
'005c41fc0f8580f51644493fcbaa0d2d468312c3,WIN32 EXE 7-2,Ransom.Win32.TRX.XXPE50FFF027,']

PS : I stole the last_replace function from here
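
One caveat: this works on the whole line, so it removes the first "(" and the last ")" wherever they appear, not only in the second column. A rough sketch of the difference, using a made-up row (not from the question) and a naive split on ',' that assumes no quoted commas:

```python
def last_replace(s, old, new, occurrence):
    """Replace the last `occurrence` occurrences of `old` in `s` with `new`."""
    return new.join(s.rsplit(old, occurrence))

# Hypothetical row: the only parentheses are in the third column.
line = "abc123,plain value,Trojan (generic),"

# Whole-line version: the parentheses are stripped from the wrong column.
whole_line = last_replace(line, ')', '', 1).replace('(', '', 1)

# Field-scoped version: only touch the second column, and only when it is wrapped.
fields = line.split(',')
if fields[1].startswith('(') and fields[1].endswith(')'):
    fields[1] = last_replace(fields[1], ')', '', 1).replace('(', '', 1)
field_scoped = ','.join(fields)

print(whole_line)    # abc123,plain value,Trojan generic,
print(field_scoped)  # abc123,plain value,Trojan (generic),
```

Requiring both a leading "(" and a trailing ")" is a choice made for this sketch; relax the condition if either one alone should be stripped.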

This is a great opportunity to learn about regular expressions! Regular expressions are a method for recognising and dealing with patterns in text. Python has a regular expressions package as part of its standard library. I'm going to assume you're using Python 3 for the rest of this answer, where the package is named re.

The TLDR answer to your question is:

import re

string_without_parens = re.sub(r'(^\()|(\)$)', '', string_maybe_has_parens)

What's going on here, though? The re.sub() function takes three parameters: a regular expression pattern (written here as a raw string, which is what the leading r denotes), the string to replace each match with, and the string to perform the substitution on. The regular expression here is (^\()|(\)$) . So what does that mean? Let's take it step by step:

  • A set of parentheses () represents a capture group. These can be used to get the matches out, but I've used them as a way to group the characters we're looking for. There are two capture groups in this regular expression: (^\() and (\)$) .
  • Between these is a | character, which represents OR in regular expression language, so the pattern looks for something that matches either (^\() or (\)$) .
  • The first capture group (^\() has two things inside it (well, three really, but we'll get to that). The first is ^ , which is called an anchor; this one in particular says, "only match at the start of the string". The second (and third) characters are \( , which says "I want to look for an opening parenthesis". Because parentheses have a special meaning in regular expressions, we have to use the backslash character to "escape" it.
  • The second capture group (\)$) contains an escaped closing parenthesis \) and another anchor, $ . This anchor represents the end of the string, in the same way ^ represented the start.
  • Together this says: "match an opening parenthesis at the beginning or a closing parenthesis at the end", and the re.sub() function says to replace anything that matches this pattern with '' (i.e. nothing).
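
To see the pattern in action, here is a small sketch that applies it to a few second-column values. The first two come from the question; the last two are invented to show that unmatched or missing parentheses are handled too. The name OUTER_PARENS is only for illustration:

```python
import re

# Same pattern as above, compiled once so it can be reused for every value.
OUTER_PARENS = re.compile(r'(^\()|(\)$)')

values = [
    '(Java Archive (JAR) 4049-0)',  # from the question
    '(WIN32 EXE 7-2)',              # from the question
    'no parens',                    # made-up: nothing to remove
    '(only leading',                # made-up: only the start matches
]

cleaned = [OUTER_PARENS.sub('', value) for value in values]
print(cleaned)
# ['Java Archive (JAR) 4049-0', 'WIN32 EXE 7-2', 'no parens', 'only leading']
```

The inner (JAR) survives because neither of its parentheses sits at the start or end of the string. To use this with csv.reader, you would apply OUTER_PARENS.sub('', row[1]) to each row.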

Hope that helps! If you want to play more with regular expressions, you can try out regexr, which helped me wrap my head around them.

