
How to remove both positive and negative numbers from a list in dataframe?

I have a dataframe that contains two list columns. How can I drop both positive and negative numbers from the lists? Here is my data: https://github.com/mayuripandey/Data-Analysis/blob/main/similarity.csv

Input: [0, hi, hello, okay, -3]

Expected output: [hi, hello, okay]

It looks like you are trying to remove the integer at the start of each list in the Model1_list and Model2_list columns. If so, Python's ast.literal_eval() function is useful: it converts the string stored in each of those columns into a Python list. You can then slice out the parts you want, for example skipping the first entry with [1:].

For example:

from ast import literal_eval
import csv

# newline='' is the recommended way to open files for the csv module
with open('similarity.csv', newline='') as f_input:
    csv_input = csv.DictReader(f_input)

    for row in csv_input:
        # Parse the stringified list, then skip its first (integer) entry
        model1_list = literal_eval(row['Model1_list'])[1:]
        model2_list = literal_eval(row['Model2_list'])[1:]

        print(f"{row['Name_1']:40} {str(model1_list):50} {model2_list}")

This converts the two model list columns into Python lists and displays them:

-1_gun_dont_protect_like                 ['gun', 'dont', 'protect', 'like']                 ['gun', 'peopl', 'right', 'get']
0_http_tco_freenrent_nhttp               ['http', 'tco', 'freenrent', 'nhttp']              ['school', 'children', 'teacher', 'kid']
1_kavanaugh_brett_kill_near              ['kavanaugh', 'brett', 'kill', 'near']             ['http', 'tco', 'freenrent', 'statehoodpr']
2_democrat_strategist_republican_care    ['democrat', 'strategist', 'republican', 'care']   ['idiot', 'stupid', 'your', 'moron']
3_republican_democrat_gun_control        ['republican', 'democrat', 'gun', 'control']       ['suprem', 'court', 'justic', 'assassin']
4_liber_leftist_left_riot                ['liber', 'leftist', 'left', 'riot']               ['weapon', 'gun', 'assault', 'buy']
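
If the numbers are not always in the first position, slicing with [1:] will not be enough. A minimal sketch of filtering by type instead is shown here; the helper name drop_ints and the sample string are made up for illustration and are not taken from the CSV file:

```python
from ast import literal_eval

def drop_ints(cell):
    """Parse a stringified list and keep only the non-integer entries."""
    items = literal_eval(cell)
    return [x for x in items if not isinstance(x, int)]

# Made-up cell value, assumed to resemble the Model1_list strings
sample = "[-1, 'gun', 'dont', 'protect', 'like', 3]"
print(drop_ints(sample))
# ['gun', 'dont', 'protect', 'like']
```

With pandas, a function like this could probably be passed to Series.apply for each of the two list columns, assuming the cells are still stored as strings.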

I would use Python's built-in str.lstrip to remove a leading plus or minus sign, then check whether what remains is numeric with str.isnumeric. Note that this works on strings, so the list elements must be strings.

lst = ["0", "hi", "hello", "okay", "-3"]

def not_num(s):
    # True when s is NOT a number once any leading sign is stripped
    return not s.lstrip('+-').isnumeric()

print(list(filter(not_num, lst)))  # or [x for x in lst if not_num(x)]
# output : ['hi', 'hello', 'okay']
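
One caveat with lstrip plus isnumeric is that it is fairly permissive: lstrip('+-') removes any run of signs, and isnumeric accepts characters such as '½'. If that matters for your data, a stricter check with a regular expression is one alternative (a different technique from the one above, shown only as a sketch; the name is_signed_int is hypothetical):

```python
import re

# Optional single sign followed by one or more ASCII digits
SIGNED_INT = re.compile(r'[+-]?[0-9]+')

def is_signed_int(s):
    """Return True only for strings like '0', '-3' or '+12'."""
    return SIGNED_INT.fullmatch(s) is not None

lst = ["0", "hi", "hello", "okay", "-3"]
print([x for x in lst if not is_signed_int(x)])
# ['hi', 'hello', 'okay']

# Strings the lstrip/isnumeric check would treat as numbers
print(is_signed_int("+-3"), is_signed_int("½"))
# False False
```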

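One way to do this is with exception handling: try to convert each element with int() and treat a failure as "not a number".

def is_int(x):
    try:
        int(x)
    except (ValueError, TypeError):
        # ValueError: a string such as "hi"; TypeError: a value such as None
        return False
    else:
        return True

input_ = [0, "hi", "hello", "okay", -3]
out = [i for i in input_ if not is_int(i)]
print(out)
# output : ['hi', 'hello', 'okay']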


 