
Removing words from the list of sentences

I have a list of channel names and I want to remove certain words from these names. I tried the methods in the discussion "Removing words from list in python", but they did not work for me. Here is my list:

['Housekeeping.XTX_heater-0_Switch_Status'
 'Housekeeping.PDM_1__SW11_Status'
 'Housekeeping.Slim6_Imager-1_Switch_Status'
 'Power.BCM1_Battery_Cell_Temperature_degC'
 'Power.BCM2_Battery_Cell_Temperature_degC'
 'Power.BCR1__Battery_Discharge_Current_A'
 'Power.BCR0__Array_Temperature_degC'
 'Power.BCM0_Battery_Interface_Plate_Temp_degC'
 'Power.PDM_2__PDM_Current_A' 'Power.PDM_1__PDM_Temperature_degC'
 'Power.PDM_1__PDM_Current_A' 'Power.PDM_0__PDM_Temperature_degC'
 'Power.PDM_0__PDM_Current_A' 'Power.BCR2__BCR_Temperature_degC'
 'Power.BCR2__Battery_Discharge_Current_A'
 'Power.BCR2__Battery_Charge_Current_mA' 'Power.BCR2__Array_Voltage_V'
 'Power.BCR2__Array_Temperature_degC' 'Power.BCR2__Array_Current_mA'
 'Power.BCR1__BCR_Temperature_degC'
 'Power.BCR1__Battery_Charge_Current_mA' 'Power.BCR1__Array_Voltage_V'
 'Power.BCR1__Array_Temperature_degC' 'Power.BCR1__Array_Current_mA'
 'Power.BCR0__Overvoltage_Clamp_Current_A'
 'Power.BCR0__BCR_Temperature_degC' 'Power.BCR0__Battery_Voltage_V'
 'Power.BCR0__Battery_Charge_Current_mA' 'Power.BCR0__Array_Voltage_V'
 'Power.BCR0__Array_Current_mA' 'Thermal.WHL1_Measured_Current_mA'
 'Thermal.WHL0_Measured_Current_mA' 'Thermal.WHL1_IF_Temp_degC'
 'Thermal.WHL2_IF_Temp_degC'
 'Thermal.Prop_controller_-Y_panel__temperature_degC'
 'Thermal.WHL3_IF_Temp_degC' 'Thermal.WHL0_IF_Temp_degC'
 'Thermal.WHL3_Measured_Current_mA' 'Thermal.WHL2_Measured_Current_mA'
 'Thermal.SS1_Temperature_degC'
 'Thermal.Imager_flat_plate_EFF__temperature_degC'
 'Thermal.OBC_Temp_PPC750FL_degC' 'Thermal.OBC_Temp_PCB_degC'
 'Thermal.MTM-0_Temperature_degC' 'Thermal.AIM_Module_Temperature_degC'
 'Thermal.Sep_system_panel_-Z_+X__temperature_degC'
 'Thermal.OBDH_cardframe_-X_panel__temperature_degC'
 'Thermal.SS0_Temperature_degC' 'LIN.LIN_Failed_Nodes_Count'
 'LIN.LIN_BCM_Fail' 'LIN.LIN_Bus_Fail' 'LIN.LIN_Passive'
 'LIN.LIN_Master_1_State_Of_Health' 'LIN.LIN_Master_Up_Time'
 'LIN.LR_PA_Temperature_degC' 'LIN.My_IP_Packets' 'LIN.Switch_Error'
 'LIN.PA_Current_mA' 'LIN.S-Band_Power_Amplifier_ONOFF_State'
 'LIN.STRx0_Uplink_Reset_Count' 'LIN.STRx1_Uplink_Reset_Count'
 'LIN.Switch_Transaction_Fail_Count' 'LIN.Switch_Transaction_OK_Count'
 'LIN.TTC_0_Current_mA' 'LIN.TTC_1_Current_mA' 'LIN.TTC_Reset_Cause'
 'LIN.RSSI_dBm' 'LIN.TTC0_Temperature_degC' 'LIN.LIN_SPARE_STATUS'
 'LIN.LIN_Master_Reset' 'LIN.COUNT_FPGA_RX_STRx0' 'LIN.Lifetime_Cold_Boot'
 'LIN.Lifetime_Warm_Boot' 'LIN.LIN_Comms_Error_Count'
 'LIN.LIN_Node_Resets_Count' 'LIN.LIN_Bus_Reset'
 'LIN.LIN_Failed_Switches_Count' 'LIN.LIN_Master_0_State_Of_Health'
 'LIN.TTC1_Temperature_degC' 'LIN.UDP_Error_STRx0'
 'LIN.UDP_IPS_size_errors_STRx0' 'LIN.UDP_IPS_STRx0' 'LIN.UDP_Total_STRx0'
 'LIN.UDP_Valid_STRx0' 'LIN.UPD_IPS_errors_STRx0' 'LIN.Warm_Resets'
 'LIN.Cold_Resets' 'LIN.CAN_Reset_Count']

and I want to remove these prefixes from each name:

['Housekeeping.', 'Power.', 'Thermal.', 'LIN.']  (each prefix includes the period)

The expected output is:

'XTX_heater-0_Switch_Status'
 'PDM_1__SW11_Status'
 'Slim6_Imager-1_Switch_Status'
 'BCM1_Battery_Cell_Temperature_degC'
 'BCM2_Battery_Cell_Temperature_degC'
 'BCR1__Battery_Discharge_Current_A'

and so on.

Something like this should work:

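import re
abc = ['Housekeeping.XTX_heater-0_Switch_Status',
       'Housekeeping.PDM_1__SW11_Status',
       'Housekeeping.Slim6_Imager-1_Switch_Status',
       'Power.BCM1_Battery_Cell_Temperature_degC']
stop = ['Housekeeping.', 'Power.', 'Thermal.', 'LIN.']
# re.escape makes '.' a literal period; '^' removes a prefix only at the start of the name
pattern = re.compile(r'^(?:' + '|'.join(map(re.escape, stop)) + r')')
print([pattern.sub('', x) for x in abc])

This is adapted from the discussion you linked, with two fixes: each prefix is passed through re.escape, because an unescaped '.' matches any character, and the pattern is anchored with ^ so that only a leading prefix is removed. (The 'LIN.\s+' entry in the linked version required whitespace after the period, so it never matched names like 'LIN.LIN_Bus_Fail'.) Give it a try.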
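Joining the prefixes into a regex without escaping has two pitfalls: '.' matches any character, and an unanchored match can occur anywhere in the name. The sketch below compares the two patterns on a made-up channel name ('LIN.PowerX_State' is hypothetical and does not appear in the question's data).

```python
import re

prefixes = ['Housekeeping.', 'Power.', 'Thermal.', 'LIN.']
name = 'LIN.PowerX_State'  # hypothetical channel name for illustration

# Unescaped and unanchored: '.' matches any character and matches can occur
# anywhere, so 'PowerX' in the middle of the name is removed as well.
loose = re.sub('|'.join(prefixes), '', name)

# Escaped and anchored: only a literal prefix at the start is removed.
strict = re.sub(r'^(?:' + '|'.join(map(re.escape, prefixes)) + r')', '', name)

print(loose)   # _State
print(strict)  # PowerX_State
```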

It can be solved without regex, too:

new_list = [w.partition('.')[2] for w in old_list]  # keep everything after the first '.'
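partition('.')[2] keeps everything after the first period, so a name that contains no period at all comes out as an empty string. If you would rather strip only the four known prefixes and leave any other name untouched, a minimal sketch using str.startswith and slicing could look like this (the helper strip_prefix and the sample 'NoPrefix_Name' are hypothetical):

```python
# Known category prefixes from the question (each includes the period).
PREFIXES = ('Housekeeping.', 'Power.', 'Thermal.', 'LIN.')


def strip_prefix(name, prefixes=PREFIXES):
    """Remove the first matching known prefix; return other names unchanged."""
    for prefix in prefixes:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name


old_list = ['Housekeeping.XTX_heater-0_Switch_Status',
            'Power.BCM1_Battery_Cell_Temperature_degC',
            'LIN.Cold_Resets',
            'NoPrefix_Name']  # hypothetical name without a known prefix
new_list = [strip_prefix(w) for w in old_list]
print(new_list)
```

On Python 3.9 and later, name.removeprefix(prefix) does the same job as the slice.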

A more general approach, which removes each bad word wherever it occurs in a string, is a recursive function like the following:

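import copy

def remove_bad_words(in_strings, bad_words):
    """Remove every occurrence of each word in bad_words from each string.

    bad_words should be a list or tuple: the remaining words are duplicated
    with copy.copy at each level, which does not work for generators.
    """
    bad_words = iter(bad_words)
    try:
        bad_word = next(bad_words)
    except StopIteration:
        # No words left to remove: return the strings unchanged.
        return in_strings
    out_strings = []
    for string in in_strings:
        # Split on the current word, strip the remaining words from each
        # piece recursively, then glue the pieces back together.
        pieces = string.split(bad_word)
        cleaned = remove_bad_words(pieces, copy.copy(bad_words))
        out_strings.append("".join(cleaned))
    return out_strings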

Here it is in use:

bad_words = ["hello", "world"]

channel_names = [
    "Nationahellol Broadcahellosting Company (NBC)",
    "worldCworldBworldS (formerly world known asworld the Columbia world Broadcasting System)",
    "the Americaworldworldn Broadcashelloting Company (ABC)",
    "the Fox Broadchelloasting Coworldworldmpany (Fox)",
    "the ChelloW Televiworldsion Network.",
    "public broadcworldasting serhellovice (PBS)"
]

clean_channel_names = remove_bad_words(channel_names, bad_words)

print("\n".join(clean_channel_names))

The output is:

National Broadcasting Company (NBC)
CBS (formerly  known as the Columbia  Broadcasting System)
the American Broadcasting Company (ABC)
the Fox Broadcasting Company (Fox)
the CW Television Network.
public broadcasting service (PBS)
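Because each bad word is simply deleted wherever it occurs, the recursion can also be replaced by chaining str.replace calls. A minimal sketch with functools.reduce follows (the name remove_words is hypothetical). It gives the same output on the sample above, but the two are not identical in every edge case: chained replace re-scans the whole string after each deletion, so 'worhellold' becomes '' here, while the recursive function returns 'world' because it never re-scans the rejoined pieces.

```python
from functools import reduce


def remove_words(strings, words):
    """Delete every occurrence of each word, applying the words in order."""
    return [reduce(lambda text, word: text.replace(word, ''), words, s)
            for s in strings]


channel_names = [
    "Nationahellol Broadcahellosting Company (NBC)",
    "the ChelloW Televiworldsion Network.",
]
print(remove_words(channel_names, ["hello", "world"]))
# ['National Broadcasting Company (NBC)', 'the CW Television Network.']
```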

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 