Write a function that takes a row and returns a list of 2-tuples: song name and score

Question

I need to preprocess some data so that I can start analyzing it. I currently have a data frame which contains data on Eurovision winners. I need to create a new data frame which contains the words from each of the song titles, with the points of each song paired with each word in a tuple. For example, if the song name is 'Hello World' and the score is 31, I need to create two tuples, ('Hello', 31) and ('World', 31), and add them to a list from which I can create a new data frame.

Sample input

Here is the first row of my dataframe.

Sample output

The output I want from the first row is

[('Net', 31),('als', 31),('toen', 31)]

Attempt

def TupleGenerator(row):
    list =[]
    for item in ev['Song']: 
        tuple = (item, ev["Points"])
        list.append(tuple)
    return list
 

TupleGenerator(ev.iloc[0])

This is what I have tried so far, but I am not sure how to get the score from the same row assigned to each word in the tuple.

Any advice is appreciated, thank you.

Answer 1

You have the right idea, but right now the loop reads from the global ev instead of the row argument, and iterating over the string row["Song"] directly would visit one character at a time. You need to split this string into a sequence of substrings, where each substring is one word from the song title, and then iterate over that sequence. This code shows how one might do that:

def TupleGenerator(row):
    result = []
    for word in row["Song"].strip('"').split():
        result.append((word, row["Points"]))
    return result

The strip method of strings accepts one optional argument, a string specifying the set of characters to remove from both ends of the string. In our case, we need to remove the double-quote character ". The split method, called without arguments, returns a list of the words in the string, treating each run of consecutive whitespace as a single delimiter.
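To make the behavior of strip and split concrete, here is a small sketch on a single string. The variable names `title` and `words` are only illustrative, and the doubled space is added on purpose to show how split handles it.

```python
# A quoted title with a doubled space between the first two words.
title = '"Net  als toen"'

# strip('"') removes double quotes from both ends only;
# split() with no arguments splits on any run of whitespace.
words = title.strip('"').split()
print(words)
```

Note that strip only touches the ends of the string, so a quote character in the middle of a title would be left in place.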

For example, if your df is

import pandas as pd

df = pd.DataFrame(
    {"Year": 1957,
     "Date": "3-Mar",
     "Host City": ["Frankfurt", "Linux"],
     "Winner": ["Netherlands", "Unix"],
     "Song": ['"Net als toen"', '"git hub"'],
     "Performer": ["Corry Brokken", "Stack Overflow"],
     "Points": [31, 32],
     "Margin": [14, 15],
     "Runner-up": ["France", "cyberspace"]
    }
)

Running

for index, row in df.iterrows():
    print(TupleGenerator(row))

gives the output

[('Net', 31), ('als', 31), ('toen', 31)]
[('git', 32), ('hub', 32)]
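The question asks for one list covering every song, so the per-row lists still need to be joined. The following is a minimal sketch of one way to do that, not the only one. It assumes plain dictionaries as stand-ins for the data frame rows so that it runs without pandas; `tuple_generator`, `rows`, and `all_pairs` are illustrative names.

```python
from itertools import chain


def tuple_generator(row):
    """Pair each word of the song title with the row's points."""
    return [(word, row["Points"]) for word in row["Song"].strip('"').split()]


# Stand-in rows; with pandas these would come from df.iterrows().
rows = [
    {"Song": '"Net als toen"', "Points": 31},
    {"Song": '"git hub"', "Points": 32},
]

# Flatten the per-row lists into one list of (word, points) tuples.
all_pairs = list(chain.from_iterable(tuple_generator(r) for r in rows))
print(all_pairs)
```

With pandas available, passing the flat list to `pd.DataFrame(all_pairs, columns=["Word", "Points"])` should give the new data frame; the column names here are assumptions, not something the question specifies.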

I hope this helps. Let me know if there are any questions!
