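I have a dictionary, mydict, whose keys are file names and whose values are the text of each file.
I am extracting a list of words from the text of each file. The words are stored in the list mywords.
This is what I have tried: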
import re

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too.'}
mywords = ['Foo', 'Bar']
mylist = []
for k, v in mydict.items():
    for word in mywords:
        extracted = re.findall('^ ' + word + ".*", v, flags=re.IGNORECASE | re.MULTILINE)
        # One list is appended per (file, word) pair, so the result is flat
        mylist.append(extracted[:1])
This gives me:
[[' Foo extract this. '],
[' Bar extract this'],
[],
[' Bar extract this too.']]
However, I want the output to have 2 nested lists (one per file) instead of a separate list for every word searched in a file.
Desired output:
[[' Foo extract this. '], [' Bar extract this']],
[[], [' Bar extract this too.']]
You probably want to build a sublist for each file and append that to the list. Here is one possible solution:
import re

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too.'}
mywords = ['Foo', 'Bar']
mylist = []
for k, v in mydict.items():
    sublist = []  # collects the results for the current file
    for word in mywords:
        extracted = re.findall('^ ' + word + ".*", v, flags=re.IGNORECASE | re.MULTILINE)
        sublist.append(extracted[:1])
    mylist.append(sublist)  # one sublist per file
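The same idea can be wrapped in a small function, which makes it easy to check against the desired output from the question. This is only a sketch: the function name first_matches_per_file is made up for illustration, and the call to re.escape is an added assumption that the search words are meant as literal text rather than regex fragments.

```python
import re

def first_matches_per_file(texts, words):
    """Return one sublist per file, holding a (possibly empty) list per word."""
    result = []
    for text in texts.values():
        sublist = []
        for word in words:
            # re.escape is an assumption: it treats each word as literal text
            pattern = '^ ' + re.escape(word) + '.*'
            extracted = re.findall(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
            sublist.append(extracted[:1])  # keep at most the first match
        result.append(sublist)
    return result

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too.'}
mywords = ['Foo', 'Bar']
mylist = first_matches_per_file(mydict, mywords)
```

Because dictionaries keep insertion order in Python 3.7 and later, the sublists come out in the same order as the files in mydict.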
If you want the strings without the surrounding list, insert the first result only when there is one:
import re

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too.'}
mywords = ['Foo', 'Bar']
mylist = []
for k, v in mydict.items():
    sublist = []
    for word in mywords:
        extracted = re.findall('^ ' + word + ".*", v, flags=re.IGNORECASE | re.MULTILINE)
        if extracted:  # Checks if there is at least one element in the list
            sublist.append(extracted[0])
    mylist.append(sublist)
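One side effect of skipping empty results is that a position in the sublist no longer tells you which word produced it (for File2 the sublist holds only the Bar line). If that matters, a possible variation is to key the results by file name and word instead. The sketch below is not part of the solution above: the function name first_match_by_word is hypothetical, and using None for "no match" and re.escape for literal words are assumptions.

```python
import re

def first_match_by_word(texts, words):
    """Map each file name to {word: first matching line, or None}."""
    result = {}
    for name, text in texts.items():
        per_word = {}
        for word in words:
            # re.search stops at the first match, so no list slicing is needed
            match = re.search('^ ' + re.escape(word) + '.*', text,
                              flags=re.IGNORECASE | re.MULTILINE)
            per_word[word] = match.group(0) if match else None
        result[name] = per_word
    return result

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too.'}
mywords = ['Foo', 'Bar']
by_file = first_match_by_word(mydict, mywords)
```

With this layout, by_file['File2']['Foo'] is None, while by_file['File2']['Bar'] holds the matched line.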
If you want to be able to get multiple results from each file, you can do the following (note that I added another match for Bar in the second file):
import re

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too. \n Bar extract this one as well'}
mywords = ['Foo', 'Bar']
mylist = []
for k, v in mydict.items():
    sublist = []
    for word in mywords:
        extracted = re.findall('^ ' + word + ".*", v, flags=re.IGNORECASE | re.MULTILINE)
        if extracted:
            sublist += extracted  # add every match for this word
    mylist.append(sublist)
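When all matches are wanted anyway, the inner loop over words could also be replaced by a single pattern that joins the words with |. This is a sketch of an alternative, not the approach above, and it behaves slightly differently: matches come back in the order they appear in the text rather than grouped by word. The function name all_matches_per_file is made up, and re.escape again assumes the words are literal text.

```python
import re

def all_matches_per_file(texts, words):
    """Return, for each file, every line that starts with ' ' plus one of the words."""
    # Build one alternation such as (?:Foo|Bar); re.escape is an assumption
    alternation = '|'.join(re.escape(word) for word in words)
    pattern = re.compile('^ (?:' + alternation + ').*',
                         re.IGNORECASE | re.MULTILINE)
    # The group is non-capturing, so findall returns the whole matched line
    return [pattern.findall(text) for text in texts.values()]

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too. \n Bar extract this one as well'}
mywords = ['Foo', 'Bar']
mylist = all_matches_per_file(mydict, mywords)
```

For this sample data the result is the same as with the nested loops, because the Foo line happens to come before the Bar line in File1.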